Dorieh Data Platform


Dorieh Tutorials

Contents

  • Building a data preparation Workflow
    • Introduction
    • Prerequisites
    • Design overview
      • Inputs
      • Outputs
      • Architecture
    • Directory layout
    • Step 1. Create a minimal CWL workflow skeleton
    • Step 2. Iteratively Defining Steps and Parameters
    • Step 3. Parameterize for a single day (“toy” run)
    • Step 4. Add database integration (PostgreSQL)
      • Start or check PostgreSQL
      • Add PostgreSQL integration to the workflow
      • Add the Database initialization step
      • Defining Data Model
      • Adding Ingestion Step
    • Step 5. Building Medallion Layers (Bronze, Silver, Gold)
    • Step 6. Testing the Pipeline
    • Next Steps
  • Documenting a Workflow
    • Workflow files
    • Generating skeleton documentation
    • Enhancing workflow documentation
      • Adding Workflow Title
      • Documenting workflow elements with doc key
  • Constructing data dictionaries and lineage graphs
    • What we can generate
    • Output Formats and Modes
    • Running the tool
    • Exploring the Artifacts
    • Enriching the Data Dictionary
    • Lineage Diagram for Medicare data

© Copyright 2021-2024, Harvard University.
