Dorieh Tutorials
Contents
- Building a Data Preparation Workflow
- Introduction
- Prerequisites
- Design overview
- Directory layout
- Step 1. Create a minimal CWL workflow skeleton
- Step 2. Iteratively Defining Steps and Parameters
- Step 3. Parameterize for a single day (“toy” run)
- Step 4. Add database integration (PostgreSQL)
- Step 5. Building Medallion Layers (Bronze, Silver, Gold)
- Step 6. Testing the Pipeline
- Next Steps
- Documenting a Workflow
- Constructing data dictionaries and lineage graphs
Note
A full tutorial on the Medicare data warehouse pipeline, covering the FTS → YAML → SQL ingestion pattern, federated views across years, validation and journaling, and HLL-based quality control, appears in Chapter 8 of the book Research Data that Can be Trusted (Bouzinier et al., forthcoming); each chapter will be assigned its own DOI. In the meantime, see the operational step-by-step guide in Example: Medicare Processing Pipeline, which uses the publicly available synthetic dataset on Zenodo, so no data use agreement is required to follow along.