Example: Medicare Processing Pipeline (with PostgreSQL)

Prerequisites 

Before starting, make sure you have:

PostgreSQL running and a database.ini ready — follow the steps in Using Dorieh with PostgreSQL Backend. If you want a quick non-production setup, use the bundled Docker Compose file:
```
cd $WORKDIR/dorieh/docker/pg-hll/
docker compose up -d
```
Toil installed and tested — see Installing Dorieh and testing the installation.

The dorieh repository cloned under $WORKDIR:

git clone https://github.com/ForomePlatform/dorieh.git $WORKDIR/dorieh

Then move into the Medicare example directory:

cd $WORKDIR/dorieh/examples/with-postgres/medicare

Step 1 — Download synthetic sample data 

The synthetic Medicare-like dataset is hosted on Zenodo and is publicly available — no data use agreement or institutional access is required. It:

Mimics the raw fixed-width files (.dat) delivered by ResDAC
Reproduces the File Transfer Summary (FTS) layout metadata
Contains no real PHI / PII — it is safe for testing, demonstrations, and sharing

Download and unpack it:

mkdir -p data
pushd data

curl -fLo medicare-synthetic-database.zip \
  'https://zenodo.org/records/18915558/files/medicare-synthetic-database-v1.zip?download=1'

unzip medicare-synthetic-database.zip

popd

The extracted directory tree follows the layout expected by the ingestion pipeline — one sub-directory per year, each containing .fts metadata files and the corresponding .dat data files.

Step 2 — Run the Medicare processing pipeline 

Activate your Toil virtual environment:

source $TOIL_VENV/bin/activate

Run the Medicare CWL workflow:

toil-cwl-runner \
  --jobStore j1 \
  --retryCount 1 \
  --cleanWorkDir never \
  --outdir outputs \
  --workDir . \
  https://raw.githubusercontent.com/ForomePlatform/dorieh/refs/heads/main/src/workflows/medicare.cwl \
  --input data/ \
  --database https://raw.githubusercontent.com/ForomePlatform/dorieh/refs/heads/main/examples/with-postgres/database.ini \
  --connection_name dorieh

After the workflow completes:

Processed data is stored in your PostgreSQL backend under the medicare schema (beneficiaries, enrollments, admissions, and QC tables).
Any additional output files are written to the outputs/ directory.

For a detailed description of the pipeline steps, see Medicare: Building a Data Warehouse from ResDac Files.

Step 3 (Optional) — Explore results in Apache Superset 

A pre-built Medicare Quality Control Superset dashboard is included with the example. The steps below show how to launch PostgreSQL together with Superset using Docker Compose and import the dashboard.

3.1 — Stop the simple PostgreSQL container (if running)

If you started the basic docker-compose.yml stack earlier, bring it down first:

cd $WORKDIR/dorieh/docker/pg-hll/
docker compose down

3.2 — Start PostgreSQL + Superset 

From the same directory, start the extended stack:

cd $WORKDIR/dorieh/docker/pg-hll/
docker compose -f docker-compose-superset.yml up --build
# Or, to run detached:
# docker compose -f docker-compose-superset.yml up -d

This brings up:

A PostgreSQL instance (with the HLL extension)
The Superset web UI
Superset background worker services

3.3 — Log in to Superset 

Once the services are running, open http://localhost:8088/ in your browser.

Default credentials (defined in docker-compose-superset.yml):

Field	Value
Username	`admin`
Password	`admin`

If initialization did not complete and you cannot log in, create the admin user manually:

cd $WORKDIR/dorieh/docker/pg-hll/
docker compose exec superset superset fab create-admin \
  --username admin \
  --firstname Admin \
  --lastname User \
  --email admin@example.com \
  --password admin

3.4 — Add the Dorieh database connection 

In the Superset UI:

Open Settings → Database Connections (under Data).
Click + DATABASE and choose PostgreSQL.
At the bottom of the dialog click “Connect this database with a SQL Alchemy URL string instead”.
Fill in:
- Display Name: DORIEH
- SQLAlchemy URI: postgresql+psycopg2://dorieh:dorieh_secret@postgres:5432/dorieh
Click TEST CONNECTION — if it succeeds, click CONNECT.

3.5 — Import the Medicare QC dashboard 

Back in your terminal, run the helper script:

cd $WORKDIR/dorieh/docker/pg-hll/
./import_superset_dashboard.sh

The script copies the dashboard definition into the Superset container and imports it. Internally it runs:

docker compose cp \
  ../../examples/with-postgres/medicare/superset/MedicareQCDashboard.json \
  superset:/tmp/MedicareQCDashboard.json
docker compose exec superset superset import_dashboards \
  --path /tmp/MedicareQCDashboard.json --username admin
docker compose -f docker-compose-superset.yml restart superset superset-worker

3.6 — Explore the dashboard 

In the Superset UI:

Click the Dashboards tab.
Open “Medicare QC (Clean)”.
Use the available charts and filters to examine data quality across enrollments and admissions for the synthetic dataset.

The dashboard surfaces the QC metrics described in Creating QC Tables, including:

Percent of beneficiaries with consistent date-of-birth / date-of-death records
Percent of valid admission records (passed primary-key, foreign-key, and duplicate checks)
Approximate distinct-beneficiary counts computed with HLL sketches

What the pipeline produces 

After a successful run the following tables and views are available in the medicare schema of your PostgreSQL database:

Object	Type	Description
`medicare.ps`	View	Federated patient summary across all raw years
`medicare.beneficiaries`	Table	One row per unique beneficiary (deduplicated)
`medicare.enrollments`	Table	Yearly enrollment records per beneficiary/state
`medicare.admissions`	Table	Validated inpatient admission records
`medicare_audit.admissions`	Table	Records that failed validation (with reason)
`medicare.qc_enrollments`	Materialized View	Aggregate QC dimensions for enrollments
`medicare.qc_admissions`	Materialized View	Aggregate QC dimensions for admissions