Full EPA AirNow Processing Pipeline (including downloading shapefiles)

Source code

Workflow

Description

This workflow downloads AirNow data from the government servers, introspects it to infer the database schema and ingests the data into the database

Example run:

cwl-runner airnow.cwl sample_airnow.yml

See sample_airnow.yml

Or

cwl-runner --parallel /opt/airflow/project/epa/src/cwl/airnow.cwl --database /opt/airflow/project/database.ini --connection_name nsaph2 --proxy $HTTP_PROXY  --api-key XXXXXXXX-YYYY-ZZZZ-XXXX-YYYYYYYYY --from 2022-01-01 --to 2022-08-31 --parameter_code pm25 --table airnow_pm25_2022

Inputs

Name

Type

Default

Description

proxy

string?

HTTP/HTTPS Proxy if required

api-key

string

API key for AirNow

database

File

Path to database connection file, usually database.ini

connection_name

string

The name of the section in the database.ini file

from

string

Start date for downolading, in YYYY-MM-DD format

to

string

End date for downolading, in YYYY-MM-DD format

parameter_code

string

Parameter code. Either a numeric code (e.g. 88101, 44201) or symbolic name (e.g. PM25, NO2). See more: AQS Code List

table

string

Name of the table to be created in the database

year

int

Outputs

Name

Type

Description

shapes_data

File[]

download_log

File

ingest_log

File

index_log

File

vacuum_log

File

download_data

File

model

File

Steps

Name

Runs

Description

get_shapes

get_shapes.cwl

This step downloads Shape files from a given collection (TIGER/Line or GENZ) and a geography (ZCTA or Counties) from the US Census website, for a given year or for the closest one.

download

download_airnow.cwl

introspect

introspect.cwl

ingest

ingest.cwl

Uploads data into the database

index

index.cwl

vacuum

vacuum.cwl