Full EPA AirNow Processing Pipeline (including downloading shapefiles)
Workflow
Description
This workflow downloads AirNow data from the government servers, introspects it to infer the database schema, and ingests the data into the database.
Example run:
cwl-runner airnow.cwl sample_airnow.yml
Or
cwl-runner --parallel /opt/airflow/project/epa/src/cwl/airnow.cwl --database /opt/airflow/project/database.ini --connection_name nsaph2 --proxy $HTTP_PROXY --api-key XXXXXXXX-YYYY-ZZZZ-XXXX-YYYYYYYYY --from 2022-01-01 --to 2022-08-31 --parameter_code pm25 --table airnow_pm25_2022
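The second form of invocation can also be assembled programmatically. The following is a minimal Python sketch, not part of the pipeline itself: `build_command` and `parse_date` are hypothetical helpers that check the YYYY-MM-DD date format and build the argument list using only the flags shown in the example above. All values are placeholders.

```python
from datetime import datetime


def parse_date(text):
    """Parse a strict YYYY-MM-DD date, raising ValueError otherwise."""
    parsed = datetime.strptime(text, "%Y-%m-%d").date()
    # strptime tolerates "2022-1-1"; require the zero-padded form.
    if parsed.isoformat() != text:
        raise ValueError(f"date must be in YYYY-MM-DD format: {text!r}")
    return parsed


def build_command(workflow, inputs, parallel=True):
    """Build a cwl-runner argument list (hypothetical helper)."""
    # Fail early on malformed or reversed dates.
    if parse_date(inputs["from"]) > parse_date(inputs["to"]):
        raise ValueError("'from' must not be later than 'to'")

    cmd = ["cwl-runner"]
    if parallel:
        cmd.append("--parallel")
    cmd.append(workflow)
    for name, value in inputs.items():
        if value is None:  # skip optional inputs that are not set, e.g. proxy
            continue
        cmd += [f"--{name}", str(value)]
    return cmd


example = build_command(
    "airnow.cwl",
    {
        "database": "database.ini",
        "connection_name": "nsaph2",
        "proxy": None,
        "api-key": "XXXXXXXX-YYYY-ZZZZ-XXXX-YYYYYYYYY",
        "from": "2022-01-01",
        "to": "2022-08-31",
        "parameter_code": "pm25",
        "table": "airnow_pm25_2022",
    },
)
print(" ".join(example))
```

The resulting list can be handed to `subprocess.run(example, check=True)` on a machine where `cwl-runner` is installed.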
Inputs
Name | Type | Default | Description |
---|---|---|---|
proxy | string? | | HTTP/HTTPS proxy, if required |
api-key | string | | API key for AirNow |
database | File | | Path to the database connection file, usually database.ini |
connection_name | string | | The name of the section in the database.ini file |
from | string | | Start date for downloading, in YYYY-MM-DD format |
to | string | | End date for downloading, in YYYY-MM-DD format |
parameter_code | string | | Parameter code. Either a numeric code (e.g. 88101, 44201) or symbolic name (e.g. PM25, NO2). See more: AQS Code List |
table | string | | Name of the table to be created in the database |
year | int | | |
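The first form of invocation reads its inputs from a job file such as sample_airnow.yml, whose contents are not shown here. As a hedged sketch, the Python snippet below builds an equivalent job object from the input names listed above and prints it as JSON, which CWL runners accept alongside YAML. The helper `make_job` and all values are illustrative assumptions; the `year` input is left out, as it is in the example command line.

```python
import json


def make_job(database_ini, connection_name, api_key, start, end,
             parameter_code, table, proxy=None):
    """Return a CWL job object for the workflow (hypothetical helper)."""
    job = {
        # CWL File inputs are given as objects rather than plain strings.
        "database": {"class": "File", "path": database_ini},
        "connection_name": connection_name,
        "api-key": api_key,
        "from": start,
        "to": end,
        "parameter_code": parameter_code,
        "table": table,
    }
    if proxy:  # proxy is optional (type string?), so include it only when set
        job["proxy"] = proxy
    return job


job = make_job(
    database_ini="database.ini",
    connection_name="nsaph2",
    api_key="XXXXXXXX-YYYY-ZZZZ-XXXX-YYYYYYYYY",
    start="2022-01-01",
    end="2022-08-31",
    parameter_code="pm25",
    table="airnow_pm25_2022",
)
print(json.dumps(job, indent=2))
```

Saved to a file, the printed document could be passed to `cwl-runner` in place of sample_airnow.yml.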
Outputs
Name | Type | Description |
---|---|---|
shapes_data | File[] | |
download_log | File | |
ingest_log | File | |
index_log | File | |
vacuum_log | File | |
download_data | File | |
model | File | |
Steps
Name | Runs | Description |
---|---|---|
get_shapes | | This step downloads Shape files from a given collection (TIGER/Line or GENZ) and a geography (ZCTA or Counties) from the US Census website, for a given year or for the closest one. |
download | | |
introspect | | |
ingest | | Uploads data into the database |
index | | |
vacuum | | |