# Pipeline to ingest Monthly Pollution data downloaded from WashU Box [Source code](wustlcwl_src.md) ![](wustl.png) ```{contents} --- local: --- ``` **Workflow** ## Description Workflow to aggregate pollution data coming in NetCDF format over given geographies (zip codes or counties) and ingest the aggregated data into the database ## Inputs | Name | Type | Default | Description | |:----------------------|:-----------|:---------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------| | proxy | string? | | HTTP/HTTPS Proxy if required | | shapes | Directory? | | Do we even need this parameter, as we isntead downloading shapes? | | shape_file_collection | string | `tiger` | [Collection of shapefiles](https://www2.census.gov/geo/tiger), either GENZ or TIGER | | downloads | Directory | | Directory, containing files, downloaded and unpacked from WUSTL box | | geography | string | | Type of geography: zip codes or counties Valid values: "zip" or "county" | | years | int[] | `[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]` | | | months | int[] | `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]` | | | band | string | `pm25` | | | strategy | string | `downscale` | Rasterization strategy | | ram | string | `2GB` | Runtime memory, available to the process | | database | File | | Path to database connection file, usually database.ini | | connection_name | string | | The name of the section in the database.ini file | ## Outputs | Name | Type | Description | |:--------------|:------|:------------| | data | array | | | aggregate_log | array | | | aggregate_err | array | | | ingest_log | array | | | ingest_err | array | | | reset_log | File | | | reset_err | File | | | index_log | File | | | index_err | File | | | vacuum_log | File | | | vacuum_err | File | | ## Steps | Name | Runs | Description | |:----------------|:----------------------------------------|:----------------------------------------------------------------------| | initdb | [initdb.cwl](initdb.md) | Ensure that database utilities are at their latest version | | make_table_name | Evaluates JavaScript expression | Given variable and geography type (zip/county) evaluates table name | | init_tables | [reset.cwl](reset.md) | creates or recreates database tables, one for each band and geography | | process | [wustl_one_year.cwl](wustl_one_year.md) | Downloads raw data and aggregates it over shapes and time | | index | [index.cwl](index.md) | | | vacuum | [vacuum.cwl](vacuum.md) | |