Internal scripts used for download tasks
Python module to download EPA AQS Data hosted at https://www.epa.gov/aqs
The module can be used as a library of functions to be called from other Python scripts.
The data is downloaded from https://aqs.epa.gov/aqsweb/airdata/download_files.html
The tool adds a column containing a uniquely generated Monitor Key
The only method likely to be useful to external users is download_aqs_data()
- transfer(reader: DictReader, writer: DictWriter, flt=None, header: bool = True)[source]
Specific to EPA AQS Data
Rewrites the CSV content, adding a Monitor Key and optionally filtering rows by a provided list of parameter codes
- Parameters:
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
flt – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
header – whether to write the header row first
- Returns:
Nothing
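The reader/writer/filter contract can be illustrated with the standard csv module alone. The import path of transfer itself is not shown in this reference, so the sketch below re-implements the same row-copying loop rather than calling it:

```python
import csv
import io

# Input with an AQS-style "Parameter Code" column
src = io.StringIO(
    '"Parameter Code","Value"\n"88101","12.5"\n"44201","0.031"\n'
)
out = io.StringIO()

reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)

# Keep only PM2.5 rows (parameter code 88101) -- the same role
# the flt callable plays in transfer()
flt = lambda row: row["Parameter Code"] == "88101"

writer.writeheader()  # corresponds to header=True
for row in reader:
    if flt(row):
        writer.writerow(row)

print(out.getvalue())
```

The filter runs per row, so arbitrarily large files can be processed without loading them into memory.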
- add_monitor_key(row: Dict)[source]
Internal method to generate and add a unique Monitor Key
- Parameters:
row – a row of an AQS CSV file
- Returns:
Nothing, modifies the given row in place
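The reference does not specify how the key is built. Since AQS identifies a monitor by state, county, site, parameter code and POC, a hypothetical key builder could look like this (the column names and the join format are assumptions, not the module's actual scheme):

```python
from typing import Dict

def add_monitor_key(row: Dict) -> None:
    """Add a hypothetical 'Monitor Key' built from the columns that
    identify an AQS monitor (state, county, site, parameter, POC).
    The real key format used by the module may differ."""
    row["Monitor Key"] = "-".join([
        row["State Code"],
        row["County Code"],
        row["Site Num"],
        row["Parameter Code"],
        row["POC"],
    ])

row = {
    "State Code": "06", "County Code": "001",
    "Site Num": "0007", "Parameter Code": "88101", "POC": "1",
}
add_monitor_key(row)
print(row["Monitor Key"])  # 06-001-0007-88101-1
```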
- download_data(task: DownloadTask)[source]
A utility method to download the content of a given URL to a given file
- destination_path(destination: str, path: str) str [source]
A utility method to construct destination file path
- collect_annual_downloads(destination: str, path: str, contiguous_year_segment: List, parameters: List) DownloadTask [source]
A utility method to collect all URLs that should be downloaded for a given list of years and EPA AQS parameters
- Parameters:
- Returns:
downloads list
- collect_daily_downloads(destination: str, ylabel: str, contiguous_year_segment: List, parameter) DownloadTask [source]
A utility method to collect all URLs that should be downloaded for a given list of years and EPA AQS parameters
- Parameters:
destination – Destination directory for downloads
ylabel – a label to use for years in the destination path
contiguous_year_segment – a list of contiguous years that can be saved in the same file
parameters – List of EPA AQS Parameter codes
downloads – The resulting collection of downloads that have to be performed
- Returns:
downloads list
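Both collectors operate on segments of contiguous years. The segmentation itself is not shown in this reference, but a typical way to derive such segments from an arbitrary year list is:

```python
from typing import List

def split_contiguous(years: List[int]) -> List[List[int]]:
    """Split a list of years into runs of consecutive years.
    Illustrative only: the library's own segmentation logic is not
    documented here and may differ."""
    segments: List[List[int]] = []
    for y in sorted(years):
        if segments and y == segments[-1][-1] + 1:
            segments[-1].append(y)  # extend the current run
        else:
            segments.append([y])    # start a new run
    return segments

print(split_contiguous([2018, 2019, 2021, 2022, 2023]))
# [[2018, 2019], [2021, 2022, 2023]]
```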
- collect_aqs_download_tasks(context: AQSContext)[source]
Main entry into the library
- Parameters:
aggregation – Type of time aggregation: annual or daily
years – a list of years to include; if None, all years are included
destination – Destination directory
parameters – List of EPA AQS Parameter codes. For annual aggregation the list can be empty, in which case all data is downloaded. Required for daily aggregation. Can contain integer codes, mnemonic instances of the Parameter enum, or both
merge_years –
- Returns:
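Since the parameters list may mix integer codes and enum members, a caller-side normalization step is useful. Parameter below is a hypothetical stand-in for the module's enum (88101 and 44201 are the real AQS codes for PM2.5 and ozone):

```python
from enum import Enum
from typing import List, Union

class Parameter(Enum):
    # Hypothetical stand-in for the module's Parameter enum
    PM25 = 88101
    OZONE = 44201

def normalize(parameters: List[Union[int, Parameter]]) -> List[int]:
    """Reduce a mixed list of codes and enum members to integer codes."""
    return [p.value if isinstance(p, Parameter) else int(p) for p in parameters]

print(normalize([Parameter.PM25, 44201]))  # [88101, 44201]
```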
- as_stream(url: str, extension: str = '.csv', params=None, mode=None)[source]
Returns the content of a URL as a stream. If the content is in zip format (but not gzip), a temporary file is created
- Parameters:
- Returns:
Content of the URL or a zip entry
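The zip handling can be sketched with the standard library. This illustrates the general technique of opening the first matching entry of a zip archive as a text stream, not the module's actual code:

```python
import io
import zipfile

# Build a small zip archive in memory, standing in for downloaded content
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")
buf.seek(0)

# Open the first .csv entry as a text-mode stream
with zipfile.ZipFile(buf) as zf:
    name = next(n for n in zf.namelist() if n.endswith(".csv"))
    with io.TextIOWrapper(zf.open(name), encoding="utf-8") as stream:
        print(stream.read())  # prints the CSV content: a,b / 1,2
```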
- as_csv_reader(url: str, mode=None) DictReader [source]
A utility method to return the CSV content of the URL as CSVReader
- Parameters:
url – URL
- Returns:
an instance of csv.DictReader
- file_as_stream(filename: str, extension: str = '.csv', mode=None)[source]
Returns the content of a file as a stream. If the content is in zip format (but not gzip), a temporary file is created
- file_as_csv_reader(filename: str)[source]
A utility method to return the CSV content of the file as CSVReader
- Parameters:
filename – path to file
- Returns:
an instance of csv.DictReader
- check_http_response(r: Response)[source]
An internal method that raises an exception if the HTTP response is not OK
- Parameters:
r – Response
- Returns:
nothing, raises an exception if response is not OK
- download(url: str, to: IO)[source]
A utility method to download large binary data to a file-like object
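The streaming copy underneath such a download is chunked rather than read all at once; with in-memory streams standing in for a network response and a destination file, the technique can be sketched as:

```python
import io
import shutil

src = io.BytesIO(b"x" * 1_000_000)  # stands in for a network response stream
dst = io.BytesIO()                  # stands in for the target file-like object

# Copy in 64 KiB chunks so large payloads never reside fully in memory
shutil.copyfileobj(src, dst, length=64 * 1024)
print(dst.getbuffer().nbytes)  # 1000000
```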
- is_downloaded(url: str, target: str, check_size: int = 0) bool [source]
Checks if the same data has already been downloaded
- Parameters:
url – URL with data
target – Destination of the downloads
check_size – Use the default value (0) if the target size should equal the source size. If several URLs are combined into one download, specify a positive integer to check that the destination file size is greater than that value. A negative value disables the size check
- Returns:
True if the destination file exists and is newer than URL content
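A local-only sketch of the size check (the real method also compares against the remote URL's size and modification time, which is omitted here; the function name and signature are illustrative):

```python
import os
import tempfile
from typing import Optional

def size_ok(target: str, check_size: int = 0,
            expected: Optional[int] = None) -> bool:
    """Illustrative size check: with check_size == 0 the file must match
    an expected size; a positive check_size means 'larger than'; a
    negative value disables the check. Simplified from the documented
    behaviour of is_downloaded."""
    if not os.path.isfile(target):
        return False
    if check_size < 0:
        return True
    actual = os.path.getsize(target)
    if check_size > 0:
        return actual > check_size
    return expected is not None and actual == expected

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10)
print(size_ok(f.name, expected=10))   # True
print(size_ok(f.name, check_size=5))  # True
```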
- write_csv(reader: DictReader, writer: DictWriter, transformer=None, filter=None, write_header: bool = True)[source]
Rewrites the CSV content optionally transforming and filtering rows
- Parameters:
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
transformer – An optional callable that transforms a row in place
filter – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
write_header – whether to write the header row first
- Returns:
Nothing
- basename(path)[source]
Returns the name of a file or an archive entry, without its extension
- Parameters:
path – a path to a file or archive entry
- Returns:
base name without full path or extension
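For a plain file path this behaviour matches what the standard library provides directly; a minimal sketch (a real archive-entry variant might also need to handle entry separators):

```python
import os

def basename(path: str) -> str:
    # Strip directory components, then the extension
    return os.path.splitext(os.path.basename(path))[0]

print(basename("/data/aqs/annual_conc_by_monitor_2020.zip"))
# annual_conc_by_monitor_2020
```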
- is_readme(name: str) bool [source]
Checks if a file is a documentation file. This method is used to extract some metadata from documentation provided as Markdown files
- Parameters:
name – name of the file to check
- Returns:
True if the file is a documentation (README) file
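The exact matching rule is not documented. A plausible check, treating any file whose base name starts with "readme" (case-insensitive) as documentation, would be:

```python
import os

def is_readme(name: str) -> bool:
    # Hypothetical rule: base name starts with "readme", case-insensitive;
    # the module's actual rule may differ
    base = os.path.basename(name).lower()
    return base.startswith("readme")

print(is_readme("docs/README.md"))   # True
print(is_readme("data/values.csv"))  # False
```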
- get_entries(path: str) Tuple[List, Callable] [source]
Returns a list of entries in an archive or files in a directory
- Parameters:
path – path to a directory or an archive
- Returns:
Tuple with the list of entry names and a method to open these entries for reading
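Under the assumption that archives are zip files, the directory-vs-archive dispatch can be sketched as follows (the real method may support other archive formats):

```python
import os
import zipfile
from typing import Callable, List, Tuple

def get_entries(path: str) -> Tuple[List[str], Callable]:
    """Illustrative sketch: list the entries of a zip archive or the
    files of a directory, and return an opener for those entries."""
    if zipfile.is_zipfile(path):
        zf = zipfile.ZipFile(path)
        # ZipFile.open takes an entry name and returns a binary stream
        return zf.namelist(), zf.open
    entries = sorted(os.listdir(path))
    return entries, lambda name: open(os.path.join(path, name), "rb")
```

Each returned entry name can then be passed to the opener to obtain a binary stream, regardless of whether the source was a directory or an archive.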
- get_readme(path: str)[source]
Looks for a README file in the specified path
- Parameters:
path – a path to a folder or an archive
- Returns:
a file that is possibly a README file