Internal scripts used for download tasks
Python module to download EPA AQS Data hosted at https://www.epa.gov/aqs
The module can be used as a library of functions to be called from other Python scripts.
The data is downloaded from https://aqs.epa.gov/aqsweb/airdata/download_files.html
The tool adds a column containing a uniquely generated Monitor Key
The only method likely to be useful to external users is download_aqs_data()
- transfer(reader: DictReader, writer: DictWriter, flt=None, header: bool = True)[source]
Specific to EPA AQS Data
Rewrites the CSV content, adding a Monitor Key and optionally filtering rows by a provided list of parameter codes
- Parameters:
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
flt – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
header – whether to write the header row first
- Returns:
Nothing
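The reader/writer/filter contract can be illustrated with the standard csv module alone. The import path of transfer itself is not shown in this reference, so the sketch below re-implements the same row-copying loop rather than calling it:

```python
import csv
import io

# Input with an AQS-style "Parameter Code" column
src = io.StringIO(
    '"Parameter Code","Value"\n"88101","12.5"\n"44201","0.031"\n'
)
out = io.StringIO()

reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)

# Keep only PM2.5 rows (parameter code 88101) -- the same role
# the flt callable plays in transfer()
flt = lambda row: row["Parameter Code"] == "88101"

writer.writeheader()  # corresponds to header=True
for row in reader:
    if flt(row):
        writer.writerow(row)

print(out.getvalue())
```

The filter runs per row, so arbitrarily large files can be processed without loading them into memory.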
- add_monitor_key(row: Dict)[source]
Internal method to generate and add a unique Monitor Key
- Parameters:
row – a row of an AQS CSV file
- Returns:
Nothing, modifies the given row in place
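The reference does not specify how the key is built. Since AQS identifies a monitor by state, county, site, parameter code and POC, a hypothetical key builder could look like this (the column names and the join format are assumptions, not the module's actual scheme):

```python
from typing import Dict

def add_monitor_key(row: Dict) -> None:
    """Add a hypothetical 'Monitor Key' built from the columns that
    identify an AQS monitor (state, county, site, parameter, POC).
    The real key format used by the module may differ."""
    row["Monitor Key"] = "-".join([
        row["State Code"],
        row["County Code"],
        row["Site Num"],
        row["Parameter Code"],
        row["POC"],
    ])

row = {
    "State Code": "06", "County Code": "001",
    "Site Num": "0007", "Parameter Code": "88101", "POC": "1",
}
add_monitor_key(row)
print(row["Monitor Key"])  # 06-001-0007-88101-1
```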
- download_data(task: DownloadTask)[source]
A utility method to download the content of a given URL to a given file
- destination_path(destination: str, path: str) str [source]
A utility method to construct destination file path
- collect_annual_downloads(destination: str, path: str, contiguous_year_segment: List, parameters: List) DownloadTask [source]
A utility method to collect all URLs that should be downloaded for a given list of years and EPA AQS parameters
- Parameters:
- Returns:
downloads list
- collect_daily_downloads(destination: str, ylabel: str, contiguous_year_segment: List, parameter) DownloadTask [source]
A utility method to collect all URLs that should be downloaded for a given list of years and EPA AQS parameters
- Parameters:
destination – Destination directory for downloads
ylabel – a label to use for years in the destination path
contiguous_year_segment – a list of contiguous years that can be saved in the same file
parameters – List of EPA AQS Parameter codes
downloads – The resulting collection of downloads that have to be performed
- Returns:
downloads list
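Both collectors operate on segments of contiguous years. The segmentation itself is not shown in this reference, but a typical way to derive such segments from an arbitrary year list is:

```python
from typing import List

def split_contiguous(years: List[int]) -> List[List[int]]:
    """Split a list of years into runs of consecutive years.
    Illustrative only: the library's own segmentation logic is not
    documented here and may differ."""
    segments: List[List[int]] = []
    for y in sorted(years):
        if segments and y == segments[-1][-1] + 1:
            segments[-1].append(y)  # extend the current run
        else:
            segments.append([y])    # start a new run
    return segments

print(split_contiguous([2018, 2019, 2021, 2022, 2023]))
# [[2018, 2019], [2021, 2022, 2023]]
```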
- collect_aqs_download_tasks(context: AQSContext)[source]
Main entry into the library
- Parameters:
aggregation – Type of time aggregation: annual or daily
years – a list of years to include; if None, all years are included
destination – Destination directory
parameters – List of EPA AQS Parameter codes. For annual aggregation the list can be empty, in which case all data is downloaded. Required for daily aggregation. Can contain integer codes, mnemonic instances of the Parameter enum, or both
merge_years –
- Returns:
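Since the parameters list may mix integer codes and enum members, a caller-side normalization step is useful. Parameter below is a hypothetical stand-in for the module's enum (88101 and 44201 are the real AQS codes for PM2.5 and ozone):

```python
from enum import Enum
from typing import List, Union

class Parameter(Enum):
    # Hypothetical stand-in for the module's Parameter enum
    PM25 = 88101
    OZONE = 44201

def normalize(parameters: List[Union[int, Parameter]]) -> List[int]:
    """Reduce a mixed list of codes and enum members to integer codes."""
    return [p.value if isinstance(p, Parameter) else int(p) for p in parameters]

print(normalize([Parameter.PM25, 44201]))  # [88101, 44201]
```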
- as_stream(url: str, extension: str = '.csv', params=None, mode=None)[source]
Returns the content of a URL as a stream. If the content is in zip format (but not gzip), a temporary file is created
- Parameters:
- Returns:
Content of the URL or a zip entry
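The zip handling can be sketched with the standard library. This illustrates the general technique of opening the first matching entry of a zip archive as a text stream, not the module's actual code:

```python
import io
import zipfile

# Build a small zip archive in memory, standing in for downloaded content
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")
buf.seek(0)

# Open the first .csv entry as a text-mode stream
with zipfile.ZipFile(buf) as zf:
    name = next(n for n in zf.namelist() if n.endswith(".csv"))
    with io.TextIOWrapper(zf.open(name), encoding="utf-8") as stream:
        print(stream.read())  # prints the CSV content: a,b / 1,2
```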
- as_csv_reader(url: str, mode=None) DictReader [source]
A utility method to return the CSV content of the URL as CSVReader
- Parameters:
url – URL
- Returns:
an instance of csv.DictReader
- file_as_stream(filename: str, extension: str = '.csv', mode=None)[source]
Returns the content of a file as a stream. If the content is in zip format (but not gzip), a temporary file is created
- file_as_csv_reader(filename: str)[source]
A utility method to return the CSV content of the file as CSVReader
- Parameters:
filename – path to file
- Returns:
an instance of csv.DictReader
- check_http_response(r: Response)[source]
An internal method that raises an exception if the HTTP response is not OK
- Parameters:
r – Response
- Returns:
nothing, raises an exception if response is not OK
- download(url: str, to: IO)[source]
A utility method to download large binary data to a file-like object
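The streaming copy underneath such a download is chunked rather than read all at once; with in-memory streams standing in for a network response and a destination file, the technique can be sketched as:

```python
import io
import shutil

src = io.BytesIO(b"x" * 1_000_000)  # stands in for a network response stream
dst = io.BytesIO()                  # stands in for the target file-like object

# Copy in 64 KiB chunks so large payloads never reside fully in memory
shutil.copyfileobj(src, dst, length=64 * 1024)
print(dst.getbuffer().nbytes)  # 1000000
```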
- is_downloaded(url: str, target: str, check_size: int = 0) bool [source]
Checks if the same data has already been downloaded
- Parameters:
url – URL with data
target – Destination of the downloads
check_size – Use the default value (0) if the target size should equal the source size. If several URLs are combined into one download, specify a positive integer to check that the destination file size is greater than that value. A negative value disables the size check
- Returns:
True if the destination file exists and is newer than URL content
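A local-only sketch of the size check (the real method also compares against the remote URL's size and modification time, which is omitted here; the function name and signature are illustrative):

```python
import os
import tempfile
from typing import Optional

def size_ok(target: str, check_size: int = 0,
            expected: Optional[int] = None) -> bool:
    """Illustrative size check: with check_size == 0 the file must match
    an expected size; a positive check_size means 'larger than'; a
    negative value disables the check. Simplified from the documented
    behaviour of is_downloaded."""
    if not os.path.isfile(target):
        return False
    if check_size < 0:
        return True
    actual = os.path.getsize(target)
    if check_size > 0:
        return actual > check_size
    return expected is not None and actual == expected

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10)
print(size_ok(f.name, expected=10))   # True
print(size_ok(f.name, check_size=5))  # True
```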
- write_csv(reader: DictReader, writer: DictWriter, transformer=None, filter=None, write_header: bool = True)[source]
Rewrites the CSV content optionally transforming and filtering rows
- Parameters:
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
transformer – An optional callable that transforms a row in place
filter – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
write_header – whether to write the header row first
- Returns:
Nothing
- basename(path)[source]
Returns the name of a file or an archive entry, without its extension
- Parameters:
path – a path to a file or archive entry
- Returns:
base name without full path or extension
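For a plain file path this behaviour matches what the standard library provides directly; a minimal sketch (a real archive-entry variant might also need to handle entry separators):

```python
import os

def basename(path: str) -> str:
    # Strip directory components, then the extension
    return os.path.splitext(os.path.basename(path))[0]

print(basename("/data/aqs/annual_conc_by_monitor_2020.zip"))
# annual_conc_by_monitor_2020
```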
- is_readme(name: str) bool [source]
Checks if a file is a documentation file. This method is used to extract some metadata from documentation provided as Markdown files
- Parameters:
name – name of the file to check
- Returns:
True if the file is a documentation (README) file
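The exact matching rule is not documented. A plausible check, treating any file whose base name starts with "readme" (case-insensitive) as documentation, would be:

```python
import os

def is_readme(name: str) -> bool:
    # Hypothetical rule: base name starts with "readme", case-insensitive;
    # the module's actual rule may differ
    base = os.path.basename(name).lower()
    return base.startswith("readme")

print(is_readme("docs/README.md"))   # True
print(is_readme("data/values.csv"))  # False
```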
- get_entries(path: str) Tuple[List, Callable] [source]
Returns a list of entries in an archive or files in a directory
- Parameters:
path – path to a directory or an archive
- Returns:
Tuple with the list of entry names and a method to open these entries for reading
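Under the assumption that archives are zip files, the directory-vs-archive dispatch can be sketched as follows (the real method may support other archive formats):

```python
import os
import zipfile
from typing import Callable, List, Tuple

def get_entries(path: str) -> Tuple[List[str], Callable]:
    """Illustrative sketch: list the entries of a zip archive or the
    files of a directory, and return an opener for those entries."""
    if zipfile.is_zipfile(path):
        zf = zipfile.ZipFile(path)
        # ZipFile.open takes an entry name and returns a binary stream
        return zf.namelist(), zf.open
    entries = sorted(os.listdir(path))
    return entries, lambda name: open(os.path.join(path, name), "rb")
```

Each returned entry name can then be passed to the opener to obtain a binary stream, regardless of whether the source was a directory or an archive.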
- get_readme(path: str)[source]
Looks for a README file in the specified path
- Parameters:
path – a path to a folder or an archive
- Returns:
a file that is possibly a README file