The io_utils Module

sizeof_fmt(num, suffix='B') → str

class DownloadTask(destination: str, urls: Optional[List] = None, metadata=None)

add_url(url: str)

reset()

is_up_to_date(is_transformed: bool = True)

as_stream(url: str, extension: str = '.csv', params=None, mode=None)

Returns the content of the URL as a stream. If the content is in zip format (gzip excluded), a temporary file is created.

Parameters:

mode – Optional. The desired mode, text or binary. Possible values: ‘t’ or ‘b’.
params – Optional. A dictionary, list of tuples, or bytes to send as a query string.
url – URL
extension – Optional. When the content is zip-encoded, the extension of the zip entry to return.

Returns:

Content of the URL or a zip entry

as_content(url: str, params=None, mode=None)

Returns the URL content as a block of bytes or text.

Parameters:

url – URL
params – Optional. A dictionary, list of tuples, or bytes to send as a query string.
mode – Optional. The desired return format, text or binary. Possible values: ‘t’ or ‘b’; the default is binary.

Returns:

Content of the URL

as_csv_reader(url: str, mode=None) → DictReader

A utility method that returns the CSV content of the URL as a csv.DictReader

Parameters: url – URL
Returns: an instance of csv.DictReader

file_as_stream(filename: str, extension: str = '.csv', mode=None)

Returns the content of the file as a stream. If the content is in zip format (gzip excluded), a temporary file is created.

Parameters:

mode – Optional. The desired mode, text or binary. Possible values: ‘t’ or ‘b’.
filename – path to the file
extension – Optional. When the content is zip-encoded, the extension of the zip entry to return.

Returns:

Content of the file or a zip entry
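
The behaviour described for file_as_stream can be approximated with the standard library alone. The following is a minimal sketch and not the module's implementation: the helper name open_file_or_zip_entry, the choice of the first matching entry, the default text mode and the UTF-8 encoding are all assumptions made for illustration.

```python
import io
import shutil
import tempfile
import zipfile


def open_file_or_zip_entry(filename: str, extension: str = ".csv", mode: str = "t"):
    """Hypothetical helper, not the module's own code."""
    if not zipfile.is_zipfile(filename):
        # Plain file: open it directly in the requested mode
        if mode == "b":
            return open(filename, "rb")
        return open(filename, "rt", encoding="utf-8")
    with zipfile.ZipFile(filename) as archive:
        # Assumption: take the first entry whose name ends with `extension`
        matches = [n for n in archive.namelist() if n.endswith(extension)]
        if not matches:
            raise ValueError(f"no {extension} entry in {filename}")
        # Copy the entry to a temporary file so the archive can be closed
        tmp = tempfile.TemporaryFile()
        with archive.open(matches[0]) as entry:
            shutil.copyfileobj(entry, tmp)
    tmp.seek(0)
    # Binary callers get the raw temporary file, text callers a decoding wrapper
    return tmp if mode == "b" else io.TextIOWrapper(tmp, encoding="utf-8")
```

Copying the entry to a temporary file keeps the returned stream usable after the archive itself has been closed, which is one plausible reason for the temporary file mentioned above.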

file_as_csv_reader(filename: str)

A utility method that returns the CSV content of the file as a csv.DictReader

Parameters: filename – path to the file
Returns: an instance of csv.DictReader

fopen(path: str, mode: str)

A wrapper to open various types of files

Parameters:

path – Path to file
mode – Opening mode

Returns:

file-like object
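
The file types handled by fopen are not listed here. As an illustration of the general pattern only, the sketch below dispatches on the file extension and assumes gzip is one of the supported types; the name open_any is hypothetical.

```python
import gzip


def open_any(path: str, mode: str):
    """Hypothetical wrapper: choose an opener from the file extension."""
    if path.lower().endswith(".gz"):
        # gzip.open defaults to binary, so request text mode explicitly
        # unless the caller already asked for "b" or "t"
        if "b" not in mode and "t" not in mode:
            mode += "t"
        return gzip.open(path, mode)
    # Anything else is treated as a regular file
    return open(path, mode)
```

With this sketch, open_any("data.csv.gz", "r") yields decoded text lines just as open_any("data.csv", "r") does, so callers do not need to know whether the file is compressed.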

check_http_response(r: Response)

An internal method that raises an exception if the HTTP response is not OK

Parameters: r – Response
Returns: nothing; raises an exception if the response is not OK

download(url: str, to: IO)

A utility method to download large binary data to a file-like object
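
Downloading large binary data to a file-like object usually means copying in fixed-size chunks so that the whole payload never sits in memory. The sketch below shows that pattern with the standard library's urllib in place of whatever HTTP client the module uses; the name download_in_chunks and the chunk_size parameter are assumptions.

```python
import shutil
from typing import IO
from urllib.request import urlopen


def download_in_chunks(url: str, to: IO, chunk_size: int = 1024 * 1024) -> None:
    """Hypothetical helper: stream the body of `url` into `to`."""
    with urlopen(url) as response:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(response, to, chunk_size)
```

A typical call would pass a file opened in binary write mode, for example `with open("data.bin", "wb") as f: download_in_chunks(url, f)`.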

is_downloaded(url: str, target: str, check_size: int = 0) → bool

Checks if the same data has already been downloaded

Parameters:

check_size – Use the default value (0) if the target size should equal the source size. If several URLs are combined into one download, specify a positive integer to check that the destination file size is greater than that value. A negative value disables the size check.
url – URL with data
target – Destination of the downloads

Returns:

True if the destination file exists and is newer than the URL content
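
The three cases of check_size can be written as a small pure function. This sketch covers only the size rule as documented above, not the existence or timestamp checks; the function name and its source_size and target_size arguments are hypothetical.

```python
def size_check_passes(source_size: int, target_size: int, check_size: int = 0) -> bool:
    """Hypothetical helper mirroring the documented check_size rule."""
    if check_size < 0:
        # Negative value: the size check is disabled
        return True
    if check_size == 0:
        # Default: the target must be exactly as large as the source
        return target_size == source_size
    # Positive value: the target must be larger than the given threshold
    return target_size > check_size
```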

write_csv(reader: DictReader, writer: DictWriter, transformer=None, filter=None, write_header: bool = True)

Rewrites the CSV content, optionally transforming and filtering rows

Parameters:

transformer – An optional callable that transforms a row in place
reader – Input data as an instance of csv.DictReader
writer – Output destination as an instance of csv.DictWriter
filter – Optionally, a callable returning True for rows that should be written to the output and False for those that should be omitted
write_header – Whether to write the header row first

Returns:

Nothing
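
The roles of transformer and filter are easiest to see in a short loop. The sketch below is one way such a function could be written and is not the module's code; in particular, applying the transformer before the filter is an assumption, as is the name rewrite_csv.

```python
import csv
import io


def rewrite_csv(reader, writer, transformer=None, filter=None, write_header=True):
    """Hypothetical re-implementation of the documented behaviour."""
    if write_header:
        writer.writeheader()
    for row in reader:
        if transformer is not None:
            transformer(row)  # mutates the row dictionary in place
        if filter is not None and not filter(row):
            continue  # row rejected by the filter
        writer.writerow(row)


def double_score(row):
    # Example transformer: rows are dictionaries of strings
    row["score"] = str(int(row["score"]) * 2)


source = io.StringIO("name,score\nann,1\nbob,5\n")
target = io.StringIO()
reader = csv.DictReader(source)
writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
rewrite_csv(reader, writer, transformer=double_score,
            filter=lambda row: int(row["score"]) > 5)
print(target.getvalue())
```

In this example only the row for bob survives, because after doubling its score is 10 while ann's is 2.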

count_lines(f)

class Collector

abstract writerow(data: List)

flush()

class CSVWriter(out_stream)

writerow(row: List)

flush()

class ListCollector

writerow(data: List)

get_result()

as_dict(json_or_yaml_file: str) → dict

basename(path)

Returns a name without extension of a file or an archive entry

Parameters: path – a path to a file or archive entry
Returns: base name without full path or extension
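
A minimal sketch of this kind of helper, assuming that only the last extension is removed (how the module treats compound extensions such as .tar.gz is not stated here); the name name_without_extension is hypothetical.

```python
import os


def name_without_extension(path: str) -> str:
    """Hypothetical helper: drop the directory part and the last extension."""
    return os.path.splitext(os.path.basename(path))[0]
```

For instance, name_without_extension("data/2020/pm25.csv") gives "pm25".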

is_readme(name: str) → bool

Checks if a file is a documentation file. This method is used to extract some metadata from documentation provided as Markdown files.

Parameters: name – file name
Returns: True if the file is a documentation file

get_entries(path: str) → Tuple[List, Callable]

Returns a list of entries in an archive or files in a directory

Parameters: path – path to a directory or an archive
Returns: Tuple with the list of entry names and a method to open these entries for reading
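
Returning the names together with an opener lets callers iterate over a directory and an archive with the same loop. The sketch below illustrates the idea for directories and zip archives only; the name list_entries, the binary read mode and the restriction to zip are assumptions, not a description of the module's code.

```python
import os
import zipfile
from typing import Callable, List, Tuple


def list_entries(path: str) -> Tuple[List[str], Callable]:
    """Hypothetical helper: entry names plus a callable that opens one of them."""
    if os.path.isdir(path):
        # Directory: list regular files and open them with the built-in open
        names = sorted(
            os.path.join(path, n)
            for n in os.listdir(path)
            if os.path.isfile(os.path.join(path, n))
        )
        return names, lambda name: open(name, "rb")
    if zipfile.is_zipfile(path):
        # Archive: the ZipFile stays open so that its entries can be read later
        archive = zipfile.ZipFile(path)
        return archive.namelist(), archive.open
    raise ValueError(f"{path} is neither a directory nor a zip archive")
```

A caller can then write `names, opener = list_entries(path)` followed by `with opener(names[0]) as f: ...` without caring which kind of collection was passed in.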

get_readme(path: str)

Looks for a README file in the specified path

Parameters: path – a path to a folder or an archive
Returns: a file that is possibly a README file

is_dir(path: str) → bool

Determines whether a path specification refers to a collection of files or to a single entry. Examples of collections are folders (directories) and archives.

Parameters: path – path specification
Returns: True if the specification refers to a collection of files
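
One way to express the "collection or single entry" test with the standard library is sketched below. Which archive formats the module recognises is not stated here, so the zip and tar checks, like the name is_collection, are assumptions.

```python
import os
import tarfile
import zipfile


def is_collection(path: str) -> bool:
    """Hypothetical helper: True for directories and for zip or tar archives."""
    if os.path.isdir(path):
        return True
    if os.path.isfile(path):
        # Inspect the file content rather than trusting its extension
        return zipfile.is_zipfile(path) or tarfile.is_tarfile(path)
    return False
```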

is_yaml_or_json(path: str) → bool

fst2csv(path: str, buffer_size=10000)

dataframe2csv(df: DataFrame, dest: str, append: bool)

class SpecialValues

NA = 'NA'

NaN = 'NaN'

classmethod is_missing(v) → bool

classmethod is_untyped(v) → bool

class CSVFileWrapper(file_like_object, sep=',', null_replacement='NA')

A wrapper around a CSV reader that:

Counts characters and lines read
Logs the progress of the file being read
Replaces null and special values on the fly
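
The counting and replacement duties listed above can be illustrated with a small line iterator placed in front of csv.reader. This is a sketch under stated assumptions, not the class itself: it treats an empty field as "null", splits naively on the separator without honouring quoted fields, and leaves out progress logging. The class name CountingLineReader is hypothetical.

```python
import io


class CountingLineReader:
    """Hypothetical wrapper: counts lines and characters, fills empty fields."""

    def __init__(self, file_like_object, sep=",", null_replacement="NA"):
        self.stream = file_like_object
        self.sep = sep
        self.null_replacement = null_replacement
        self.lines = 0
        self.chars = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = self.stream.readline()
        if not line:
            raise StopIteration
        # Count what was actually read, before any replacement
        self.lines += 1
        self.chars += len(line)
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        # Assumption: an empty field stands for a null value
        fields = [f if f != "" else self.null_replacement
                  for f in body.split(self.sep)]
        return self.sep.join(fields) + ending
```

Because csv.reader accepts any iterator of strings, an instance can be passed to it directly, for example `csv.reader(CountingLineReader(open(path, newline="")))`.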