The io_utils Module

sizeof_fmt(num, suffix='B') → str
class DownloadTask(destination: str, urls: Optional[List] = None, metadata=None)
add_url(url: str)
reset()
is_up_to_date(is_transformed: bool = True)
as_stream(url: str, extension: str = '.csv', params=None, mode=None)

Returns the content of the URL as a stream. If the content is in zip format (gzip excluded), the method creates a temporary file.

Parameters:
  • url – URL to read

  • extension – Optional. When the content is a zip archive, the extension of the archive entry to return.

  • params – Optional. A dictionary, list of tuples or bytes to send as a query string.

  • mode – Optional. The desired mode, text or binary. Possible values: ‘t’ or ‘b’.

Returns:

Content of the URL or a zip entry
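The zip handling described above can be sketched with the standard library. This is not the module's implementation: the helper name stream_or_zip_entry, the detection of zip content by its leading "PK\x03\x04" signature, the choice of the first entry whose name ends with extension, and UTF-8 decoding in mode 't' are all assumptions. The input is taken to be an already opened binary stream (for example the raw body of an HTTP response), so the temporary file is only created when the content turns out to be an archive.

```python
import io
import shutil
import tempfile
import zipfile
from typing import IO, Optional

ZIP_MAGIC = b"PK\x03\x04"  # signature at the start of a zip archive


def stream_or_zip_entry(raw: IO[bytes], extension: str = ".csv",
                        mode: Optional[str] = None) -> IO:
    """Hypothetical helper: return `raw` as a stream or, if it holds a zip
    archive, the first entry whose name ends with `extension`."""
    stream = io.BufferedReader(raw)  # lets us peek without consuming bytes
    if stream.peek(4)[:4] == ZIP_MAGIC:
        # zipfile needs a seekable file, so spool the archive to a temporary file
        tmp = tempfile.TemporaryFile()
        shutil.copyfileobj(stream, tmp)
        tmp.seek(0)
        archive = zipfile.ZipFile(tmp)
        name = next((n for n in archive.namelist() if n.endswith(extension)), None)
        if name is None:
            raise FileNotFoundError(f"no archive entry ending with {extension!r}")
        stream = archive.open(name)
    if mode == "t":
        # Assumption: text mode decodes the bytes as UTF-8
        return io.TextIOWrapper(stream, encoding="utf-8")
    return stream
```

Peeking at the first bytes keeps plain (non-zip) content streaming directly from the source, which matches the documented behaviour of only creating a temporary file for zip content.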

as_content(url: str, params=None, mode=None)

Returns the content of the URL as a bytes or text block

Parameters:
  • url – URL

  • params – Optional. A dictionary, list of tuples or bytes to send as a query string.

  • mode – Optional. The desired return format, text or binary. Possible values: ‘t’ or ‘b’; the default is binary.

Returns:

Content of the URL

as_csv_reader(url: str, mode=None) → DictReader

A utility method that returns the CSV content of the URL as a csv.DictReader

Parameters:

url – URL

Returns:

an instance of csv.DictReader
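What the returned csv.DictReader yields can be illustrated on content that has already been fetched as bytes. The helper name csv_reader_from_bytes and the UTF-8 default are assumptions made for this sketch; the real function reads from the URL itself.

```python
import csv
import io


def csv_reader_from_bytes(data: bytes, encoding: str = "utf-8") -> csv.DictReader:
    """Hypothetical helper: expose CSV bytes as rows keyed by the header line."""
    # The csv module expects text streams opened with newline=""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return csv.DictReader(text)


reader = csv_reader_from_bytes(b"state,year\nMA,2020\nCA,2021\n")
rows = list(reader)
# reader.fieldnames == ["state", "year"]
# rows[0] == {"state": "MA", "year": "2020"}
```

Each row is a dictionary keyed by the header row, so columns can be accessed by name rather than by position.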

file_as_stream(filename: str, extension: str = '.csv', mode=None)

Returns the content of the file as a stream. If the content is in zip format (gzip excluded), the method creates a temporary file.

Parameters:
  • filename – Path to the file

  • extension – Optional. When the content is a zip archive, the extension of the archive entry to return.

  • mode – Optional. The desired mode, text or binary. Possible values: ‘t’ or ‘b’.

Returns:

Content of the file or a zip entry

file_as_csv_reader(filename: str)

A utility method that returns the CSV content of the file as a csv.DictReader

Parameters:

filename – path to file

Returns:

an instance of csv.DictReader

fopen(path: str, mode: str)

A wrapper to open various types of files

Parameters:
  • path – Path to file

  • mode – Opening mode

Returns:

file-like object
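The documentation does not list which file types fopen supports. As an illustration only, a wrapper of this kind typically dispatches on the file extension; the sketch below assumes that ".gz" files are opened with gzip and all other files with the built-in open, using UTF-8 in text mode. The name open_any is hypothetical.

```python
import gzip
import os
import tempfile
from typing import IO


def open_any(path: str, mode: str = "r") -> IO:
    """Hypothetical wrapper: open gzip-compressed or plain files transparently."""
    if path.lower().endswith(".gz"):
        if "b" in mode:
            return gzip.open(path, mode)
        # gzip.open defaults to binary, so text mode must be requested explicitly
        text_mode = mode if "t" in mode else mode + "t"
        return gzip.open(path, text_mode, encoding="utf-8")
    if "b" in mode:
        return open(path, mode)
    return open(path, mode, encoding="utf-8")


with tempfile.TemporaryDirectory() as workdir:
    gz_path = os.path.join(workdir, "sample.csv.gz")
    with open_any(gz_path, "w") as out:
        out.write("a,b\n1,2\n")
    with open_any(gz_path, "r") as f:
        content = f.read()
# content == "a,b\n1,2\n"
```

Callers can then read compressed and uncompressed files through the same call without checking the extension themselves.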

check_http_response(r: Response)

An internal method that raises an exception if the HTTP response is not OK

Parameters:

r – Response

Returns:

Nothing; raises an exception if the response is not OK
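A check of this kind can be written against the attributes a requests.Response object exposes (ok, status_code, reason, url); with requests installed, r.raise_for_status() performs a similar check. In this sketch the helper name ensure_ok and the RuntimeError it raises are assumptions, since the exception type used by check_http_response is not documented here.

```python
from types import SimpleNamespace


def ensure_ok(r) -> None:
    """Hypothetical check: raise if the HTTP response does not indicate success.

    `r` is expected to behave like requests.Response; only `ok`,
    `status_code`, `reason` and `url` are used.
    """
    if not r.ok:
        raise RuntimeError(f"HTTP {r.status_code} {r.reason}: {r.url}")


# Stand-in response objects, so the sketch runs without a network connection
good = SimpleNamespace(ok=True, status_code=200, reason="OK",
                       url="https://example.org/data.csv")
bad = SimpleNamespace(ok=False, status_code=404, reason="Not Found",
                      url="https://example.org/missing.csv")
ensure_ok(good)  # returns None, nothing is raised
```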

download(url: str, to: IO)

A utility method to download large binary data to a file-like object
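Downloading large binary data usually means copying the body in fixed-size chunks instead of loading it into memory at once. This minimal sketch assumes the caller already holds a readable binary stream for the URL (for example the raw body of an HTTP response); the name copy_in_chunks and the 1 MiB default chunk size are illustrative choices. The standard library's shutil.copyfileobj implements the same loop.

```python
import io
from typing import IO


def copy_in_chunks(source: IO[bytes], to: IO[bytes], chunk_size: int = 1 << 20) -> int:
    """Hypothetical helper: copy `source` into `to`; return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            break
        to.write(chunk)
        total += len(chunk)
    return total


src = io.BytesIO(b"x" * 2500)
dst = io.BytesIO()
written = copy_in_chunks(src, dst, chunk_size=1000)
# written == 2500 (chunks of 1000, 1000 and 500 bytes)
```

Memory use stays bounded by chunk_size regardless of how large the download is.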

is_downloaded(url: str, target: str, check_size: int = 0) → bool

Checks if the same data has already been downloaded

Parameters:
  • url – URL with data

  • target – Destination of the download

  • check_size – With the default value (0), the target size must equal the source size. If several URLs are combined into one file when downloaded, specify a positive integer to check that the destination file is larger than that value. A negative value disables the size check.

Returns:

True if the destination file exists and is newer than the URL content
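The check_size rules above can be restated as a small decision function. This sketch assumes the remote size and modification time are already known (for example from HTTP response headers); the helper is_fresh_copy and its source_size and source_mtime parameters are hypothetical and not part of the module.

```python
import os
import tempfile
import time


def is_fresh_copy(target: str, source_size: int, source_mtime: float,
                  check_size: int = 0) -> bool:
    """Hypothetical restatement of the documented is_downloaded rules."""
    if not os.path.isfile(target):
        return False
    stat = os.stat(target)
    if stat.st_mtime <= source_mtime:  # local copy is not newer than the source
        return False
    if check_size == 0:
        return stat.st_size == source_size  # sizes must match exactly
    if check_size > 0:
        return stat.st_size > check_size  # combined downloads: exceed threshold
    return True  # a negative check_size disables the size check


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"0123456789")  # a 10-byte "download"
    path = f.name
an_hour_ago = time.time() - 3600
results = (
    is_fresh_copy(path, 10, an_hour_ago),                 # sizes equal
    is_fresh_copy(path, 12, an_hour_ago),                 # sizes differ
    is_fresh_copy(path, 12, an_hour_ago, check_size=5),   # 10 > 5
    is_fresh_copy(path, 12, an_hour_ago, check_size=-1),  # size check off
)
os.remove(path)
# results == (True, False, True, True)
```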

write_csv(reader: DictReader, writer: DictWriter, transformer=None, filter=None, write_header: bool = True)

Rewrites CSV content, optionally transforming and filtering rows

Parameters:
  • reader – Input data as an instance of csv.DictReader

  • writer – Output destination as an instance of csv.DictWriter

  • transformer – Optional. A callable that transforms a row in place.

  • filter – Optional. A callable returning True for rows that should be written to the output and False for rows that should be omitted.

  • write_header – Whether to write the header row first

Returns:

Nothing
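The row pipeline described above amounts to a single loop over the reader. This sketch restates it with the standard csv module. Whether the real function applies the filter before or after the transformer is not documented, so the order used here (transform, then filter) is an assumption, as is the helper name copy_csv; the filter argument is called row_filter to avoid shadowing Python's built-in filter.

```python
import csv
import io
from typing import Callable, Optional


def copy_csv(reader: csv.DictReader, writer: csv.DictWriter,
             transformer: Optional[Callable[[dict], None]] = None,
             row_filter: Optional[Callable[[dict], bool]] = None,
             write_header: bool = True) -> None:
    """Hypothetical restatement of write_csv."""
    if write_header:
        writer.writeheader()
    for row in reader:
        if transformer is not None:
            transformer(row)  # modifies the row in place
        if row_filter is not None and not row_filter(row):
            continue  # omitted rows are never written
        writer.writerow(row)


def to_upper(row: dict) -> None:
    row["state"] = row["state"].upper()


source = csv.DictReader(io.StringIO("state,count\nma,3\nca,0\nny,5\n"))
out = io.StringIO()
sink = csv.DictWriter(out, fieldnames=source.fieldnames, lineterminator="\n")
copy_csv(source, sink, transformer=to_upper,
         row_filter=lambda r: int(r["count"]) > 0)
# out.getvalue() == "state,count\nMA,3\nNY,5\n"
```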

count_lines(f)
class Collector
abstract writerow(data: List)
flush()
class CSVWriter(out_stream)
writerow(row: List)
flush()
class ListCollector
writerow(data: List)
get_result()
as_dict(json_or_yaml_file: str) → dict
basename(path)

Returns the name of a file or an archive entry without its path or extension

Parameters:

path – a path to a file or archive entry

Returns:

base name without full path or extension
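For a plain filesystem path, the behaviour described above corresponds to the following sketch (the helper name is hypothetical). How the real function treats multi-part extensions such as .csv.gz, or paths of entries inside archives, is not documented, so this version strips only the last extension.

```python
import os


def name_without_extension(path: str) -> str:
    """Hypothetical helper: drop the directory part and the last extension."""
    return os.path.splitext(os.path.basename(path))[0]


examples = [name_without_extension(p)
            for p in ("/data/2020/counties.csv", "archive.zip", "notes")]
# examples == ["counties", "archive", "notes"]
```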

is_readme(name: str) → bool

Checks whether a file is a documentation file. This method is used to extract metadata from documentation provided as Markdown files.

Parameters:

name – name of the file to check

Returns:

True if the file is a documentation file

get_entries(path: str) → Tuple[List, Callable]

Returns a list of entries in an archive or files in a directory

Parameters:

path – path to a directory or an archive

Returns:

Tuple with the list of entry names and a method to open these entries for reading
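Returning the names together with an opener lets callers treat folders and archives the same way. This minimal sketch assumes that only directories and zip archives need to be supported (the real function may handle other archive types); list_entries is a hypothetical name.

```python
import os
import tempfile
import zipfile
from typing import IO, Callable, List, Tuple


def list_entries(path: str) -> Tuple[List[str], Callable[[str], IO[bytes]]]:
    """Hypothetical helper: list a folder or a zip archive and return an opener."""
    if os.path.isdir(path):
        names = sorted(os.listdir(path))

        def open_entry(name: str) -> IO[bytes]:
            return open(os.path.join(path, name), "rb")

        return names, open_entry
    archive = zipfile.ZipFile(path)
    # Entries ending with "/" are directories inside the archive; skip them
    names = [n for n in archive.namelist() if not n.endswith("/")]
    return names, archive.open


with tempfile.TemporaryDirectory() as workdir:
    zip_path = os.path.join(workdir, "bundle.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("README.md", "# Data")
        zf.writestr("data/values.csv", "a\n1\n")
    names, opener = list_entries(zip_path)
    with opener("data/values.csv") as entry:
        first_line = entry.readline()
    folder_names, _ = list_entries(workdir)
# names == ["README.md", "data/values.csv"], first_line == b"a\n"
```

Because the opener is returned alongside the names, code that scans for a README or a data file does not need to know whether it is looking at a folder or an archive.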

get_readme(path: str)

Looks for a README file in the specified path

Parameters:

path – a path to a folder or an archive

Returns:

a file that is possibly a README file

is_dir(path: str) → bool

Determines whether a path specification refers to a collection of files or to a single entry. Examples of collections are folders (directories) and archives.

Parameters:

path – path specification

Returns:

True if specification refers to a collection of files

is_yaml_or_json(path: str) → bool
fst2csv(path: str, buffer_size=10000)
dataframe2csv(df: DataFrame, dest: str, append: bool)
class SpecialValues
NA = 'NA'
NaN = 'NaN'
classmethod is_missing(v) → bool
classmethod is_untyped(v) → bool
class CSVFileWrapper(file_like_object, sep=',', null_replacement='NA')

A wrapper around a CSV reader that:

  • Counts the characters and lines read

  • Logs the progress of reading the file

  • Replaces null and special values on the fly
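The three responsibilities listed above can be combined in a small file-like wrapper. This sketch assumes that a "null" is an empty field, that progress is logged through the standard logging module every log_every lines, and that the wrapper is consumed line by line by csv.reader. The class name CountingNullFiller and the log_every parameter are hypothetical, and the naive split ignores quoted fields that contain the separator.

```python
import csv
import io
import logging
from typing import IO, Iterator

logger = logging.getLogger(__name__)


class CountingNullFiller:
    """Hypothetical wrapper: counts what is read, logs progress and
    replaces empty CSV fields with a placeholder."""

    def __init__(self, f: IO[str], sep: str = ",", null_replacement: str = "NA",
                 log_every: int = 100_000):
        self.f = f
        self.sep = sep
        self.null_replacement = null_replacement
        self.log_every = log_every
        self.lines = 0
        self.chars = 0

    def _fill(self, line: str) -> str:
        body = line.rstrip("\r\n")
        if not body:
            return line  # blank lines and end of file ("") pass through unchanged
        ending = line[len(body):]
        # Naive split: quoted fields containing `sep` are not handled
        fields = [v if v else self.null_replacement for v in body.split(self.sep)]
        return self.sep.join(fields) + ending

    def readline(self) -> str:
        line = self.f.readline()
        if line:
            self.lines += 1
            self.chars += len(line)  # counts characters as read, before replacement
            if self.lines % self.log_every == 0:
                logger.info("Read %d lines, %d characters", self.lines, self.chars)
        return self._fill(line)

    def __iter__(self) -> Iterator[str]:
        while True:
            line = self.readline()
            if not line:
                return
            yield line


wrapper = CountingNullFiller(io.StringIO("id,value,flag\n1,,y\n2,3,\n"))
rows = list(csv.reader(wrapper))
# rows == [["id", "value", "flag"], ["1", "NA", "y"], ["2", "3", "NA"]]
```

Doing the replacement in the wrapper rather than in the reader keeps csv.reader unchanged and lets the same counters serve progress logging.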