The loader_config Module

Domain Loader Configurator

Configures the loading of a single file or a set of column-formatted files into the NSAPH PostgreSQL database. Input (source) files can be in either FST or CSV format.

The configurator assumes that the database schema is defined in a YAML or JSON file. A separate tool is available to introspect source files and infer a possible database schema.

class Parallelization(value)[source]

An enumeration.

lines = 'lines'
files = 'files'
none = 'none'
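
The members listed above behave like any standard Python enumeration. The following is a minimal, standalone sketch that re-declares the class with the documented values instead of importing it from the package (the real class may carry additional behavior not shown here). It illustrates looking up a member by its string value:

```python
from enum import Enum


class Parallelization(Enum):
    # Standalone re-declaration for illustration only; values copied
    # from the member list documented above.
    lines = "lines"
    files = "files"
    none = "none"


# Look up a member by its string value, e.g. one read from a command line.
mode = Parallelization("files")
all_values = [member.value for member in Parallelization]
```

Passing a string that is not one of the three values raises `ValueError`, which is the standard behavior of `Enum` lookups.
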
class DataLoaderAction(value)[source]

An enumeration.

drop = 'drop'
load = 'load'
insert = 'insert'
print = 'print'
classmethod new(value: str)[source]
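
The reference above does not describe what `new` does. One plausible reading is that it converts a string into the matching member. The sketch below assumes exactly that and nothing more; it is a standalone re-declaration for illustration, and the whitespace and case normalization is an assumption, not documented behavior:

```python
from enum import Enum


class DataLoaderAction(Enum):
    # Values copied from the member list documented above.
    drop = "drop"
    load = "load"
    insert = "insert"
    print = "print"

    @classmethod
    def new(cls, value: str) -> "DataLoaderAction":
        # ASSUMPTION: maps a string to the member with that value.
        # The normalization below is hypothetical.
        return cls(value.strip().lower())


action = DataLoaderAction.new("Load")
```
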
class LoaderConfig(doc)[source]

Configurator class for data loader

Creates a new object

Parameters:
  • subclass – A concrete class containing configuration information. Configuration options must be defined as class members whose names start with a single ‘_’ character and whose values are instances of the Argument class.

  • description – Optional text to use as description. If not specified, then it is extracted from subclass documentation

action: Optional[DataLoaderAction]

If this option is given, then the whole domain schema will be dropped

data

Path to a data file or directory. Can be a single CSV, gzipped CSV or FST file or a directory recursively containing CSV files. Can also be a tar, tar.gz (or tgz) or zip archive containing CSV files

reset

Force re-creation of the table(s) if they already exist

page

Explicit page size for the database

log

Explicit interval for logging

limit

Load at most specified number of records

buffer

Buffer size for converting FST files

threads

Number of threads writing into the database

parallelization

Type of parallelization, if any

pattern

Pattern for files in a directory or an archive, e.g., “**/maxdata_*_ps_*.csv”
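
To see which files a pattern of this shape would select, it can be tried against a directory with the standard library. The sketch below uses `pathlib.Path.glob`, where `**` matches any number of nested directories. How the loader itself interprets the pattern is not stated here, so treating it as a pathlib-style glob is an assumption; `matching_files` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_files(root, pattern):
    """Return sorted relative paths of files under root that match a glob pattern."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


# Build a small throwaway directory tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "maxdata_ak_ps_2010.csv",
        "2011/maxdata_al_ps_2011.csv",
        "2011/maxdata_al_ip_2011.csv",
        "readme.txt",
    ):
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    selected = matching_files(tmp, "**/maxdata_*_ps_*.csv")
```

Here only the two files whose names contain `_ps_` are selected, one at the top level and one in the `2011` subdirectory.
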

incremental

Commit every file and skip over files that have already been ingested

sloppy

Do not update existing tables and views

validate(attr, value)[source]

Subclasses can override this method to implement custom handling of command line arguments

Parameters:
  • attr – Command line argument name

  • value – Value returned by argparse

Returns:

value to use
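
A minimal sketch of such an override follows. To stay runnable without the package installed, it uses a hypothetical stand-in base class, `BaseConfig`, that models only the `validate` hook; in real code the subclass would derive from `LoaderConfig`. The specific checks on `threads` and `limit` are illustrative assumptions, not rules enforced by the library.

```python
class BaseConfig:
    """Hypothetical stand-in for LoaderConfig; only the validate hook is modeled."""

    def validate(self, attr, value):
        # Default behavior assumed here: accept the parsed value unchanged.
        return value


class CheckedConfig(BaseConfig):
    """Example subclass that rejects nonsensical numeric options."""

    def validate(self, attr, value):
        value = super().validate(attr, value)
        if attr == "threads" and value is not None and value < 1:
            raise ValueError("threads must be a positive integer")
        if attr == "limit" and value is not None and value < 0:
            raise ValueError("limit must not be negative")
        # Any other option passes through untouched.
        return value


config = CheckedConfig()
threads = config.validate("threads", 4)
data = config.validate("data", "input.csv")
```

Calling `super().validate` first keeps whatever handling the base class performs, and the subclass only adds its own checks on top.
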