The loader_config Module
Domain Loader Configurator
Configures the loading of a single file or a set of column-formatted files into the NSAPH PostgreSQL database. Input (source) files can be in either FST or CSV format.
The configurator assumes that the database schema is defined in a YAML or JSON file. A separate tool is available to introspect source files and infer a possible database schema.
- class Parallelization(value)[source]
An enumeration.
- lines = 'lines'
- files = 'files'
- none = 'none'
- class DataLoaderAction(value)[source]
An enumeration.
- drop = 'drop'
- load = 'load'
- insert = 'insert'
- print = 'print'
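To illustrate how the two enumerations above behave, here is a minimal sketch that mirrors them with the standard library `enum` module. The member names and values are taken from the listing; the class bodies and the `parse_action` helper are assumptions written for illustration, not the module's actual source.

```python
from enum import Enum
from typing import Optional


class Parallelization(Enum):
    # Members mirror the documented values.
    lines = "lines"
    files = "files"
    none = "none"


class DataLoaderAction(Enum):
    # Members mirror the documented values.
    drop = "drop"
    load = "load"
    insert = "insert"
    print = "print"


def parse_action(text: Optional[str]) -> Optional[DataLoaderAction]:
    """Hypothetical helper: map an optional string to an action member."""
    # Lookup by value raises ValueError for strings that are not members.
    return DataLoaderAction(text) if text else None
```

Looking a member up by value, as in `Parallelization("files")`, returns the same object as `Parallelization.files`, which is why string options can be converted to members directly.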
- class LoaderConfig(doc)[source]
Configurator class for data loader
Creates a new object
- Parameters:
subclass – A concrete class containing configuration information. Configuration options must be defined as class members with names starting with one ‘_’ character and values that are instances of the Argument class
description – Optional text to use as the description. If not specified, it is extracted from the subclass documentation
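The convention described for the subclass parameter can be sketched as follows. This is an illustration only: the `Argument` dataclass here is a hypothetical stand-in for the library's own class, and `ExampleConfig`, `collect_arguments` and `describe` are names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Argument:
    """Hypothetical stand-in for the library's Argument class."""
    name: str
    help: str
    default: Any = None


class ExampleConfig:
    """Example loader options."""
    _data = Argument("data", help="Path to a data file or directory")
    _threads = Argument("threads", help="Number of writer threads", default=1)
    version = "not an option"  # ignored: no leading underscore


def collect_arguments(subclass: type) -> Dict[str, Argument]:
    """Collect class members named with exactly one leading underscore."""
    found = {}
    for attr, value in vars(subclass).items():
        single_underscore = attr.startswith("_") and not attr.startswith("__")
        if single_underscore and isinstance(value, Argument):
            found[value.name] = value
    return found


def describe(subclass: type, description: Optional[str] = None) -> str:
    """Use the explicit description, else fall back to the class docstring."""
    return description or (subclass.__doc__ or "").strip()
```

Under these assumptions, `collect_arguments(ExampleConfig)` yields the `data` and `threads` options, and `describe(ExampleConfig)` falls back to the class docstring when no description is passed.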
- action: Optional[DataLoaderAction]
If this option is given, then the whole domain schema will be dropped
- data
Path to a data file or directory. Can be a single CSV, gzipped CSV or FST file or a directory recursively containing CSV files. Can also be a tar, tar.gz (or tgz) or zip archive containing CSV files
- reset
Force re-creation of tables that already exist
- page
Explicit page size for the database
- log
Explicit interval for logging
- limit
Load at most specified number of records
- buffer
Buffer size for converting FST files
- threads
Number of threads writing into the database
- parallelization
Type of parallelization, if any
- pattern
Pattern for files in a directory or an archive, e.g., “**/maxdata_*_ps_*.csv”
- incremental
Commit every file and skip over files that have already been ingested
- sloppy
Do not update existing tables and views
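The options listed above could be exposed on a command line roughly as in the sketch below. The option names come from the attribute list; the flag spellings, types, defaults and the `build_parser` function are assumptions made for illustration and may differ from the module's real interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented option names."""
    parser = argparse.ArgumentParser(description="Sketch of loader options")
    parser.add_argument("--action", choices=["drop", "load", "insert", "print"])
    parser.add_argument("--data", help="Path to a data file, directory or archive")
    parser.add_argument("--reset", action="store_true")
    parser.add_argument("--page", type=int, help="Page size for the database")
    parser.add_argument("--log", type=int, help="Interval for logging")
    parser.add_argument("--limit", type=int, help="Maximum number of records")
    parser.add_argument("--buffer", type=int, help="Buffer size for FST files")
    parser.add_argument("--threads", type=int, default=1)  # assumed default
    parser.add_argument("--parallelization",
                        choices=["lines", "files", "none"],
                        default="none")  # assumed default
    parser.add_argument("--pattern", help="Pattern for files to load")
    parser.add_argument("--incremental", action="store_true")
    parser.add_argument("--sloppy", action="store_true")
    return parser


# Parse an explicit argument list rather than sys.argv.
example = build_parser().parse_args([
    "--data", "/tmp/input",  # placeholder path
    "--pattern", "**/maxdata_*_ps_*.csv",
    "--threads", "4",
    "--parallelization", "files",
    "--incremental",
])
```

In this sketch, flags that are not passed keep their defaults, so `example.reset` is false and `example.limit` is unset.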