Core Platform Python Modules

Package dorieh.platform

This package contains the following modules used in subpackages:

  • Dorieh Core Package-wide utilities: API to initialize logging and database connections

  • db is a PostgreSQL connection wrapper. It reads connection parameters from an ini file and connects to the database. It can transparently connect over an SSH tunnel when required. See also Managing database connections.

  • fips: US state FIPS codes, represented as a Python dictionary

  • pg_keywords: PostgreSQL keywords, such as type names
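To illustrate the idea behind reading connection parameters from an ini file, here is a minimal sketch built only on the standard library. It is not the db module's actual API: the section name, the key names, the helper functions, and the convention that an ssh_user key signals tunnelling are all assumptions made for this example.

```python
import configparser

# Hypothetical ini content; section and key names are illustrative only.
SAMPLE_INI = """
[mydb]
host = localhost
port = 5432
database = example
user = analyst
ssh_user = tunnel_user
"""


def read_connection_params(ini_text: str, section: str) -> dict:
    """Return the key/value pairs of one ini section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        raise KeyError(f"No section [{section}] in the ini data")
    params = dict(parser.items(section))
    # ini values are always strings; convert the port when present.
    if "port" in params:
        params["port"] = int(params["port"])
    return params


def needs_ssh_tunnel(params: dict) -> bool:
    """Assumed convention for this sketch: an 'ssh_user' key means 'tunnel'."""
    return "ssh_user" in params


params = read_connection_params(SAMPLE_INI, "mydb")
```

A real wrapper would then pass these parameters to a PostgreSQL driver and, if needed, open the tunnel first. Both steps are left out here because they depend on the driver and tunnelling library in use.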

Package dorieh.platform.data_model

APIs for data modelling, loading, and manipulation. This subpackage focuses on generating the code required to do the actual processing. It uses the same Medallion architecture paradigm as Databricks.

The main concept is a knowledge domain, or just a domain. A domain model is defined in a YAML file as described in the documentation. It defines a graph of data transformations using intermediate views, materialized views, and tables (in this sense it is more flexible than the Databricks DLT model, which only uses materialized views).
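The notion of a domain as a graph of transformations can be sketched as follows. The dictionary stands in for a parsed YAML file. Its layout (the type and depends_on keys) and the object names are hypothetical and do not reproduce the actual domain schema. The sketch only shows how a creation order can be derived from such a graph with the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical domain: each object lists the objects it is derived from.
domain = {
    "raw_claims": {"type": "table", "depends_on": []},
    "clean_claims": {"type": "view", "depends_on": ["raw_claims"]},
    "enrollment": {"type": "materialized view", "depends_on": ["clean_claims"]},
    "summary": {"type": "table", "depends_on": ["enrollment", "clean_claims"]},
}


def build_order(domain: dict) -> list:
    """Return object names so that every object follows its dependencies."""
    # TopologicalSorter expects a mapping: node -> set of predecessors.
    graph = {name: set(spec["depends_on"]) for name, spec in domain.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(domain)
for name in order:
    print(f"{domain[name]['type']}: {name}")
```

A cycle in the dependencies would raise graphlib.CycleError, which is a convenient early check for a malformed definition.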

The most important module is domain.py, which processes the YAML definition of the domain. Another module, inserter, handles parallel insertion of data into domain tables.
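As a rough illustration of parallel insertion, the sketch below splits an iterable of rows into batches and hands them to a thread pool. It is not the inserter module's implementation: the function names, the batch size, and the in-memory stand-in for a database write are all assumptions chosen to keep the example self-contained.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parallel_insert(rows, insert_batch, batch_size=1000, workers=4):
    """Call insert_batch on each batch concurrently; return total row count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(insert_batch, batches(rows, batch_size))
        return sum(counts)


# Stand-in for a database write: append to a list guarded by a lock.
stored = []
lock = threading.Lock()


def fake_insert(batch):
    with lock:
        stored.extend(batch)
    return len(batch)


total = parallel_insert(range(10), fake_insert, batch_size=3, workers=2)
```

In a real loader, each worker would typically use its own database connection, since connections are generally not safe to share between threads.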

Auxiliary modules perform various maintenance tasks. Module index_builder builds indices for a given table or for all tables within a domain. Module utils provides convenience function wrappers and defines the class DataReader, which abstracts reading CSV and FST files. In other words, DataReader provides a uniform interface for reading columnar files in two (potentially more) different formats.
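The idea of a uniform interface over several file formats can be sketched with a small registry keyed by file extension. This is not the actual DataReader class: UniformReader, register, and ROW_READERS are hypothetical names. Only a CSV backend is shown, and an FST backend would be registered the same way using whichever FST library is available.

```python
import csv
import io
import os

# Registry mapping a file extension to a function that yields rows.
ROW_READERS = {}


def register(extension):
    """Decorator that registers a row-yielding function for an extension."""
    def decorator(func):
        ROW_READERS[extension] = func
        return func
    return decorator


@register(".csv")
def read_csv_rows(stream):
    # The first yielded row is the header, the rest are data rows.
    yield from csv.reader(stream)


class UniformReader:
    """Expose `columns` and row iteration regardless of the file format."""

    def __init__(self, name, stream):
        ext = os.path.splitext(name)[1].lower()
        if ext not in ROW_READERS:
            raise ValueError(f"Unsupported format: {ext!r}")
        self._rows = ROW_READERS[ext](stream)
        self.columns = next(self._rows)

    def __iter__(self):
        return self._rows


reader = UniformReader("demo.csv", io.StringIO("state,year\nMA,2010\nNY,2011\n"))
rows = list(reader)
```

Callers only see columns and an iterator of rows, so adding a format means registering one more function rather than changing the calling code.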

Package dorieh.platform.loader

Command line utilities to manipulate data. Implements parallel loading.

Package dorieh.platform.requests

Package dorieh.platform.requests contains a PoC-quality API intended for fulfilling user requests. Its development is currently on hold.

  • HDF5 Export: API and utility to export the result of querying the database in HDF5 format

  • Query class: API and utility to generate an SQL query based on a YAML query specification
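Generating SQL from a declarative specification might look roughly like the sketch below. The specification is shown as a Python dictionary, as it would appear after parsing YAML. Its keys (select, from, where, limit), the table and column names, and the build_query function are hypothetical and do not reflect the actual query specification format.

```python
def build_query(spec: dict) -> str:
    """Assemble a SELECT statement from a simple specification dict."""
    columns = ", ".join(spec["select"])
    sql = f"SELECT {columns} FROM {spec['from']}"
    filters = spec.get("where", [])
    if filters:
        # All filters are combined with AND in this sketch.
        sql += " WHERE " + " AND ".join(filters)
    if "limit" in spec:
        sql += f" LIMIT {int(spec['limit'])}"
    return sql


# Hypothetical specification, equivalent to a small YAML document.
spec = {
    "select": ["state", "year"],
    "from": "enrollment",
    "where": ["year >= 2010", "state = 'MA'"],
    "limit": 100,
}

sql = build_query(spec)
print(sql)
# SELECT state, year FROM enrollment WHERE year >= 2010 AND state = 'MA' LIMIT 100
```

The sketch interpolates identifiers and conditions directly into the string, so it is only suitable for trusted specifications. Production code would validate identifiers and pass values as query parameters.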

Package dorieh.platform.utils

Miscellaneous tools and APIs