The Data Dictionary Generation tool

Generating data dictionary

In Dorieh, one can define a data model, including tables (or views and materialized views), columns, indices and relations between tables (foreign keys) using Dorieh DSL. The DSL also describes how the original incoming data should be transformed to create the eventual data structure.

The Data dictionary tool generates documentation for the data elements (such as tables and columns) in the data model and data lineage diagrams at the tables levels and at column levels for every column.

The output of the tool is described below.

Domain Dictionary Output

  • Main table-level data lineage diagram showing the order of the data processing and the dependencies between tables.

  • If the diagram is generated using SVG format, then every table is clickable, linked to a file with the table description.

  • Every table description file includes verbal description, SQL or DDL used to create the table and the list of all columns in the table. Each column is linked to another file with detailed description for this column.

  • Each column description file contains a description of the column and a lineage diagram for the column showing what columns in which tables have been used to compute the value of this column. The SVG diagram is clickable and every element is linked to the description file for the column.

  • File, containing alphabetical list of all columns in all tables. For every column a list of tables in which the column is present is displayed. During the transformation process columns are transferred from one table to another, hence a column usually is present in multiple tables.

The tool first generates Markdown files that can be subsequently converted to HTML.

There are two modes for Markdown generation:

  • Standalone mode, when Pandoc is used to generate HTML

  • Sphinx mode designed to produce files that will be included in the Sphinx generated documentation

Usage

python -m dorieh.platform.dictionary.domain_dictionary
    [-h] [--fmt {none,png,gif,ps2,svg,cmapx,jpeg}]
    [--lod {full,none,min}]
    [--mode {standalone,sphinx}]
    [--output OUTPUT]
    yaml [yaml ...]

Positional arguments:

Positional Arguments

Argument

Description

yaml

Paths to YAML files with domain definitions

Options:

Options

Option

Alias

Description

--help

-h

Show this help message and exit

--fmt {none,png,gif,ps2,svg,cmapx,jpeg}

-f {none,png,gif,ps2,svg,cmapx,jpeg}

Format of generated image, if ‘none’, then no image is generated

--lod {full,none,min}

Level of details

--mode {standalone,sphinx}

Documentation generation mode

--output OUTPUT

--of OUTPUT, -o OUTPUT

Path to the main output file with Table-level data lineage diagram

Details

class DomainDict(of: str, options: Dict)[source]
add(path)[source]
list()[source]
is_top(t: str)[source]
print_node(out, t: str, indent: int)[source]
print_top_nodes(out)[source]
print_other_nodes(out)[source]
to_dot(out=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]
html(of: str, svg=None)[source]
static html_body(svg)[source]
write_markdown(content: str, of: str)[source]
markdown(of: str, svg=None)[source]
table_toctree(extra: Optional[List[str]] = None) str[source]
table_list(of: str)[source]
column_list(of: str)[source]
generate_graphs()[source]
class LOD(value)[source]

Level of details to include in the generated pages

full = 'full'
none = 'none'
min = 'min'
parse_args() Dict[source]