The mcr_fts2db Module

Raw data loader for Medicare files provided by ResDAC. The NSAPH Medicare pipeline uses this module for years 2011 and later.

This module defines a command-line utility to ingest raw Medicare data delivered in File Transfer Summary (FTS) and fixed-width data (DAT) format, as provided by ResDAC for years 2011 and later.

Overview:

  • Searches recursively for all FTS (*.fts) files under the specified input path(s)

  • Parses each FTS file using the dorieh.cms.fts2yaml.MedicareFTS parser

  • Determines the appropriate database schema and metadata for the associated *.dat or *.csv.gz file

  • Loads data into the database using dorieh.cms.mcr_data_loader.MedicareDataLoader for .dat files or a generic dorieh.platform.loader.data_loader.DataLoader for CSV files

  • Applies indexing and VACUUM optimization after insertion

Usage Notes:

This loader requires that data be organized into year-based subfolders. For example:

my_data/medicare/2018/*.fts

The name of the parent directory of each FTS file must be a 4-digit year (e.g., 2011, 2018). This requirement applies to the location of both the data files and the FTS files, so that table names can be established correctly.

Key Components:

  • MedicareLoader: orchestrates ingestion logic

  • dorieh.cms.mcr_data_loader.MedicareDataLoader: fixed-width reader-based data loader

  • dorieh.platform.loader.data_loader.DataLoader: generic CSV reader-based loader

See also:

  • members/fts2yaml: metadata extraction from FTS

  • members/mcr_data_loader: Medicare file reading

  • members/medicare_yaml: generated schema definition

class MedicareLoader

High-level loader for raw Medicare data files provided by ResDAC, using FTS and DAT.

The loader walks the input directory to locate all *.fts (File Transfer Summary) files, and for each one:

  • Parses its metadata and adds it to the schema registry (YAML)

  • Identifies corresponding *.dat or *.csv.gz data files

  • Uses MedicareDataLoader to load fixed-width (FWF) files or DataLoader for CSV files

  • Applies schema-specific indexing and vacuum optimization

This loader is compatible with ETL processing of Medicare data for 2011 and later.
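The discovery and dispatch steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual MedicareLoader code. The helper names (find_fts_files, find_data_file, plan_loads) are hypothetical, and the sketch assumes that each data file shares the base name of its FTS file, which may not hold for every delivery.

```python
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def find_fts_files(root: Path) -> Iterator[Path]:
    """Yield every *.fts file under root (recursively), in sorted order."""
    yield from sorted(root.glob("**/*.fts"))


def find_data_file(fts_path: Path) -> Optional[Path]:
    """Return the sibling .dat or .csv.gz file for an FTS file, if one exists.

    Assumption: the data file has the same base name as the FTS file.
    """
    for suffix in (".dat", ".csv.gz"):
        candidate = fts_path.with_name(fts_path.stem + suffix)
        if candidate.is_file():
            return candidate
    return None


def plan_loads(root: Path) -> List[Tuple[str, str, str]]:
    """Pair each FTS file with its data file and name the loader kind."""
    plan = []
    for fts in find_fts_files(root):
        data = find_data_file(fts)
        if data is None:
            continue  # the real loader may treat this case as an error
        # .dat files are fixed-width; anything else here is .csv.gz
        kind = "fwf" if data.suffix == ".dat" else "csv"
        plan.append((fts.name, data.name, kind))
    return plan
```

In this sketch a "fwf" entry would go to MedicareDataLoader and a "csv" entry to the generic DataLoader.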

Initializes MedicareLoader object with default CMS domain context.

Sets the input pattern and prepares the LoaderConfig context, including root directory, flags like incremental/sloppy, and path normalization.

classmethod process()

traverse(pattern: str)

Searches directories recursively using the given pattern to find all FTS files. For each matching file, initiates schema inference and data ingestion via handle().

Parameters:

pattern – Glob pattern to match files (e.g., "**/*.fts")

Returns:

handle_empty()

Handles the case where no FTS files are found.

Creates an empty registry file (if not already present) and logs a message.

handle(fts_path: str)

Loads a Medicare FTS/DAT or FTS/CSV pair into the database.

  • Extracts the year based on the immediate parent directory of the FTS file

  • Determines the file type from FTS file name

  • Updates the schema registry

  • Dispatches to the appropriate loader (.dat or .csv.gz)

Parameters:

fts_path – Full path to an FTS metadata file.

Raises:

ValueError – If the year cannot be inferred or the data file is missing.
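The year inference described for handle() might look like the following sketch. The function name year_from_fts_path is hypothetical and the real implementation may differ. It only illustrates the rule that the immediate parent directory must be a 4-digit year, with a ValueError otherwise.

```python
import re
from pathlib import Path


def year_from_fts_path(fts_path: str) -> int:
    """Infer the data year from the immediate parent directory name."""
    parent = Path(fts_path).parent.name
    # Require exactly four ASCII digits, e.g. "2011" or "2018"
    if not re.fullmatch(r"[0-9]{4}", parent):
        raise ValueError(
            f"Cannot infer year: parent directory {parent!r} "
            f"of {fts_path!r} is not a 4-digit year"
        )
    return int(parent)
```

For example, year_from_fts_path("my_data/medicare/2018/sample.fts") returns 2018, while a path whose parent directory is "medicare" raises ValueError.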

static loader_for_csv(context: LoaderConfig, data_path: str) → DataLoader

Creates a generic DataLoader for a delimited CSV (usually .csv.gz) file.

Parameters:
  • context – Configuration object with metadata and paths

  • data_path – Path to the input CSV file

Returns:

Configured loader for tab-delimited CSV
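To illustrate the kind of input this loader handles, here is a minimal standard-library sketch that reads a gzip-compressed, tab-delimited file. It does not use the dorieh DataLoader API. The function name read_delimited_gz is hypothetical, and the sketch assumes UTF-8 text with a header row.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_delimited_gz(path: str, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the header row's column names."""
    # Text mode ("rt") decompresses and decodes on the fly;
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)
```

Because the function is a generator, rows are streamed rather than loaded into memory at once, which matters for large files.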

static loader_for_fwf(context: LoaderConfig, fts_path: str) → DataLoader

Creates a MedicareDataLoader instance for an FTS/DAT file pair.

Parameters:
  • context – Configuration object with metadata and paths

  • fts_path – Path to the associated FTS metadata file

Returns:

Loader ready to ingest fixed-width records
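Fixed-width ingestion amounts to slicing each line at the column positions taken from the FTS metadata. The sketch below shows that idea only and is not the MedicareDataLoader implementation. The Column type, the parse_record function, and the column names and positions in the usage example are all hypothetical. It assumes 1-based start positions.

```python
from typing import Dict, List, NamedTuple


class Column(NamedTuple):
    name: str
    start: int   # 1-based position of the first character (assumed convention)
    length: int  # number of characters in the field


def parse_record(line: str, columns: List[Column]) -> Dict[str, str]:
    """Slice one fixed-width line into named, whitespace-stripped fields."""
    record = {}
    for col in columns:
        begin = col.start - 1  # convert to a 0-based slice index
        record[col.name] = line[begin:begin + col.length].strip()
    return record


# Hypothetical layout, for illustration only
layout = [Column("BENE_ID", 1, 5), Column("STATE", 6, 2), Column("YEAR", 8, 4)]
row = parse_record("A1234MA2018", layout)
```

Here row holds {"BENE_ID": "A1234", "STATE": "MA", "YEAR": "2018"}.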