The mcr_fts2db Module
Raw data loader for Medicare files provided by ResDAC. The NSAPH Medicare pipeline uses this module for years 2011 and later.
This module defines a command-line utility to ingest raw Medicare data delivered in File Transfer Summary (FTS) and fixed-width data (DAT) format, as provided by ResDAC for years 2011 and later.
Overview:
- Searches recursively for all FTS (*.fts) files under the specified input path(s)
- Parses each FTS file using the :class:`~dorieh.cms.fts2yaml.MedicareFTS` parser
- Determines the appropriate database schema and metadata for the associated *.dat or *.csv.gz file
- Loads data into the database using :class:`~dorieh.cms.mcr_data_loader.MedicareDataLoader` for .dat files or a generic :class:`~dorieh.platform.loader.data_loader.DataLoader` for CSV files
- Applies indexing and VACUUM optimization after insertion
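The step that associates each FTS file with its data file can be illustrated with a minimal sketch. It assumes that the data file sits next to the FTS file and shares its base name, which the text above does not state explicitly. The helper `find_data_file` and the constant `DATA_SUFFIXES` are hypothetical names, not part of dorieh.

```python
from pathlib import Path
from typing import Optional

# Hypothetical constant: the suffixes mirror the "*.dat or *.csv.gz" wording
# in the overview; the order of preference is an assumption.
DATA_SUFFIXES = (".dat", ".csv.gz")


def find_data_file(fts_path: str) -> Optional[Path]:
    """Return the data file that shares the FTS file's base name, if any.

    Assumption: the data file lives in the same directory as the FTS file.
    """
    fts = Path(fts_path)
    for suffix in DATA_SUFFIXES:
        candidate = fts.with_name(fts.stem + suffix)
        if candidate.is_file():
            return candidate
    return None
```

A caller would treat a `None` result as a missing data file and report an error rather than silently skipping the FTS file.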
Usage Notes:
This loader requires that data be organized into year-based subfolders. For example:
my_data/medicare/2018/*.fts
The name of the parent directory of the FTS file must be a 4-digit year (e.g., 2011, 2018). This requirement applies to the locations of both the data file and the FTS file, so that table names can be derived correctly.
Key Components:
- :class:`MedicareLoader`: orchestrates ingestion logic
- :class:`~dorieh.cms.mcr_data_loader.MedicareDataLoader`: data loader based on a fixed-width reader
- :class:`~dorieh.platform.loader.data_loader.DataLoader`: generic data loader based on a CSV reader
See also:
- :doc:`members/fts2yaml`: metadata extraction from FTS
- :doc:`members/mcr_data_loader`: Medicare file reading
- :doc:`members/medicare_yaml`: generated schema definition
- class MedicareLoader
High-level loader for raw Medicare data files provided by ResDAC in FTS and DAT format.
The loader walks the input directory to locate all *.fts (File Transfer Summary) files, and for each one:
- Parses its metadata and adds it to the schema registry (YAML)
- Uses MedicareDataLoader to load FWF files or DataLoader for CSV files
- Applies schema-specific indexing and vacuum optimization
This loader is compatible with ETL processing of Medicare data for 2011 and later.
Initializes a MedicareLoader object with the default CMS domain context.
Sets the input pattern and prepares the LoaderConfig context, including root directory, flags like incremental/sloppy, and path normalization.
- traverse(pattern: str)
Searches directories recursively using the given pattern to find all FTS files. For each matching file, initiates schema inference and data ingestion via handle().
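A recursive search of this kind can be sketched with the standard `glob` module. This is an illustration only, not the actual body of `traverse()`: the function name `walk_fts`, the explicit `root` argument, and the callback parameter are assumptions made to keep the example self-contained.

```python
import glob
import os
from typing import Callable, List


def walk_fts(root: str, pattern: str, handle: Callable[[str], None]) -> List[str]:
    """Find files matching `pattern` anywhere under `root` and handle each one.

    "**" combined with recursive=True matches zero or more directory levels,
    so files directly under `root` are found as well as nested ones.
    """
    matches = sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    for path in matches:
        handle(path)  # e.g. schema inference and ingestion for one FTS file
    return matches
```

Sorting the matches makes the processing order deterministic, which is convenient when comparing logs between runs.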
- handle_empty()
Handles the case where no FTS files are found.
Creates an empty registry file (if not already present) and logs a message.
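The "create if not already present" behavior could look like the following sketch. The function name `ensure_registry` and its return value are hypothetical, and the real method may write registry content other than an empty file.

```python
import logging
from pathlib import Path


def ensure_registry(registry_path: str) -> bool:
    """Create an empty registry file unless one already exists.

    Returns True if the file was created, False if it was already present.
    """
    path = Path(registry_path)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    logging.warning("No FTS files found; created empty registry %s", path)
    return True
```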
- handle(fts_path: str)
Loads a Medicare FTS/DAT or FTS/CSV pair into the database.
- Extracts the year based on the immediate parent directory of the FTS file
- Determines the file type from the FTS file name
- Updates the schema registry
- Dispatches to the appropriate loader (.dat or .csv.gz)
- Parameters:
fts_path – Full path to an FTS metadata file.
- Raises:
ValueError – If the year could not be inferred or the data file is missing.
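The year extraction step and its failure mode can be sketched as follows. The function `infer_year` is a hypothetical helper written for illustration; it only encodes the rule stated above, namely that the immediate parent directory of the FTS file must be a 4-digit year.

```python
import os
import re


def infer_year(fts_path: str) -> int:
    """Return the year encoded in the FTS file's immediate parent directory.

    Raises ValueError if that directory name is not exactly four digits.
    """
    parent = os.path.basename(os.path.dirname(os.path.abspath(fts_path)))
    if not re.fullmatch(r"[0-9]{4}", parent):
        raise ValueError(
            f"Parent directory {parent!r} of {fts_path} is not a 4-digit year"
        )
    return int(parent)
```

For example, `infer_year("my_data/medicare/2018/file.fts")` returns `2018`, while a file placed directly under `my_data/medicare/` raises `ValueError`.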
- static loader_for_csv(context: LoaderConfig, data_path: str) → DataLoader
Creates a generic DataLoader for a delimited CSV (usually .csv.gz) file.
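For orientation, reading a gzip-compressed delimited file needs only the standard library. The sketch below is not the DataLoader implementation; it assumes a comma-delimited, UTF-8 file with a header row, and `read_csv_gz` is a hypothetical name.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_csv_gz(data_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a gzip-compressed CSV file.

    Assumptions: the first row holds column names and the encoding is UTF-8.
    """
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(data_path, mode="rt", newline="", encoding="utf-8") as stream:
        yield from csv.DictReader(stream)
```

Because the function is a generator, rows are decompressed and parsed lazily, so a large file never has to fit in memory at once.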
- static loader_for_fwf(context: LoaderConfig, fts_path: str) → DataLoader
Creates a MedicareDataLoader instance for a FTS/DAT file pair.
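The idea behind fixed-width reading is that column positions come from the FTS metadata rather than from delimiters in the data. The sketch below shows the slicing step only; the `Column` layout type, the 0-based offsets, and the sample column names are assumptions for illustration and do not reflect how MedicareDataLoader represents an FTS layout.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical layout entry: (column name, 0-based start offset, width).
Column = Tuple[str, int, int]


def parse_fwf_line(line: str, columns: Iterable[Column]) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    return {
        name: line[start:start + width].strip()
        for name, start, width in columns
    }


# Made-up three-column layout and record, for illustration only.
layout = [("BENE_ID", 0, 5), ("STATE", 5, 2), ("AGE", 7, 3)]
record = parse_fwf_line("A1234MA 67", layout)
```

If a layout lists 1-based start positions, each start would need to be reduced by one before slicing.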