The mcr_sas2yaml Module
This module defines an introspector for SAS 7BDAT files related to Medicare data.
The SASIntrospector class crawls a directory for SAS files matching a pattern, extracts metadata (column info), and generates data model definitions in YAML format. These models are written to a centralized registry.
Typical use case: building data models from Medicare SAS files (e.g., 1999–2010).
See Also:
- SASIntrospector
(this class)
- MedicareSAS
(superclass)
- Introspector
- MedicareRegistry
- class SASIntrospector(registry_path: str, root_dir: str = '.')[source]
This class traverses a file path looking for SAS .sas7bdat files, extracts their schema using
Introspector
, and creates a structured data model serialized to a YAML registry.In addition to field-level metadata, this introspector: - Attempts to identify common fields such as bene_id, state, zip, and year - Automatically generates a year column if missing - Marks special fields as indexed - Adds virtual key fields for file and record identifiers
- Inherits from:
MedicareSAS: For file traversal and handling utilities MedicareRegistry: For interacting with the data model YAML registry
Initializes the SASIntrospector with the given registry path and SAS root directory.
- Args:
registry_path (str): Path to the YAML registry file. root_dir (str): Base directory for SAS 7BDAT files.
- classmethod process(registry_path: str, pattern: str, root_dir: str = '.')[source]
Entry point that initializes and runs the introspector.
- Args:
registry_path (str): Path to output YAML registry file. pattern (str): Glob-like pattern to match .sas7bdat files. root_dir (str): Root directory to start searching. Default is current directory.
- classmethod matches(s: str, candidates: List[str])[source]
Determines whether a string matches any string or wildcard pattern in candidates.
- Args:
s (str): String to match. candidates (List[str]): List of exact names or patterns (may include *).
- Returns:
bool: True if s matches any candidate.
- handle(table: str, file_path: str, file_type: str, year: int)[source]
Handles metadata extraction for a single .sas7bdat file.
- Args:
table (str): Target table name to use in the registry. file_path (str): File path to the SAS data file. file_type (str): Type of file (e.g., ‘denominator’). year (int): Associated year of data.
- add_sas_table(table: str, file_path: str, index_all: bool, year: int)[source]
Extracts schema from a SAS file and registers columns into the YAML registry.
Uses introspection to extract columns and attach metadata.
Detects and indexes key columns (e.g., bene_id, state, year).
Auto-generates a ‘year’ column if missing using a virtual GENERATED column.
Indexes all columns if index_all is True (e.g., for denominator files).
Adds FILE and RECORD fields to simulate full uniqueness using a compound PK.
- Args:
table (str): Name of the table in the registry. file_path (str): Path to the SAS file. index_all (bool): Whether all fields should be indexed. year (int): Year to use when generating missing year columns.
- Raises:
ValueError: If duplicate key fields are detected or mandatory fields are missing.