# Dorieh CMS Package (Manipulating Health Data)

**Pipelines to process CMS data: Medicaid and Medicare**

```{toctree}
---
maxdepth: 4
glob: true
hidden:
---
Medicaid.md
Medicare.md
MedicareLineage.md
QueringMedicaid.md
```

```{contents}
---
local:
---
```

## Overview of health data (Medicare and Medicaid)

We use health data provided by the [Centers for Medicare & Medicaid Services (CMS)](https://www.cms.gov/).

The data processing pipelines included in this package create a data warehouse with health data (Medicare and Medicaid). They ingest raw data into the database, perform data cleansing and deduplication (when possible), run data quality analysis, and optimize the tables for efficient queries.

Please see the following documents for details:

* Data model and processing of [Medicaid](Medicaid.md) data
* Data model and processing of [Medicare](Medicare.md) data
* Tips on [querying Medicaid data](QueringMedicaid.md)

Medicare processing now includes a [pipeline to automatically create QC Tables](Medicare.md#creating-qc-tables). These tables are used by an Apache Superset dashboard that visualizes QC results.

## Project Structure

The top-level directories are:

- doc
- src

The `doc` directory contains documentation. The `src` directory contains software source code.

The directories under `src` are:

- cwl
- python

### CWL

The `cwl` folder contains reusable workflows, packaged as tools that can and should be used by all Dorieh pipelines.

Each processing step of CMS data is packaged as a standalone tool that can be run individually. Each tool is individually documented.

The tools are combined into workflows represented by the [medicaid.cwl](pipeline/medicaid) and [medicare.cwl](pipeline/medicare) files.

### Python

Python packages and modules are described in the [Python Package Description](CMSLibrary.md) document.
Included are utilities to:

* Parse FTS format and generate database schema

### Data Model for health data

The data model in YAML format is used to generate the database schema and the processing code that ingests data into the database. Read more about modeling in the [Data Modeling](Datamodels.md) document. The model for raw data is generated automatically by parsing FTS files or analyzing SAS data.

The following models are defined here:

* [Medicaid processed data](members/medicaid_yaml.md). See also [](Medicaid.md)
    * Tables
        * `medicaid.beneficiaries` [details](Medicaid.md#beneficiaries)
        * `medicaid.enrollments` [details](Medicaid.md#enrollments)
        * `medicaid.eligibility` [details](Medicaid.md#eligibility)
        * `medicaid.admissions` [details](Medicaid.md#inpatient-admissions)
    * SQL Views, used internally for data processing
        * `medicaid.monthly`
        * `medicaid._eligibility`
* [Medicare processed data](members/medicare_yaml.md). See also [](Medicare.md)
    * Tables
        * `medicare.beneficiaries` [details](Medicare.md#creating-beneficiaries-table)
        * `medicare.enrollments` [details](Medicare.md#creating-enrollments-table)
        * `medicare.admissions` [details](Medicare.md#creating-inpatient-admissions-table)
    * SQL Views, used internally for data processing
        * `medicare.ps` [Combined raw data for patient summaries](Medicare.md#creating-federated-patient-summary)
        * `medicare.` [Combined raw data for inpatient admissions](Medicare.md#creating-federated-admissions-view)
        * `medicare._ps`
        * `medicare._beneficiaries`
        * `medicare._enrollments`

### SQL

File [procedures](members/procedures.md) addresses the problem that creating the [Medicaid eligibility table](Medicaid.md#eligibility) in a single transaction requires too much time and memory. The stored procedures in this file split the work of populating this table, either by beneficiary or by year and state. Splitting by beneficiary (i.e., using one database transaction per beneficiary) works best.

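The idea of one transaction per beneficiary can be illustrated with a minimal Python sketch. This is not the package's stored procedures: it uses an in-memory SQLite database as a stand-in, and the table names, column names, and the function `populate_eligibility` are hypothetical, chosen only to show the pattern.

```python
import sqlite3


def populate_eligibility(conn: sqlite3.Connection) -> int:
    """Fill a (hypothetical) eligibility table one beneficiary at a time.

    Each beneficiary is committed in its own transaction, so no single
    transaction has to hold the whole table, and a failure rolls back
    only the beneficiary being processed.
    """
    bene_ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT bene_id FROM enrollments ORDER BY bene_id"
        )
    ]
    for bene_id in bene_ids:
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "INSERT INTO eligibility (bene_id, year, month) "
                "SELECT bene_id, year, month FROM enrollments "
                "WHERE bene_id = ?",
                (bene_id,),
            )
    return len(bene_ids)


# Hypothetical, simplified stand-ins for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE enrollments (bene_id TEXT, year INTEGER, month INTEGER);
    CREATE TABLE eligibility (bene_id TEXT, year INTEGER, month INTEGER);
    INSERT INTO enrollments VALUES
        ('A', 2010, 1), ('A', 2010, 2), ('B', 2011, 5);
    """
)

n_transactions = populate_eligibility(conn)
n_rows = conn.execute("SELECT COUNT(*) FROM eligibility").fetchone()[0]
print(n_transactions, n_rows)
```

Splitting by year and state would follow the same pattern, with the loop iterating over distinct (year, state) pairs instead of beneficiary identifiers.
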
File [functions](members/functions.md) contains helper functions that parse dates in the non-standard formats encountered in the raw Medicare files that we have.

(cms-indices-and-tables)=
## Documentation Indices

* [](genindex)
* [](modindex)