tuva's Introduction

🧰 What is the Tuva Project?

The Tuva Project code base includes a core data model, data marts, terminology sets, and data quality tests for doing healthcare analytics.

Explore the project:

Note: In many cases the actual terminology sets are too large to maintain on GitHub, so we host them in a public AWS S3 bucket. Executing dbt build will load the terminology sets from S3.
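
The website also provides warehouse-specific SQL scripts for loading Terminology directly from S3. A minimal, Snowflake-flavored sketch of that kind of load follows; the bucket path, table name, and file format details here are placeholders, not the real ones:

```sql
-- Hypothetical sketch: bulk-load one terminology file from the public
-- S3 bucket into a warehouse table. The bucket path and table name are
-- placeholders; use the SQL scripts on the Tuva docs site for real paths.
copy into terminology.icd_10_cm
from 's3://<tuva-public-bucket>/terminology/icd_10_cm.csv'
file_format = (type = 'CSV' field_optionally_enclosed_by = '"' skip_header = 1);
```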

Check out our Quickstart guide here.

🔌  Supported Data Warehouses and dbt Versions

  • BigQuery
  • Databricks (community supported)
  • DuckDB (community supported)
  • Redshift
  • Snowflake

This package supports dbt version 1.3.x or higher.

🙋🏻‍♀️ Contributing

We created the Tuva Project to be a place where healthcare data practitioners can share their knowledge about doing healthcare analytics. If you have ideas for improvements or find bugs, we highly encourage and welcome feedback! Feel free to create an issue or ping us on Slack.

Check out our Contribution guide here.

🤝 Community

Join our growing community of healthcare data people in Slack!

tuva's People

Contributors

aneiderhiser, cnolanminich, cocozuloaga, deepson-maitri, deepsonshrestha, donaldrauscher, eldon-tuva, krishfhc, msolnit, nrichards17, sarah-tuva, thutuva, tom-tuva, tuvaforrest, utsavpaudel


tuva's Issues

Build new model CMS-HCC-V28 (payment year 2024)

  • Update seed and mapping files with new values
  • Update logic in the int_hcc_mapping model to use the new V28 flag column (needs a macro; a sketch follows this list)
  • Add logic to create the blended risk score

Note: this list may not be complete yet.
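
A macro along these lines could select the right flag column by model version. This is a hypothetical sketch only; the variable and column names are illustrative, not the actual ones in int_hcc_mapping:

```sql
-- Hypothetical sketch of the version-switching macro. The var name
-- (cms_hcc_model_version) and the flag column names are illustrative only.
{% macro hcc_flag_column(model_version) %}
    {%- if model_version == 'v28' -%}
        cms_hcc_v28_flag
    {%- else -%}
        cms_hcc_v24_flag
    {%- endif -%}
{% endmacro %}
```

A model could then select {{ hcc_flag_column(var('cms_hcc_model_version', 'v24')) }} rather than hard-coding the flag column.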

Update ICD-10-PCS terminology file

The file we have now contains some codes that are three characters in length (e.g. "02N","Heart and Great Vessels, Release","Heart and Great Vessels, Release"). I don't believe these should be here. The CMS file does not contain them, and procedure codes should be 7 characters in length.
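
A quick data-quality query can surface the offending rows; a minimal sketch, where the table and column names are assumptions based on Tuva naming conventions:

```sql
-- Minimal sketch: surface ICD-10-PCS codes that are not 7 characters.
-- Table and column names are assumptions based on Tuva conventions.
select icd_10_pcs, description
from terminology__icd_10_pcs
where length(icd_10_pcs) <> 7
```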

Add logic for institutional status to CMS HCC

The CMS HCC model includes a risk segment for "institutional status". This status comes from the LTI Flag on the MMR (monthly membership report). Basic payer eligibility data does not include this status.

From CMS Medicare Managed Care Manual:

To determine a beneficiary's LTI status for payment purposes, CMS uses the reporting of a 90-day assessment. This information is collected routinely from nursing homes, which report to the States and CMS on at least a quarterly basis. This data is stored in the Minimum Data Set (MDS). Payment at the long-term rate starts in the month following the assessment date. Once persons are identified, they remain in long-term status until discharged to the community for more than fourteen days. The costs of the short term institutionalized (less than 90 days) are recognized in the community model.

Currently, we are using a default of No for institutional status. Add logic to the eligibility prep model to calculate this status.
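
Absent the MMR feed, one hedged approach is to approximate LTI status from long institutional stays. A Snowflake-flavored sketch, where the model names, the encounter_type value, and the 90-day rule are all assumptions rather than settled design:

```sql
-- Hypothetical sketch: approximate institutional (LTI) status from
-- institutional stays of 90+ days when no MMR LTI flag is available.
-- Simplifications: ignores the 14-day community-discharge rule and MDS
-- assessment timing; assumes year_month is a YYYYMM string.
with long_stays as (

    select
        patient_id,
        encounter_end_date
    from encounter
    where encounter_type = 'skilled nursing facility'
      and datediff(day, encounter_start_date, encounter_end_date) >= 90

)

select
    mm.patient_id,
    mm.year_month,
    -- max() resolves duplicates from multiple stays ('Yes' > 'No' alphabetically)
    max(case when ls.patient_id is not null then 'Yes' else 'No' end) as institutional_status
from member_months mm
left join long_stays ls
    on mm.patient_id = ls.patient_id
   and mm.year_month > to_char(ls.encounter_end_date, 'YYYYMM')
group by mm.patient_id, mm.year_month
```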

Remove Old Packages from dbt Hub

We need to update dbt Hub so that only the packages we want users to adopt are on it. The only packages we want users to adopt are the following:

  • the_tuva_project
  • the_tuva_project_demo
  • medicare_cclf_connector
  • medicare_lds_connector

However, dbt Hub currently lists a bunch of old packages (screenshot omitted).

Update CMS Chronic Conditions mart with new CMS version

CMS updated the chronic condition algorithms in February. CMS added diagnosis codes for Anemia, Diabetes, Rheumatoid Arthritis/Osteoarthritis, and OUD. CMS also added NDC codes for OUD.

Updated mapping docs added to Google Drive

To Do

  • Update the codes list for the above-listed chronic conditions.

Open Questions

  1. Do we want to implement CMS's stricter logic of filtering claims?

Acute Inpatient Data Mart

We want to refactor the existing encounter grouper data mart into an acute inpatient data mart that supports rich analytics on acute inpatient care.

Add Staging Layer to Every Data Mart

We're adding a staging layer to every data mart to make it more obvious what models a user needs to create before they can run a specific data mart.

We're not adding a staging layer to core, data profiling, encounter grouper, or claim date profiling at this time.

The staging layer is a folder under the data mart folder in models called staging. The staging models merely select the columns needed for the data mart. Right now we are not doing any casting, filtering, or column renaming. We may do this in the future.

The naming convention for the staging models is data_mart__stg_model_name, where model_name is the name of the model that feeds the stage.
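
Under that convention, a staging model is little more than a column selection. An illustrative sketch (the mart name and column list are examples only):

```sql
-- e.g. models/readmissions/staging/readmissions__stg_medical_claim.sql
-- Illustrative only: select just the columns the mart needs, with no
-- casting, filtering, or renaming.
select
    claim_id,
    claim_line_number,
    patient_id,
    claim_start_date,
    claim_end_date,
    paid_amount
from {{ ref('medical_claim') }}
```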

Add Service Categories to PMPM

Currently PMPM only has limited service categories. Add all level 1 and level 2 service categories to PMPM to support financial analytics.

Add OREC to Claims and Core data models

Update the claims data model to include two new fields in the Eligibility model.

  • orec (Original Reason for Entitlement Code)
  • plan_id (Plan specific contract number, group number, etc.)

The following packages/projects will need to be updated to include these new fields.

  • the_tuva_project:
    • integration_tests/Eligibility - source yml and claims data model definitions (this is what controls the data dictionaries on the website)
    • core__eligibility
    • data_profiling__eligibility_missing_values
  • medicare_lds_connector (map OREC from the MBSF)
  • medicare_cclf_connector (map BENE_ORGNL_ENTLMT_RSN_CD from CCLF8 File)
  • fhir_connector
  • the_tuva_project_demo (seeds)
  • CI Testing datasets

Update Terminology Catalog

  • Several terminology sets are in the catalog but do not actually exist on the website. This includes: RBCS, CCSR, SNOMED-CT, LOINC, ATC, NDC, RxNorm, Medicare Specialty, and NUCC Taxonomy. When a user clicks one of the links for these in the catalog, the user is redirected back to the catalog page.
  • Maintainer is not filled out for demographic and geographic terminology sets.
  • Last updated dates are mostly blank
  • I would also love to add a search bar to the catalog if that's not too difficult

Refactor Tuva Repo

  • Data Marts: Transfer SQL and value sets to mono-repo for Readmissions, Chronic Conditions, and PMPM (include service category value set)
  • Configuration: Update configuration by using refs and setting schemas in models.yml and remove all variables that make it confusing.
  • Terminology: Any updates necessary to get terminology working. What about S3 API keys??
  • README: Update
  • CI/CD: Update
  • Rename Repo to Tuva
  • Make individual repos private

Fix Service Category Assignment Issue for Institutional Claims

Institutional claims typically only have paid amounts at the header level. This means that in the Tuva claims data model, paid amounts will typically exist on only one line of an institutional claim. Currently, service categories are assigned at the claim line level for institutional claims, so the paid amount often sits on a different claim line than the one a service category is assigned to. As a result, the current grouper logic excludes a large share of paid amounts from the proper service category.
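
One hedged way to fix this is to roll paid amounts up to the claim header before attaching the service category. A sketch, with model and column names that are illustrative only:

```sql
-- Hypothetical sketch: aggregate institutional paid amounts to the claim
-- header, then join a claim-level (not line-level) service category
-- assignment. Model and column names are illustrative only.
with header_paid as (

    select
        claim_id,
        sum(paid_amount) as paid_amount
    from medical_claim
    where claim_type = 'institutional'
    group by claim_id

)

select
    sc.claim_id,
    sc.service_category_1,
    sc.service_category_2,
    hp.paid_amount
from claim_service_category sc  -- assumed to be one row per claim
inner join header_paid hp
    on sc.claim_id = hp.claim_id
```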

CMS HCC Feature request: ability to calculate scores per data source and across data sources

From [email protected] via Slack:

On the third use case, this might be more specific to value based care organizations, but generally speaking if you're contracting with a payer, they want to know what you're contributing to their RAF. You'll also generally be receiving a regular claim feed from the payer for all claims for your attributed population. No papers I'm aware of - this is just from building out risk adjustment at VBC orgs

  • We do this with two calculations (see the sketch below):
    • Calculate RAF including payer claims + our claims
    • Calculate RAF with just payer claims (excluding our claims)
    • Take the delta of the two, and that is your "unique RAF" contribution.

Basically you use this to make the case to the payer of the value you are creating.
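
A hedged sketch of that delta calculation, assuming the CMS HCC mart (or a wrapper around it) can produce a risk score table per claim population; every model name here is illustrative:

```sql
-- Hypothetical sketch: "unique RAF" contribution as the difference between
-- a RAF computed on all claims and a RAF computed on payer claims only.
-- Both input models are assumptions, not actual Tuva CMS HCC mart models.
with raf_all_sources as (
    select patient_id, raf_score
    from risk_scores_all_claims
),

raf_payer_only as (
    select patient_id, raf_score
    from risk_scores_payer_claims_only
)

select
    a.patient_id,
    a.raf_score as raf_all_sources,
    p.raf_score as raf_payer_only,
    a.raf_score - p.raf_score as unique_raf_contribution
from raf_all_sources a
inner join raf_payer_only p
    on a.patient_id = p.patient_id
```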

Add the ED classification Mart

This has already been implemented as a separate repo here:

However, the code should be modified to run off the Core data model (i.e. encounter and condition rather than medical_claim).

Add logic for C-SNP enrollees to CMS HCC

The CMS HCC model includes a risk segment for new beneficiaries enrolled in a "C-SNP" (Chronic Condition Special Needs Plan). We may be able to use the Chronic Conditions Mart to determine which new enrollees may fall into this status.

For a list of the conditions covered by the special needs plan, refer to Table 2-3 of this report.

Instruct users to not use eligibility or medical claim in dbt_project.yml

In a connector, the dbt_project.yml cannot contain a var called eligibility or medical_claim that points to the source data. This creates a conflict with claims preprocessing. The README should be updated to reflect this.

Personal example:
In the claims_preprocessing staging models, there are {{ var('eligibility') }} and {{ var('medical_claim') }} references. In the dbt_project.yml of my connector, under vars, I had eligibility: "{{ source(var('source'),'eligibility') }}". This resulted in {{ var('eligibility') }} being linked to my source eligibility table rather than the final eligibility table I created for mapping.
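
A hedged illustration of the conflict and one way around it; the connector-specific var name (raw_eligibility) is hypothetical:

```yaml
# dbt_project.yml of a connector -- illustrative sketch only.
vars:
  # Problematic: this overrides the eligibility var that
  # claims_preprocessing resolves, pointing it at raw source data
  # instead of the final mapped model.
  # eligibility: "{{ source(var('source'), 'eligibility') }}"

  # Safer: use a connector-specific name for the raw source reference
  # (raw_eligibility is hypothetical), and let the final mapped model
  # satisfy {{ var('eligibility') }}.
  raw_eligibility: "{{ source(var('source'), 'eligibility') }}"
```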

Add CCSR Data Mart to Tuva

  • Transfer code and value sets to Tuva repo
  • Test on BigQuery and Redshift
  • If there is preprocessing for value sets, where should these live?
  • KB updates:
    • update diagram
    • add page under Data Marts

Fix CCSR Issue in Current Release

CCSR was never tested on BigQuery or Redshift. Aaron added it to the Tuva Project and merged it to main after mistakenly thinking all tests passed. As a result, the Tuva Project currently does not run on BigQuery or Redshift.

Enhancement requests for FIPS terminology

Enhancement requests for terminology__fips_county. This would help with visualizations. (A sketch of the state/county split follows the list.)

  • Split out the state fips code into a separate column
  • Add the full name for the state so that we can merge in datasets by the full state name
  • Add another FIPS file for zip code to county crosswalk
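
A 5-digit county FIPS code is a 2-digit state code followed by a 3-digit county code, so the split is a simple substring. A minimal sketch, where the column names are assumptions about terminology__fips_county:

```sql
-- Minimal sketch: split the 5-digit county FIPS into state and county
-- parts. Column names are assumptions about terminology__fips_county.
select
    fips_code,
    substr(fips_code, 1, 2) as state_fips_code,
    substr(fips_code, 3, 3) as county_fips_code,
    county
from terminology__fips_county
```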

Release 3/29/23: Website 2.0

Definition of done:

  • Set up section complete - Thu Xuan
    • Setting up the Tuva Project - write-up and video
    • Setting up Just Terminology - write-up and video
  • Data Model section complete - Forrest
    • populate data dictionaries from YAML files
  • Terminology and Value Sets section complete - Forrest
    • populate data dictionaries from YAML files
    • render seed files (small files only)
    • SQL scripts to load data from S3 into Snowflake, Redshift, and BigQuery
    • data shares available for Snowflake, Redshift, and BigQuery (i.e. any delivery mechanism)
    • Not included in this release: new terminology data sets, Databricks support (for Terminology SQL), review of every terminology set, maintenance process
  • Claims Data section complete - Aaron
    • Review and update sections on claims data as needed
    • Complete and release current code work on claims preprocessing; update website - Coco
  • Measures and Groupers section complete - Aaron
    • Review and update sections on released measures and groupers as needed
  • Announcement - Aaron
    • Tuva Slack - Announcements channel
  • Actual Release - Aaron and Forrest

Add unit tests and integration testing to the CMS HCC mart

The CMS HCC mart has been manually tested with various methods (e.g. manually calculated scores for a random sample of patients, ran the HCCpy test patients through and compared results). We need to build in better unit/integration testing to ensure that future changes continue to produce the expected results.

  • Unit tests - need to test smaller units of logic and mapping
  • Integration tests - full integration testing with a curated list of mock patients to test various risk factors (this may be complicated by the mart being added to the Tuva Project mono-repo)

Upgrade CI Testing

Convert to GitHub Actions instead of dbt Cloud: jobs can run in parallel, report success/failure of each environment individually, and are not tied to one specific dbt Cloud account. (A sketch of such a workflow follows the list.)

  • Create one job for Snowflake
  • Create one job for Redshift
  • Create one job for BigQuery
  • Discuss with team creating jobs for each version of dbt for each environment
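
A hedged sketch of such a workflow, using a matrix to fan the warehouses out in parallel; the workflow name, adapter install, and profile handling are illustrative only:

```yaml
# Hypothetical GitHub Actions sketch: one matrix job per warehouse.
# Workflow name, adapter packages, and profile setup are illustrative.
name: integration-tests
on: [pull_request]

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target: [snowflake, redshift, bigquery]
    steps:
      - uses: actions/checkout@v3
      - run: pip install dbt-${{ matrix.target }}
      - run: dbt deps && dbt build --target ${{ matrix.target }}
        env:
          DBT_PROFILES_DIR: ./ci  # assumes a checked-in ci profiles.yml
```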

Adding CMS-HCC Data Mart to Tuva Project

  • Need to come up with an analytics story for this data mart. Trending risk over time? Identifying under-coded patients?
  • Create summary data tables that tell the analytics story and add them to the data mart

Add inpatient/outpatient logic to CCSR

When the latest Tuva Core Data Model changes are released with Claims Preprocessing, add logic to map categories for inpatient vs outpatient based on encounter_type.
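
A hedged sketch of what that mapping could look like once encounter_type is available; the encounter_type value and the output column name are assumptions:

```sql
-- Illustrative sketch only: derive the CCSR setting (inpatient vs.
-- outpatient) from encounter_type on the Core encounter model.
-- The encounter_type value shown is an assumption.
select
    encounter_id,
    case
        when encounter_type = 'acute inpatient' then 'inpatient'
        else 'outpatient'
    end as ccsr_setting
from {{ ref('encounter') }}
```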

Add BETOS to Terminology

  • Add dataset to S3
  • Update SQL scripts for direct download
  • Update Tuva repo so it loads this dataset
  • Update Catalog

Primary key test on member_months table

The current primary key test for the member_months table is patient_id and year_month. However, this table includes information from all payers, so the current primary key is not at the right grain. I think payer and payer_type should be added.
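
One way to enforce the proposed grain is a composite-uniqueness test; a sketch using dbt_utils, where the final column list is the proposal above rather than a settled decision:

```yaml
# Sketch: composite primary-key test for member_months using dbt_utils.
# Including payer and payer_type reflects the proposal above, not a
# settled decision.
models:
  - name: member_months
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - patient_id
            - year_month
            - payer
            - payer_type
```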

Update Readmission mart with new CMS version

CMS released updated versions of all three readmission measures on 5/3/23. The hospital-wide measure (HWR) updates include updated ICD-10 codes used in the measure.

Updated mapping docs added to Google Drive.

To Do

  • Update the Readmission Mart with the hospital-wide measure changes.

Open Questions

  1. Do we want to maintain the previous version implemented and give the option to run either version or should we just update the measure to the latest version?
  2. Do we want to add the other measures, "Condition-Specific" and "Procedure-Specific"? Currently, only "Hospital-Wide" has been implemented.
