
MIMIC IV to OMOP CDM Conversion

(Demo data link: https://doi.org/10.13026/p1f5-7x35)

The project implements an ETL conversion of the MIMIC IV PhysioNet dataset to the OMOP CDM format.

  • Version 1.0

Concepts / Philosophy

The ETL is based on five logical steps (a BigQuery sketch follows the list).

  • Create a snapshot of the source data. The snapshot data is stored in staging source tables with the prefix "src_".
  • Clean the source data: filter out rows that are not to be used, format values, apply some business rules. This step creates "clean" intermediate tables with the prefix "lk_" and suffix "clean".
  • Map distinct source codes to concepts in the vocabulary tables. This step creates intermediate tables with the prefix "lk_" and suffix "concept".
    • Custom mapping is implemented in custom concepts generated in the vocabulary tables beforehand.
  • Join the cleaned data and mapped codes. This step creates intermediate tables with the prefix "lk_" and suffix "mapped".
  • Distribute the mapped data to the target CDM tables according to target_domain_id values.
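
A minimal BigQuery sketch of steps 2-5 for a single hypothetical diagnoses source table; the dataset names (etl_dataset, voc_dataset), the lk_diagnoses_* tables, and the column lists are illustrative assumptions, not the project's actual scripts:

-- step 2: clean (prefix "lk_", suffix "clean")
CREATE TABLE etl_dataset.lk_diagnoses_clean AS
SELECT subject_id, hadm_id, icd_code, icd_version
FROM etl_dataset.src_diagnoses_icd
WHERE icd_code IS NOT NULL;

-- step 3: map distinct codes (suffix "concept");
-- custom concepts are assumed to be loaded into the vocabulary tables beforehand
CREATE TABLE etl_dataset.lk_diagnoses_concept AS
SELECT DISTINCT
    s.icd_code,
    s.icd_version,
    tc.concept_id AS target_concept_id,
    tc.domain_id  AS target_domain_id
FROM etl_dataset.lk_diagnoses_clean s
LEFT JOIN voc_dataset.concept sc
    ON sc.concept_code = s.icd_code
    AND sc.vocabulary_id = IF(s.icd_version = 9, 'ICD9CM', 'ICD10CM')
LEFT JOIN voc_dataset.concept_relationship cr
    ON cr.concept_id_1 = sc.concept_id
    AND cr.relationship_id = 'Maps to'
LEFT JOIN voc_dataset.concept tc
    ON tc.concept_id = cr.concept_id_2;

-- step 4: join cleaned rows to mapped codes (suffix "mapped")
CREATE TABLE etl_dataset.lk_diagnoses_mapped AS
SELECT s.*, c.target_concept_id, c.target_domain_id
FROM etl_dataset.lk_diagnoses_clean s
LEFT JOIN etl_dataset.lk_diagnoses_concept c
    USING (icd_code, icd_version);

-- step 5: distribute by target_domain_id (illustrative column subset)
INSERT INTO etl_dataset.cdm_condition_occurrence
    (person_id, condition_concept_id, condition_source_value)
SELECT subject_id, target_concept_id, icd_code
FROM etl_dataset.lk_diagnoses_mapped
WHERE target_domain_id = 'Condition';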

The ETL process encapsulates the following workflows:

  • vocabulary refresh, which loads vocabulary and custom mapping data from local folders to the vocabulary dataset.

  • waveform collecting, which loads parsed waveform data from a Google Cloud Storage bucket to the waveform staging dataset.

  • ddl, which creates empty CDM tables in the ETL dataset.

  • staging, which creates a snapshot of the source tables and vocabulary tables in the ETL dataset.

  • etl, which performs the ETL logic.

  • ut, which runs internal unit tests.

  • metrics, which builds metric report data for internal QA.

  • unload, which copies CDM and vocabulary tables to the final CDM OMOP dataset.

  • At the POC level, the waveform collecting workflow is represented by a single script, scripts/wf_read.py, which iterates through the subfolders in the given bucket path. It populates the wf_header table with folder-structure data, and the wf_details table with data from the CSV files found there. These tables are used as source tables for the poc_2 unit in the measurement and condition_occurrence tables. POC_3 data is loaded and prepared manually (todo: provide both manual scripts)

  • All workflows from ddl to metrics operate on a so-called "ETL" dataset, where the intermediate tables are created and all tables have prefixes according to their roles: "voc" for vocabulary tables, "src" for the snapshot of the source tables, "lk" for intermediate (lookup) tables, and "cdm" for target CDM tables. Most tables have additional technical fields: unit_id, load_table_id, load_row_id, trace_id (see the sketch after this list).

  • The last step, unload, populates the final OMOP CDM dataset, also referred to as the "ATLAS" dataset. Only CDM and vocabulary tables are kept there; the prefixes and technical fields are removed. The final OMOP CDM dataset can be analysed with OHDSI tools such as ATLAS or DQD.
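
As a hedged sketch of these two layers (the dataset names and the measurement example are assumptions; the prefixes and technical fields are as documented above):

-- in the ETL dataset, the technical fields let a CDM row be traced
-- back to the source table and row it was loaded from
SELECT unit_id, load_table_id, load_row_id, trace_id
FROM etl_dataset.cdm_measurement
LIMIT 10;

-- the unload step copies each table to the final "ATLAS" dataset,
-- dropping the "cdm_" prefix and the technical fields
CREATE TABLE atlas_dataset.measurement AS
SELECT * EXCEPT (unit_id, load_table_id, load_row_id, trace_id)
FROM etl_dataset.cdm_measurement;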

How to run the conversion

To run the ETL end-to-end:

  • load the latest standard OMOP vocabularies from http://athena.ohdsi.org if needed

    • create a working copy of the downloaded vocabularies, to which the custom mapping data will be added
  • set the variables in vocabulary_refresh/README.md

    • run the vocabulary refresh commands given below from the directory "vocabulary_refresh"
  • set the project variables in conf/*.etlconf

    • run the script "wf_read" to load the waveform sample data if needed
    • run the workflow commands below in the given sequence
      • <env> in the workflow commands is the "environment" name, which is "dev" for the demo dataset and "full" for the full set
  • set the project root (location of this file) as the current directory

cd vocabulary_refresh
python vocabulary_refresh.py -s10
python vocabulary_refresh.py -s20
python vocabulary_refresh.py -s30
cd ../
python scripts/wf_read.py -e conf/<env>.etlconf
python scripts/run_workflow.py -e conf/<env>.etlconf -c conf/workflow_ddl.conf
python scripts/run_workflow.py -e conf/<env>.etlconf -c conf/workflow_staging.conf
python scripts/run_workflow.py -e conf/<env>.etlconf -c conf/workflow_etl.conf
python scripts/run_workflow.py -e conf/<env>.etlconf -c conf/workflow_ut.conf
python scripts/run_workflow.py -e conf/<env>.etlconf -c conf/workflow_metrics.conf
python scripts/run_workflow.py -e conf/<env>.etlconf -c conf/workflow_unload.conf

To look at UT and Metrics reports:

  • see the metrics dataset name in the corresponding .etlconf file
-- Metrics - row count
SELECT * FROM metrics_dataset.me_total ORDER BY table_name;
-- Metrics - person and visit summary
SELECT
    category, name, count AS row_count
FROM metrics_dataset.me_persons_visits ORDER BY category, name;
-- Metrics - Mapping rates
SELECT
    table_name, concept_field, 
    count   AS rows_mapped, 
    percent AS percent_mapped,
    total   AS rows_total
FROM metrics_dataset.me_mapping_rate 
ORDER BY table_name, concept_field
;
-- Metrics - Mapped and Unmapped source values
SELECT 
    table_name, concept_field, category, source_value, concept_id, concept_name,
    count       AS row_count,
    percent     AS rows_percent
FROM metrics_dataset.me_tops_together 
ORDER BY table_name, concept_field, category, count DESC;
-- UT report
SELECT report_starttime, table_id, test_type, field_name, test_passed
FROM metrics_dataset.report_unit_test
-- WHERE NOT test_passed
ORDER BY table_id, report_starttime
;

More options to run ETL parts:

  • Run a workflow:
    • with local variables (copy the "variables" section from the .etlconf file):
      python scripts/run_workflow.py -c conf/workflow_etl.conf
    • with global variables:
      python scripts/run_workflow.py -e conf/dev.etlconf -c conf/workflow_etl.conf
  • Run explicitly named scripts (space-delimited):
      python scripts/run_workflow.py -e conf/dev.etlconf etl/etl/cdm_drug_era.sql
  • Run in the background:
      nohup python scripts/run_workflow.py -e conf/full.etlconf -c conf/workflow_etl.conf > ../out_full_etl.out &
  • Continue after an error:
      nohup python scripts/run_workflow.py -e conf/full.etlconf -c conf/workflow_etl.conf etl/etl/cdm_observation.sql etl/etl/cdm_observation_period.sql etl/etl/cdm_fact_relationship.sql etl/etl/cdm_condition_era.sql etl/etl/cdm_drug_era.sql etl/etl/cdm_dose_era.sql etl/etl/cdm_cdm_source.sql >> ../out_full_etl.out &

Change Log (latest first)

2023-02-17

  • MIMIC 2.2 is issued. Run ETL on MIMIC 2.2.
  • minor change to measurement.value_source_value:
    • the field is now always populated, instead of only when value_as_number is null
  • minor change to custom mapping vocabularies:
    • mimiciv_drug_ndc,
    • mimiciv_drug_route,
    • mimiciv_meas_lab_loinc,
    • mimiciv_obs_drgcodes,
    • mimiciv_proc_itemid
  • run with OMOP vocabularies v16-JAN-23

2022-09-09

  • MIMIC 2.0 is issued: run ETL on MIMIC 2.0.
  • scripts/bq_run_script.py is updated to fit the current BQ requirement of single-line queries
  • minor changes in the ETL code to match MIMIC 2.0
    • @core_dataset now points to @hosp_dataset
    • table d_micro is no longer available in physionet-data. A replacement src_d_micro is generated from microbiologyevents. (see "z_more/MIMIC 2.0 affected tables.sql", etl/staging/st_hosp.sql)

2021-05-17

  • etl/unload/*extra.sql is added
  • custom mapping review scripts are added to crosswalk_csv folder
  • bugfixes

2021-03-08

  • Date shift bugfix
    • cdm_person - birth_year is updated
    • lk_procedure - events earlier than one year before birth_year are filtered out

2021-02-22

  • All events tables
    • review and re-map type_concept_id
  • Specimen
    • due to the target domain assigned in custom mapping, some specimen records are placed into Procedure and Observation
  • Procedure
    • a bug is fixed in the lk_hcpcs_concept table; HCPCS procedures now reach the target table
  • Measurement
    • trim values in chartevents
    • pick antibiotic resistance custom vocabulary
  • Observation
    • add the 'mimiciv_obs_language' custom vocabulary (just English, but it covers 17% of observation rows)
  • Dose_era
    • a bug is fixed; dose_era is now populated properly
  • Custom mapping is added and updated
    • for visit_detail, specimen, measurement, drug, units, waveforms

2021-02-08

  • Set version v.1.0

  • Drug_exposure table

    • pharmacy.medication replaces particular values of prescription.drug
    • the source value format is changed to COALESCE(pharmacy.medication.selected, prescription.drug) || prescription.prod_strength
  • Labevents mapping is replaced with new reviewed version

    • vocabulary affected: mimiciv_meas_lab_loinc
    • lk_meas_labevents_clean and lk_meas_labevents_mapped are changed accordingly
  • Unload for Atlas

    • the technical fields unit_id, load_row_id, load_table_id, and trace_id are removed from the tables delivered to Atlas
  • Delivery export script

    • tables are exported to a single directory, one file per table; if a table is too large, it is exported to multiple files
  • Bugfixes and cleanup

  • Real environmental names are replaced with placeholders

2021-02-01

  • Waveform POC-2 is created for 4 MIMIC III Waveform files uploaded to the bucket
    • iterates through the folder tree, captures metadata, and loads the CSVs
  • Bugfixes

2021-01-25

  • Mapping improvement
  • New visit logic for measurement (picking visits by event datetime)
  • Bugfixes

2021-01-13

  • Export and import python scripts:
    • scripts/delivery/export_from_bq.py
      • the script is adjusted to export BQ tables from the Atlas dataset to a GS bucket, and optionally to local storage as CSV files
    • scripts/delivery/load_to_bq.py
      • the script is adjusted to load the exported CSVs from local storage or from the bucket into the given target BQ dataset, which is created automatically

2020-12-14

  • TUF-55 Custom mapping derived from MIMIC III
    • A little more custom mapping
  • TUF-77 UT, QA and Metrics for tables with basic logic
    • UT for all tables:
      • Unique fields
      • FK to person_id, visit_occurrence_id, some individual FK

2020-12-03

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • Bugfix: Visit and Visit_detail FK violation - fixed
  • TUF-55 Custom mapping derived from MIMIC III
    • Measurement.value_as_concept_id - mapped partially
    • Unit - mapping is loaded
  • TUF-77 UT, QA and Metrics for tables with basic logic
    • Make Top100 step faster - started

2020-12-01

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • Bugfix: Person.race_concept_id and Co - is fixed
    • Bugfix: Measurement.unit_concept_id - is fixed
    • Bugfix: Measurement.value_as_concept_id - copied for mapping
    • Bugfix: Visit_detail.services.visit_detail_concept_id and Co - copied for mapping
    • Bugfix: Visit and Visit_detail FK violation - tables implementation is refactored
  • TUF-75 Basic Orchestration
    • Bugfix: runs end-to-end if a given script is not found - fixed
      • bq_run_script.py now stops if any of the given scripts are not found, and reports their names

2020-11-30

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • Minor update to Care_site
    • Updates to Measurement (chartevents)
      • 4 types are added: Heart Rate, Respiratory Rate, O2 Saturation, Heart Rhythm
    • Fact_relationship is added
    • Bugfix to visit_source_value (hadm_id -> concat(subject_id, '|', hadm_id))
      • Visit_occurrence, Visit_detail, and all event tables
  • TUF-55 Custom mapping derived from MIMIC III
    • Microbiology mapping
      • Microtest is created
      • Organism is updated
    • Admission mapping - 3 vocabularies added
    • Place of service is added
  • TUF-75 Basic Orchestration
    • global config (etlconf) is added to bq_run_script.py and run_workflow.py
    • Bug: run_workflow.py runs end-to-end if the given script is not found
  • TUF-77 UT, QA and Metrics for tables with basic logic
    • metrics_gen is added (from mimic POC in May 2020)
    • gen_bq_ut_basic.py is created, in progress
    • src_statistics.sql is started (source row counts)
  • TUF-83 Complete run on the Full set
    • full.etlconf is added, end-to-end run is started

2020-11-20

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • Mapping for Visits is updated
    • Condition_era, Drug_era, Dose_era are added
  • TUF-55 Custom mapping derived from MIMIC III
    • Microbiology mapping - 4 vocabularies are added, 1 to be created
    • Analysis for other mapping
    • Minor updates

2020-11-13

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-19 Measurement - Rule 3 (microbiology) - done (without tests)
    • TUF-58 Specimen - done (without tests)
  • TUF-75 Basic Orchestration
    • run_job.py is renamed to run_workflow.py,
    • Variables section is added to configs for bq_run_script.py and run_workflow.py
    • Dataset names are replaced in ut/*.sql and qa/*.sql
    • run_workflow.py is updated, the issue with not stopping on an error is fixed
    • Nice output is added to bq_run_script

2020-11-12

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-19 Measurement - Rule 3 (microbiology) ready to run

2020-11-10

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-19 Measurement - Rule 3 (microbiology) is in progress
    • TUF-58 Specimen - ready to run
    • Other updates

2020-10-27

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-20 Death table - done
    • TUF-19 Measurement - Rule 2 (chartevent) is started
  • TUF-76 Bugfix according to Achilles Heel Report
    • Person count - done
    • Condition mapping rate - done
    • Observation periods - done
    • For next updates:
      • Drug concepts in Condition
      • Observation value_as fields all null
      • Event dates before year of birth
  • TUF-75 Basic Orchestration
    • Fixed: dealing with quotes (") inside a query
    • Known issue: do not use semicolon (;) inside comments

2020-10-26

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-19 Minor fixes for Measurement lookups
    • TUF-51 Device_exposure - done with Rule 1 lk_drug_mapped
    • TUF-58 Specimen - is started
  • TUF-75 Basic Orchestration
    • run_job.sql is created
  • ATLAS dataset is populated: mimiciv_202010_cdm_531

2020-10-21

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-19 Measurement - basic is done, Rule 1 (labevents) and Rule 10 (wf poc demo) are implemented
    • TUF-53 Observation_period - basic is done
  • TUF-75 Basic Orchestration
    • Config design in progress

2020-10-21

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-23 Drug_exposure - basic is done, Rule 1 is implemented
    • TUF-55 Custom mapping - vocabulary references in the code are updated, the CSVs folder is moved
  • TUF-50 Create dataset for ATLAS - the SQL script generator is created, script is generated

2020-10-18

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-23 Drug_exposure - in progress
    • TUF-20 Death - in progress
    • TUF-55 Custom mapping - CSVs for the implemented tables are converted (todo: rename custom vocabularies in the code)

2020-10-12

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-18 Condition_occurrence - basic logic, UT, QA are created
    • TUF-22 Procedure_occurrence - basic logic is created, UT and QA - to do
    • TUF-21 Observation - from Procedure III is started, from Observation III is next
    • TUF-23 Drug_exposure - started
    • TUF-20 Death - in progress

2020-10-01

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-19 Measurement - Rule 1 (lab from labevents) is implemented, dry run is done
    • MIMIC III extras/concept custom mapping CSVs are loaded to BQ to raw tables (for reference)

2020-09-24

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-17 Visit_detail - basic ETL, UT, QA are done (see Known issues in the ETL script)
    • TUF-19 Measurement - start

2020-09-22

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-15 Visit_occurrence - basic ETL, UT, QA are done, proper to do
    • TUF-16 Orchestration overhead - config design
    • TUF-17 Visit_detail - basic ETL is in progress

2020-09-18

  • TUF-11 Load Vocabularies
    • vocabulary_refresh.conf - updated
    • sql scripts - project.dataset template is set
    • py scripts - some error handling is added, minor bug is fixed
    • Athena vocabularies are loaded to vocabulary_2020_09_11

2020-09-15

  • TUF-10 Basic logic for CDM tables based on MIMIC-III logic
    • TUF-12 Location - Hardcode the single location
    • TUF-13 Care_site - basic ETL, UT, QA are done, to add proper mapping and ID
    • TUF-14 Person - basic ETL, UT, QA are done, to add proper mapping and ID

2020-09-11

Start development

  • The project includes:
    • folders
    • DDL script
    • vocabulary_refresh scripts set

The End

Concepts / best practice / lessons learned

  • keep vocabularies in a separate standard dataset, and add custom mapping each time the vocabs are copied to the CDM dataset?
    • usual practice: apply custom mapping to the vocabs in the vocab dataset
    • plus of this alternative: no clean-up is needed; we just add the custom concepts
  • when working on a task:
    • create a branch / complete the previous PR
    • declare a comparison dataset, i.e. the result of the previously completed task
    • create a new working dataset named mimiciv_cdm_task_id_developer_yyyy_mm_dd
    • create a UAT dataset when a significant piece of work is completed and tested
      • the name is mimiciv_uat_yyyy_mm_dd
      • uat = a copy of the best and latest cdm
    • clean up cdm datasets when the uat dataset is created, or earlier if they are no longer needed

Concepts / Philosophy

The ETL is based on four core steps (the snapshot step described above precedes them).

  • Clean the source data: filter out rows that are not to be used, format values, apply some business rules. Create intermediate tables with the suffix "clean".
  • Map distinct source codes to concepts in the vocabulary tables. Create intermediate tables with the suffix "concept".
    • Custom mapping is implemented in custom concepts generated in the vocabulary tables beforehand.
  • Join the cleaned data and mapped codes. Create intermediate tables with the suffix "mapped".
  • Distribute the mapped data to the target CDM tables according to target_domain_id values.

The unit_id field is composed during the ETL steps. From right to left: source table name, initial target table name abbreviation, final target table name or abbreviation. For example, unit_id = 'drug.cond.diagnoses_icd' means that the rows in this unit belong to the Drug_exposure table, were initially prepared for the Condition_occurrence table, and originate from the source table diagnoses_icd.
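
For example, unit_id can be split back into its parts to see where the rows of a CDM table came from (a sketch; the dataset and table names are illustrative):

SELECT
    SPLIT(unit_id, '.')[SAFE_OFFSET(0)] AS final_table,   -- e.g. 'drug'
    SPLIT(unit_id, '.')[SAFE_OFFSET(1)] AS initial_table, -- e.g. 'cond'
    SPLIT(unit_id, '.')[SAFE_OFFSET(2)] AS source_table,  -- e.g. 'diagnoses_icd'
    COUNT(*) AS row_count
FROM etl_dataset.cdm_drug_exposure
GROUP BY 1, 2, 3;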

How do I get set up?

  • Summary of set up
  • Configuration
  • Dependencies
  • Database configuration
  • How to run tests
  • Deployment instructions

Contribution guidelines

  • Writing tests
  • Code review
  • Other guidelines

Contributors

atsvetkova-ody, cgreich, manlik-brownsrdr, mik-ohdsi, mkwong, vojtechhuser


Issues

Mapping concepts in the inputevents table to RxNorm: Choosing the correct dose form for Non Iv Meds

A handful of medications in the table are listed under the category Non Iv Meds [ordercategorydescription]. Non-IV medications can have multiple dose forms (Oral Capsule, Oral Tablet, Oral Solution, Oral Suspension, and even Topical Cream/Lotion/Gel). Unfortunately, there is not enough information to determine the correct dose form.

  • Clindamycin [label], [itemid] 225860, is Non Iv Meds [ordercategorydescription], and it can be mapped to multiple concepts (Oral Capsule, Oral Solution, Oral Suspension, and Oral Tablet) without knowing the correct dose form.
  • How do we correctly map non-iv medications to the correct dose form?

Mapping concepts in the inputevents table to RxNorm: Mapping the drug Carafate (Sucralfate)

While mapping concepts in the inputevents table we encountered an issue mapping the drug Sucralfate (Carafate):

  • The logic used to map the inputevents table was that any drug under the category Drug Push [ordercategorydescription] is an IV medication. However, Carafate (Sucralfate) [label], [itemid] 225912, is listed multiple times as Drug Push [ordercategorydescription], even though this drug comes in oral form only.
  • Is Sucralfate listed by mistake as a Drug Push? Or is the logic incorrect and Drug Push [ordercategorydescription] includes Non IV medications?

shift in d_item IDs found

While continuing the mapping efforts, we discovered that there has been a shift from MIMIC IV v0.4 to v1.0, causing incorrect mappings in comparison to the provided mapping tables.

We have to fix the mappings and upload them again as well as run the ETL process again with the new mappings.

mimic_mapping.csv

Mapping concepts in the inputevents table to RxNorm: Inconsistencies in [itemid]

We observed multiple inconsistencies in [itemid] in the inputevents table.

  • labels repeated with different [itemid]: Dexmedetomidine [label], with the category description Continuous Med [ordercategorydescription], is listed twice with [itemid] 229420 and twice with [itemid] 225150
  • labels with different categories [ordercategorydescription] listed with the same [itemid]: Vancomycin [label] is listed as both Drug Push and Non Iv Meds [ordercategorydescription] with the same [itemid] 225798. However, Epinephrine, [itemid] 221289, and Epinephrine. (with a dot at the end), [itemid] 229617, have different [itemid]s because one is Continuous Med [ordercategorydescription] and the other is Drug Push [ordercategorydescription]

I can't find the file name in vocabulary_refresh.

Hello, I am a university student in Korea working on converting MIMIC4 to OMOP CDM.
I downloaded all the files in this GitHub repository. Now the files are in
C:\Users\MyName\MIMIC-master\vocabulary_refresh
But I can't find the file corresponding to vocabulary_2020_09_11. Has this file name been changed? I want to find local_athena_csv_path.

Need permissions to the waveforms source data bucket

When trying to run the script, it complains that I do not have access rights to gs://mimic_iv_to_omop/waveforms/source_data/csv.
Either make this bucket public, or create a mechanism that allows us to gain access to it.

export script

At a given milestone, executing an export script will produce a set of CSV files.

Even better, we could use R and export the tables into the '.rds' format, which keeps the data types (date, time, etc.).

This is needed for PhysioNet.

Achilles extract

For the PhysioNet release, it would be nice to have the Achilles results as CSV files.

A copy of the 3 key tables for Achilles.

Location needs to be set and delimiter needs an extra backslash for the bq script to work

Hi,
When trying to run the vocabulary refresh script I found out the following:

  1. If you create your Google Cloud BigQuery dataset in a specific location (europe-west4 in my case), you need to set your location explicitly in the bq command line; your script sets it to US by default.
  2. The \t delimiter is not passed correctly to bash, and you need an extra \ in vocabulary_refresh.conf to make it work.

Cheers
Tomer

value_as_concept_id in Measurements

Hello,

I have a question regarding the measurement table. I have noticed that some concepts in the measurement table have not been mapped in value_as_concept_id. In my specific case, I was looking at concept 40769406 - Specimen type, which only has a value in value_source_value but is missing the mapping to value_as_concept_id. Since the specimen type has only 4 values (CENTRAL VENOUS., VEN., ART., MIX.), I think the mapping would be fairly straightforward. Where would it be possible to add these concepts so that the ETL can include the mapping to value_as_concept_id? I assume some custom mapping vocabulary would need to be updated, but I am not sure which one.

Thanks in advance

Google Cloud Configuration Instructions

Is there a Readme for configuring the Google Cloud projects and datasets to run the ETL? I am getting the following error:

"BigQuery error in query operation: Error processing job 'mimic-omop:bqjob_r433fa0779e4406df_000001776a2fcc2e_1': Not found: Table mimic-omop:mimic_full_cdm.src_transfers was not found in location US".

I am not sure if this is relating to a configuration issue on my end or if the script was supposed to create the src_transfers table first.

Any help would be appreciated. Thank you for your time.

Question: Relation between anchor age and CDM year of birth

Thanks for making all this code public! I was looking through the ETL flow and there is one thing I do not quite understand: in the PhysioNet documentation of the MIMIC IV data v2.2, it states that "the anchor_age provides the patient age in the given anchor_year". Hence, it would seem the patient's year of birth (in shifted MIMIC time) should be anchor_year - anchor_age. However, looking at this line of the conversion code (from etl/etl/cdm_person.sql), it seems the CDM year of birth is set directly to the anchor year (the anchor age is only used in etl/lk_procedure.sql in a different context, as far as I can tell). Did I misunderstand something or miss a part of the puzzle?
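
For reference, the relation described in the question would read as follows in BigQuery (a sketch of the questioner's expectation only, not the project's implemented logic; src_patients is an illustrative staging table name):

SELECT
    subject_id,
    anchor_year,
    anchor_age,
    anchor_year - anchor_age AS expected_year_of_birth  -- per the PhysioNet docs quoted above
FROM etl_dataset.src_patients;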

I have a question about wf_details in the waveform collecting stage

Hello. I'm a Korean student, and I'm trying to convert MIMIC-IV to OMOP CDM.
I've already downloaded the waveform data and stored it locally and in Google BigQuery.
I've also seen the document about the waveform data stored in the waveforms directory.
My questions are below.

  1. If a directory is p100/p10014354/81739927/81739927n.csv, I understand it like this:
    "case id = p100, subject id = p10014354, short_reference_id = 81739927, long_reference_id = Google Cloud Storage address"
    Is my understanding correct?

  2. I'm having a challenge getting the wf_details table. I have built the wf_header table, whose schema is (case_id, subject_id, short_reference_id, long_reference_id). How can I get the wf_details table?

GS

I'm having a problem with gs (I suppose Google Storage). Is it really necessary to run the code, since I'm doing it on my local machine?

If it is not necessary, how can I remove it from the code? If it is necessary, how can I get access to it?

I get this when I try to run -s10 of the vocabulary refresh:

gsutil rm gs://mimic_iv_to_omop/vocab_2020_09_11/*.csv
sh: 1: gsutil: not found
return_code 32512
gsutil cp /storage/store3/work/sbrasild/vocabulary/*.csv gs://mimic_iv_to_omop/vocab_2020_09_11/
sh: 1: gsutil: not found
return_code 32512

Inquiries about full.etlconf

Hi, when I try to run python scripts/run_workflow.py -e conf/full.etlconf -c conf/workflow_ddl.conf, I get an error (screenshot attached).

Am I supposed to change all the variables in the conf file? If so, where do I find the appropriate datasets? Thank you for your help.

How to find tmp_concept data file?

I have an error in vocabulary_refresh.py at steps 12-13 (screenshot attached).
I can't find any tmp_concept data file anywhere.
Please help me.

Also, I have an extra question. I had a JSON schema file for BigQuery.
I set the gs_file_path extension to .json and saved it in BigQuery, but I got an error (screenshot attached).
How can I solve the problem?
