nyuad-cai / medfuse


medfuse's Issues

modify the code in create_split.py to solve the problem '0 samples in the val and test datasets of CXR_UNI'

First of all, thank you for your work; it's very useful!

However, I found a problem when using it. In create_split.py, the elements of the lists val_subject_ids and test_subject_ids are strings, not ints. This causes a problem in the two lines that assign the splits (shown here already using the converted _int lists from my fix):

cxr_splits.loc[cxr_splits.subject_id.isin(val_subject_ids_int), 'split'] = 'validate'
cxr_splits.loc[cxr_splits.subject_id.isin(test_subject_ids_int), 'split'] = 'test'

With the original string IDs, isin matches no rows in these two steps, so the 'validate' and 'test' labels are never assigned. As a result, the val and test splits of the CXR_UNI dataset are empty.

I added

val_subject_ids_int = [int(i) for i in val_subject_ids]
test_subject_ids_int = [int(i) for i in test_subject_ids]

after the definitions of val_subject_ids and test_subject_ids. This solves the problem, and the CXR dataset for pretraining can then be generated. Thanks!
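The type mismatch described above can be reproduced without the dataset. The following is a minimal sketch in plain Python, not the code from create_split.py: the helper names `to_int_ids` and `assign_splits` and all IDs are hypothetical. pandas `Series.isin` likewise treats strings and integers as distinct values, so the same reasoning should apply there.

```python
def to_int_ids(ids):
    """Convert subject IDs that were read as strings into ints."""
    return [int(i) for i in ids]


def assign_splits(subject_ids, val_ids, test_ids):
    """Label each subject 'validate', 'test', or 'train' by ID membership."""
    val_set, test_set = set(val_ids), set(test_ids)
    splits = {}
    for sid in subject_ids:
        if sid in val_set:
            splits[sid] = "validate"
        elif sid in test_set:
            splits[sid] = "test"
        else:
            splits[sid] = "train"
    return splits


# Hypothetical IDs, for illustration only.
subject_ids = [101, 102, 103, 104]   # ints, as in the split table
val_subject_ids = ["102"]            # strings, as read from a list file
test_subject_ids = ["104"]

# String IDs never equal int IDs, so every subject stays in 'train'.
broken = assign_splits(subject_ids, val_subject_ids, test_subject_ids)

# Converting first lets the membership test succeed.
fixed = assign_splits(
    subject_ids, to_int_ids(val_subject_ids), to_int_ids(test_subject_ids)
)

print(broken)
print(fixed)
```

Converting the ID lists once, right after they are built, keeps the later membership checks unchanged.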

about the create_split.py file

hi, thanks for your interesting work : )
In the README, you mention the create_split.py file used for preprocessing, but I can't find this file. Could you please provide it? Thanks a lot!

unable to download

Unfortunately, because of my system's limited configuration, the download fails intermittently, even though I have tried many times.
I understand your concern. I was recently certified for the MIT CITI data use agreement through PhysioNet. Please share the processed data, which would help me proceed with my research work.

Need the processed data from MedFuse

Hi @farahshamout ,

I tried to extract the data as described in the README file on GitHub, but my system gets stuck during extraction. In our email discussion, you mentioned that you have preprocessed data from MIMIC-CXR and MIMIC-IV. Could you please share it for my work?

Pretrained Model

Hi, are you planning to release a pre-trained model along with a sample dataset for inference and validation purposes?
Thanks...

'None' data issue when executing sh ./scripts/radiology/uni_cxr.sh

When executing sh ./scripts/radiology/uni_cxr.sh, an error occurs.
Below are the details:
File "/mnt/gsai/brain/MedFuse/datasets/fusion.py", line 202, in
x = [item[0] for item in batch]
TypeError: 'NoneType' object is not subscriptable

In datasets/fusion.py, there is a function my_collate(), which is used by the dataloader.
I debugged it with the following code:

def my_collate(batch):
    # for loop for debugging
    for item in batch:
        if item is not None:
            ehr_data, cxr_data, labels_ehr, labels_cxr = item
            print(ehr_data.shape, cxr_data.shape, labels_ehr.shape, labels_cxr.shape)
        else:
            print(item)

It prints 'None' for every item in the batch.
Can you figure out what the problem is?
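One way to make this failure easier to diagnose is to check the batch before indexing into it. The sketch below is a debugging aid written under assumptions, not the repository's my_collate: `drop_none` and `my_collate_checked` are hypothetical names, and the check does not fix the root cause, which would lie in whatever the dataset returns for each item.

```python
def drop_none(batch):
    """Remove None items from a batch and fail loudly if nothing is left."""
    kept = [item for item in batch if item is not None]
    if not kept:
        raise ValueError(
            f"all {len(batch)} items in the batch are None; "
            "check what the dataset returns for each index"
        )
    return kept


def my_collate_checked(batch):
    """Hypothetical variant of my_collate: validate first, then take field 0."""
    batch = drop_none(batch)
    # Same list comprehension as in the traceback, now safe from None items.
    x = [item[0] for item in batch]
    return x
```

With a batch made up entirely of None, this raises a ValueError that names the likely source instead of the less direct "'NoneType' object is not subscriptable".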

Duplicate samples in dataset ‘partial_ehr_cxr’

The code

index = random.randint(0, len(self.ehr_files_unpaired)-1)

in datasets/fusion.py produces duplicate samples in the dataset ‘partial_ehr_cxr’, about 20% of them (depending on the random seed). If you want the dataset without duplicate samples, consider using

index = index - len(self.ehr_files_paired)

Thanks.
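The difference between the two indexing schemes can be sketched in isolation. This example assumes that dataset positions first cover the paired files and then the unpaired ones, which is an assumption about the layout rather than something confirmed here. The function names and sizes are hypothetical.

```python
import random


def random_unpaired_indices(n_unpaired, seed=0):
    """Draw one index per unpaired slot with replacement, like random.randint."""
    rng = random.Random(seed)
    return [rng.randint(0, n_unpaired - 1) for _ in range(n_unpaired)]


def offset_unpaired_indices(n_paired, n_unpaired):
    """Map positions after the paired block onto unpaired indices 0..n_unpaired-1."""
    return [index - n_paired for index in range(n_paired, n_paired + n_unpaired)]


n_paired, n_unpaired = 5, 10  # hypothetical sizes
drawn = random_unpaired_indices(n_unpaired)
mapped = offset_unpaired_indices(n_paired, n_unpaired)

# Random draws can repeat an index; the offset visits each index exactly once.
print(len(set(drawn)), "distinct indices from", n_unpaired, "random draws")
print(len(set(mapped)), "distinct indices from", n_unpaired, "offset positions")
```

Sampling with replacement can pick the same unpaired file more than once per pass, while the offset version is a one-to-one mapping from positions to files.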

Request to get test_listfile.csv and val_listfile.csv

Hi, I am doing a project with your datasets. I am unable to obtain the mimic_iv_extracted folder, and I need test_listfile.csv and val_listfile.csv. Could you please guide me on how to get those files for my work?

About the ICD mapping in the phenotype classification task

First, thank you for your work, which has been a great inspiration to me.
I noticed that Section 4.1, Datasets and Benchmark Tasks, states: "we mapped all ICD-10 codes to ICD-9 using the guidelines provided by the Centers for Medicare & Medicaid Services, and then map them to CCS categories". However, I did not find the relevant content in the code. May I ask how it is implemented?
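For readers following the question, the two-step mapping in the quoted sentence can be pictured as two chained lookups. This is only a sketch under assumptions and says nothing about how the authors implemented it: the function name is hypothetical, the tables hold placeholder strings rather than real codes, and a one-to-one ICD-10 to ICD-9 mapping is assumed even though real crosswalks can map one code to several.

```python
def map_icd10_to_ccs(icd10_code, icd10_to_icd9, icd9_to_ccs):
    """Chain two lookups: ICD-10 -> ICD-9 -> CCS; None if either step is missing."""
    icd9_code = icd10_to_icd9.get(icd10_code)
    if icd9_code is None:
        return None
    return icd9_to_ccs.get(icd9_code)


# Placeholder tables, NOT real codes or categories.
icd10_to_icd9 = {"ICD10_A": "ICD9_A", "ICD10_B": "ICD9_B"}
icd9_to_ccs = {"ICD9_A": "CCS_1"}

print(map_icd10_to_ccs("ICD10_A", icd10_to_icd9, icd9_to_ccs))  # both steps found
print(map_icd10_to_ccs("ICD10_B", icd10_to_icd9, icd9_to_ccs))  # second step missing
```

Returning None for unmapped codes makes gaps in either table visible, so they can be counted or handled explicitly.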

pre-trained model

Could you please share your pretrained model for evaluation? thanks

Need to know about both storage space and execution of the code for MedFuse

Hi @farahshamout ,
May I know how much memory is needed for storing the dataset and for running the MedFuse code, so that I can request it from my institution?

Can you share your Dataset?

I want to get MIMIC-Clinical time series data and val_listfile.csv and test_listfile.csv.

>Processed Data

Originally posted by @AmudhaTK in #7 (comment)

Need clarity on running the shell scripts

Hi @farahshamout,
I understand why the original protected datasets cannot be shared. With your help, I learned how to obtain the preprocessed data from the raw datasets. After upgrading my system, I started downloading the raw data from MIMIC IV and MIMIC-CXR-JPG. Thank you for the guidance on the memory needed to download the data.

Now I am working on processing the raw data. I did the MIMIC IV data extraction using MIMIC III. I need clarification on this part.

Kindly help me with the details of the output that we can get after the execution of this Shell script:

train the imaging model with 14 radiology labels.

sh ./scripts/radiology/uni_cxr.sh

train LSTM model on extracted time-series EHR data for phenotype task.

sh ./scripts/phenotyping/train/uni_all.sh

train LSTM model for the in-hospital-mortality task

sh ./scripts/mortality/train/uni_all.sh

It shows an error saying that two positional arguments are needed.
Which arguments should I pass to run the shell scripts?
It would also be helpful to know the expected output of the above commands.

thank you,
Amudha T K

needs help in running shell script

needs help in running the shell scripts

Dear @farahshamout ,
Hope you are doing well.

I need help in running the shell scripts.
sh ./scripts/phenotype/uni_csr.sh