
chexzero's People

Contributors

ekkin2, rajpurkar


chexzero's Issues

about h5 file key

Hello, and thank you for sharing the code.
There seems to be a problem when loading the h5 file in the CXRDataset class.
Shouldn't the key be cxr rather than cxr_unprocessed on line 39 (and also line 42)?

CheXzero/train.py

Lines 24 to 43 in c303e5c

class CXRDataset(data.Dataset):
    """Represents an abstract HDF5 dataset.

    Input params:
        file_path: Path to the folder containing the dataset (one or multiple HDF5 files).
        recursive: If True, searches for h5 files in subdirectories.
        load_data: If True, loads all the data immediately into RAM. Use this if
            the dataset fits into memory. Otherwise, leave this at False and
            the data will load lazily.
        data_cache_size: Number of HDF5 files that can be cached in the cache (default=3).
        transform: PyTorch transform to apply to every data instance (default=None).
    """
    def __init__(self, img_path, txt_path, column='report', size=None, transform=None):
        super().__init__()
        if size != None:
            self.img_dset = h5py.File(img_path, 'r')['cxr_unprocessed'][:size]
            self.txt_dset = pd.read_csv(txt_path)[column][:size]
        else:
            self.img_dset = h5py.File(img_path, 'r')['cxr_unprocessed']
            self.txt_dset = pd.read_csv(txt_path)[column]
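A quick way to settle which key the file actually contains is to list the HDF5 file's top-level keys before hard-coding one. This is a generic h5py sketch with a hypothetical path, not code from the repository:

```python
# Sanity check (hypothetical path): list the dataset keys stored in the
# HDF5 file before hard-coding 'cxr' or 'cxr_unprocessed' in CXRDataset.
import h5py

def available_keys(h5_path):
    """Return the top-level dataset keys in an HDF5 file."""
    with h5py.File(h5_path, "r") as f:
        return list(f.keys())

# e.g. available_keys("cxr.h5") might return ['cxr'] rather than
# ['cxr_unprocessed'], depending on which preprocessing script produced it.
```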

Train Loss seems stuck after a point

Hi, thanks for the code you've provided.

I was trying to train with the repository's default settings. However, I observed that the training loss gets stuck at a value after a few minibatch cycles, for both the pretrained and non-pretrained cases. I tried this on the complete MIMIC dataset as well as on a smaller subset I created for testing, but in all cases the loss plateaued at a value rather than moving towards overfitting.

Is this expected behaviour? Has anyone else come across this?
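A standard way to localize this kind of plateau is the overfit-one-batch check: train repeatedly on a single frozen minibatch. Any healthy model/loss/optimizer combination should drive the loss towards zero on one batch; if it still plateaus, the problem is in the loss, learning rate, or data pipeline rather than the dataset. This is a generic sketch with a toy linear model standing in for the real towers, not CheXzero code:

```python
# Overfit-one-batch sanity check (toy model, not the CheXzero architecture).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 4)          # stand-in for the real image/text towers
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 16)                  # one frozen minibatch
y = torch.randint(0, 4, (8,))

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# losses[-1] should be far below losses[0]; if not, debug before scaling up.
```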

Training not converging with default settings

Hi,
we're from the University of Wuerzburg and are trying to replicate your project for German report data.
For now, we simply tried to get your code to run and train on MIMIC, both with the default settings and with the settings given in your paper. We made sure to use the same package versions as the project.

However, the loss quickly becomes NaN after some iterations. We then tried subsampling the dataset: for a very small subset (~300 images) training does converge, but even at 1000 images the loss does not decrease. We also tried several different learning rates and hyperparameters, but nothing has helped so far.

We were hoping you might be familiar with these problems and could advise us.

Thanks in advance!
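One common mitigation for NaN losses in CLIP-style contrastive training (an assumption, not a confirmed fix for this repo) is to clip gradients and skip steps whose loss is non-finite, which often tames runs where the learnable logit scale explodes. A minimal sketch, with a hypothetical `safe_step` helper:

```python
# Hypothetical helper: backward + step that skips non-finite losses and
# clips gradient norms, a common guard against NaN blowups.
import torch

def safe_step(model, loss, optimizer, max_norm=1.0):
    """Backward + step, skipping non-finite losses and clipping gradients."""
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        return False                      # skip this batch entirely
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return True
```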

How to interpret results? (weird predictions)

Tested on a pneumonia image and got these predictions:

[array([[0.49470618, 0.49632818, 0.501808  , 0.49816415, 0.49443188,
              0.4973687 , 0.5035202 , 0.50338894, 0.49944744, 0.4980956 ,
              0.5006629 , 0.5109319 , 0.5024933 , 0.48189908],
             [0.48987147, 0.4886013 , 0.505182  , 0.50687265, 0.49118397,
              0.50182277, 0.5023532 , 0.50404346, 0.49911276, 0.49427748,
              0.49890172, 0.51351357, 0.5014591 , 0.48301208]], dtype=float32)]

What does this mean? I thought the predictions should sum to 1.0, but here every pathology seems to be suspected with ~50% probability. Why?
The same happens with both the best_64_5e-05_original_22000_0.864.pt and best_128_5e-05_original_22000_0.855.pt weights.
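For context, zero-shot CLIP-style classification as described in the CheXzero paper scores each pathology by softmaxing the similarity of a positive prompt (e.g. "pneumonia") against a negative prompt ("no pneumonia") per pathology. Each of the 14 numbers is therefore an independent probability; they are not one distribution and never need to sum to 1, and values near 0.5 mean the two prompts scored almost identically, i.e. the model is uncertain. A hedged sketch of that scoring rule (not a verbatim copy of the repo code):

```python
# Per-pathology positive-vs-negative softmax, the usual zero-shot scoring
# scheme for CLIP-style models (a sketch, not the repo's exact code).
import numpy as np

def pairwise_softmax(pos_logits, neg_logits):
    """Probability of the positive prompt for each pathology independently."""
    exp_pos = np.exp(pos_logits)
    exp_neg = np.exp(neg_logits)
    return exp_pos / (exp_pos + exp_neg)

pos = np.array([0.02, 0.01, -0.01])   # toy similarities, one per pathology
neg = np.array([0.01, 0.00, 0.01])
print(pairwise_softmax(pos, neg))     # each value near 0.5, independent of the others
```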

TypeError: must be real number, not str

I get the error in the title when training.

Full command: python3 run_train.py --cxr_filepath {path to cxr.h5} --txt_filepath {path to mimic_impressions.csv}

Full traceback:

Traceback (most recent call last):
  File "/home/ec2-user/MedZero/CheXzero/run_train.py", line 143, in <module>
    model = model_pipeline(args)
  File "/home/ec2-user/MedZero/CheXzero/run_train.py", line 44, in model_pipeline
    train(model, data_loader, device, criterion, optimizer, config)
  File "/home/ec2-user/MedZero/CheXzero/run_train.py", line 85, in train
    for data in tqdm(loader):
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/opt/conda/envs/medzero/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 68, in default_collate
    return torch.tensor(batch, dtype=torch.float64)
TypeError: must be real number, not str
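One likely cause, judging from the traceback (an assumption, not confirmed): empty rows in the impressions CSV are read by pandas as NaN (a float), so a batch mixes floats and strings, default_collate picks dtype float64, and then chokes on the first real string. Cleaning the text column up front avoids this; `load_reports` below is a hypothetical helper, not part of the repo:

```python
# Hypothetical helper: read the report column, dropping NaN/empty rows so
# every item handed to the DataLoader is a string.
import pandas as pd

def load_reports(csv_path, column="report"):
    """Return a DataFrame whose report column contains only strings."""
    df = pd.read_csv(csv_path)
    df = df.dropna(subset=[column])
    df[column] = df[column].astype(str)
    return df
```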

zeroshot result

When I ran the zero_shot.ipynb notebook, I'm not sure why the average AUC for "No Finding" in the result is only 0.0700.
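A hedged diagnostic, not a claim about the notebook itself: an AUC far below 0.5 usually means the score is anti-correlated with the label, since negating a score maps AUC to 1 - AUC. So 0.07 becomes 0.93 if the "No Finding" column is scored in the opposite direction (e.g. as the probability that findings are present), which is worth checking:

```python
# Demonstration that flipping a score maps AUC to 1 - AUC (toy data).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                # toy binary labels
score = y + 0.3 * rng.standard_normal(200)      # score correlated with labels

auc = roc_auc_score(y, score)
flipped = roc_auc_score(y, -score)
# flipped == 1 - auc (up to floating point), so a very low AUC often just
# means the column's scoring direction is inverted.
```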


ground_truth format

ground_truth_3.csv
I was able to train a model on a small dataset and wanted to test it on the original JPGs, giving it a ground_truth.csv file encoded as 0s for all labels except "No Finding". I get back the prediction averages, but bootstrap_results[1] returns a table full of NaNs. What could be the issue?
A screenshot of the issue is attached.
Hope you can help us.
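A plausible explanation (an assumption, not verified against the repo's bootstrap code): AUC is undefined for any label column that contains only one class. If every pathology except "No Finding" is all zeros, every bootstrap resample has no positives for those columns, so each per-column metric comes back NaN and the table fills with NaNs. A minimal sketch of the failure mode:

```python
# safe_auc is a hypothetical helper illustrating why single-class ground
# truth yields NaN: AUC needs at least one positive and one negative.
import numpy as np
from sklearn.metrics import roc_auc_score

def safe_auc(y_true, y_score):
    """Return AUC, or NaN when only one class is present in y_true."""
    if len(np.unique(y_true)) < 2:
        return float("nan")
    return roc_auc_score(y_true, y_score)

# All-zero ground truth for a column -> NaN for that column, which is what
# a NaN-filled bootstrap table looks like once aggregated.
```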
