
Comments (2)

seoulsky-field commented on August 16, 2024

Wow, what thoughtful content! I can see how much consideration you put into this issue.

Anyway, let's keep discussing.
First, in my opinion, the main reason we cannot get the validation AUROC score to 0.9 (or more) is the "-1" (uncertain) labels. But that is only the main reason; I agree that frontal/lateral mislabeling is something we have to check.

Second, I guess the test set might not have mislabeling, because both the validation and test sets in the CheXpert dataset were checked by radiologists, while the other datasets could have mislabeling. Of course, to be sure, the best way is for us to check the validation and test images ourselves.

So, I suggest a new method.
First, we double-check the CheXpert valid and test sets. Because they are small and reliable, it will be easy. (If we think we need more data, we can add train images!)
Second, fix any mislabeled entries if they exist and train a binary classification model (Frontal vs Lateral). I think the difference between the two views is clear, unlike classifying the 14 labels in CheXpert.
Third, check the accuracy score and check the errors.
Fourth, check the valid or test set of MIMIC-CXR or BRAX, in the same way as in "First".
Fifth, run inference with the model from "Third" on the datasets from "Fourth".
Sixth, check the results.
Seventh, use other datasets and train/valid/test sets and double-check the results.

This method may take more time than other approaches; however, it lets us give users a way to automate cleaning up mislabels. (We can provide the pth file and notebooks!)
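As a minimal sketch of the accuracy and error check in the third and sixth steps above, the snippet below compares predicted views with the labeled views and collects the disagreements for manual review. The function name `evaluate_view_predictions` and the sample paths are hypothetical; the classifier itself is not shown and its predictions are assumed to be plain strings.

```python
from typing import List, Sequence, Tuple


def evaluate_view_predictions(
    paths: Sequence[str],
    labels: Sequence[str],
    predictions: Sequence[str],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return (accuracy, errors) for a Frontal-vs-Lateral classifier.

    `errors` holds (path, label, prediction) for every disagreement, so
    each image can be opened and checked by hand: a disagreement may be
    a model mistake or a mislabeled row in the CSV.
    """
    if not (len(paths) == len(labels) == len(predictions)):
        raise ValueError("paths, labels and predictions must have equal length")
    if not paths:
        raise ValueError("no samples to evaluate")

    errors = [
        (path, label, pred)
        for path, label, pred in zip(paths, labels, predictions)
        if label != pred
    ]
    accuracy = 1 - len(errors) / len(paths)
    return accuracy, errors


# Made-up example data, only to show the call.
sample_paths = ["a_frontal.jpg", "b_lateral.jpg", "c_frontal.jpg", "d_lateral.jpg"]
sample_labels = ["Frontal", "Lateral", "Frontal", "Lateral"]
sample_preds = ["Frontal", "Lateral", "Lateral", "Lateral"]

accuracy, errors = evaluate_view_predictions(sample_paths, sample_labels, sample_preds)
print(f"accuracy={accuracy:.2f}, to review: {errors}")
```

Keeping the error list next to the score matters here, because the goal is to find mislabeled images rather than only to report a number.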

from cxrail-dev.

seoulsky-field commented on August 16, 2024

Today, I checked test_labels.csv for the CheXpert test set, taken from the CheXlocalize dataset, which I downloaded with Azure Storage Explorer. Unlike train.csv and valid.csv, it doesn't have a 'Frontal/Lateral' column, so I got the view position values from the file names. (All of the split CSV files have the view position value in the 'Path' column.) After that, I checked one by one that the view position values matched the images. Fortunately, there were no mislabeled values.
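Reading the view position from a file name could look like the sketch below. It assumes the file names end in `_frontal` or `_lateral` before the extension (for example `view1_frontal.jpg`); the helper name `view_from_path` and the sample paths are hypothetical.

```python
import re
from typing import Optional

# Assumed file-name pattern: "<anything>_frontal.<ext>" or "<anything>_lateral.<ext>".
_VIEW_RE = re.compile(r"_(frontal|lateral)\.[A-Za-z0-9]+$", re.IGNORECASE)


def view_from_path(path: str) -> Optional[str]:
    """Return 'Frontal' or 'Lateral' parsed from the file name, or None."""
    match = _VIEW_RE.search(path)
    return match.group(1).capitalize() if match else None


# Made-up paths in the assumed layout.
print(view_from_path("test/patient00001/study1/view1_frontal.jpg"))
print(view_from_path("test/patient00001/study1/view2_lateral.jpg"))
print(view_from_path("test/patient00001/study1/view3.jpg"))
```

Returning `None` for an unrecognized name, instead of guessing, keeps unexpected files visible so they can be inspected by hand.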

So, using the same approach, I checked train.csv and valid.csv in the CheXpert dataset. In this case, I couldn't check the view position values from the 'Path' column against the images, and it isn't necessary either, because both train.csv and valid.csv have a 'Frontal/Lateral' column. Namely, I only need to compare the view position values from the file names with the 'Frontal/Lateral' column in each CSV file.
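A rough sketch of that comparison, using only the standard library, is shown here. The 'Path' and 'Frontal/Lateral' column names come from the description above; the function name `find_view_mismatches` and the in-memory sample rows are made up for illustration, and this is not the exact script used for the screenshots.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_view_mismatches(csv_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (path, column value) for rows where the file name and the
    'Frontal/Lateral' column disagree about the view position."""
    mismatches = []
    for row in csv.DictReader(csv_lines):
        file_name = row["Path"].rsplit("/", 1)[-1].lower()
        column_view = row["Frontal/Lateral"].strip().lower()
        # A row is flagged when the column is empty or when its value
        # does not appear in the file name.
        if not column_view or column_view not in file_name:
            mismatches.append((row["Path"], row["Frontal/Lateral"]))
    return mismatches


# Made-up rows; with a real file, pass open("valid.csv", newline="") instead.
sample_csv = io.StringIO(
    "Path,Frontal/Lateral\n"
    "valid/patient00001/study1/view1_frontal.jpg,Frontal\n"
    "valid/patient00002/study1/view2_lateral.jpg,Frontal\n"
    "valid/patient00003/study1/view1_lateral.jpg,Lateral\n"
)
mismatches = find_view_mismatches(sample_csv)
print(mismatches)
```

An empty result means the two sources agree with each other; it does not prove that either one matches the actual image.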

These two images show the results. The top one is the result for train.csv and the bottom one is the result for valid.csv. As you can see, fortunately, there are no mislabeled values in the CheXpert dataset! (Of course, it would be better to double-check the images against the column values one by one. However, that is not easy for us because we don't have enough time.)

Screenshot 2023-02-09 3:12:23 PM

Screenshot 2023-02-09 3:12:07 PM

And thanks for the great discussion! Also, we must check the MIMIC dataset in the same way!

