What Fixing mislabels in frontal/lateral information for b

Discussion: Frontal/Lateral images and mislabel about cxrail-dev HOT 2 OPEN

kdg1993 commented on August 16, 2024 2


Comments (2)

seoulsky-field commented on August 16, 2024 1

Wow, what a thoughtful write-up! I can see how much consideration went into this issue.

Anyway, let's keep discussing this.
First, in my opinion, the main reason we cannot push the validation AUROC to 0.9 (or higher) is the "-1" (uncertain) labels. That is only my view of the "main" cause, though; I agree that frontal/lateral mislabeling is something we have to check.

Second, my guess is that the CheXpert test set has no mislabeling, because both the validation and test sets were checked by radiologists, whereas the other datasets could have mislabeling. Of course, to be sure, the best approach is for us to check the validation and test images ourselves.

So, I suggest a new method.
First, we double-check the CheXpert valid and test sets. Given their small size and reliability, this will be easy. (If we think we need more data, we can use train images too!)
Second, fix any mislabels that exist and train a binary classification model (Frontal vs. Lateral). I think the difference between the two views is clear, unlike classifying the 14 labels in CheXpert.
Third, check the accuracy score and inspect the errors.
Fourth, check the valid or test set of MIMIC-CXR or BRAX (in the same way as in "First").
Fifth, run inference with the model from "Third" on the datasets from "Fourth".
Sixth, check the results.
Seventh, use other datasets and train/valid/test splits and double-check the results.

This method may take more time than other approaches; however, it lets us give users an automated way to clean up mislabels. (We can provide the .pth weights and notebooks!)
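
To make the "check the accuracy score and check errors" steps concrete, here is a minimal sketch of how model predictions could be compared with the recorded view labels. It assumes a frontal/lateral classifier has already been trained and has produced a lateral probability per image; the `Prediction` class, its field names, the 0.5 threshold, and the example paths are all hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    path: str           # image path (hypothetical example values below)
    recorded_view: str  # "Frontal" or "Lateral" as recorded in the dataset metadata
    p_lateral: float    # model's predicted probability that the image is lateral


def evaluate(preds, threshold=0.5):
    """Return (accuracy, suspects); suspects are images where the model disagrees with the metadata."""
    suspects = []
    for p in preds:
        predicted_view = "Lateral" if p.p_lateral >= threshold else "Frontal"
        if predicted_view != p.recorded_view:
            suspects.append(p)
    accuracy = 1.0 - len(suspects) / len(preds) if preds else 0.0
    return accuracy, suspects


# Made-up predictions, not real results.
preds = [
    Prediction("a/view1_frontal.jpg", "Frontal", 0.02),
    Prediction("a/view2_lateral.jpg", "Lateral", 0.97),
    Prediction("b/view1_frontal.jpg", "Frontal", 0.91),  # model disagrees with the metadata
    Prediction("c/view1_frontal.jpg", "Frontal", 0.10),
]
accuracy, suspects = evaluate(preds)
print(accuracy, [s.path for s in suspects])
```

A disagreement is only a candidate mislabel: it can just as well be a model error, so the suspects would still need a manual look.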


seoulsky-field commented on August 16, 2024

Today, I checked test_labels.csv for the CheXpert test set from the CheXlocalize dataset, which I downloaded with Azure Storage Explorer. Unlike train.csv and valid.csv, it does not have a 'Frontal/Lateral' column, so I got the view position values from the file names. (The CSV file of every split has the view position in its 'Path' column.) After that, I checked one by one whether the view position values match the images. Fortunately, there were no mislabeled values.
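
Getting the view position from a file name could look roughly like the sketch below. It assumes the file names end in `viewN_frontal.jpg` or `viewN_lateral.jpg`; the helper name `view_from_path` and the example paths are hypothetical, so adjust the pattern if your copy of the data is named differently.

```python
import re

# Assumed file-name pattern: ".../viewN_frontal.jpg" or ".../viewN_lateral.jpg".
_VIEW_RE = re.compile(r"view\d+_(frontal|lateral)\.(?:jpg|jpeg|png)$", re.IGNORECASE)


def view_from_path(path):
    """Return "Frontal" or "Lateral" parsed from the file name, or None if it does not match."""
    match = _VIEW_RE.search(path)
    return match.group(1).capitalize() if match else None


# Illustrative paths only.
print(view_from_path("test/patient00001/study1/view1_frontal.jpg"))
print(view_from_path("test/patient00001/study1/view2_lateral.jpg"))
print(view_from_path("test/patient00001/study1/unknown.jpg"))
```

Returning `None` for names that do not match keeps unexpected files visible instead of silently counting them as one of the two views.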

Then, using the same methodology, I checked train.csv and valid.csv in the CheXpert dataset. In this case, I couldn't check the view position values from the 'Path' column against the images themselves; that is also not strictly necessary, because both train.csv and valid.csv have a 'Frontal/Lateral' column. In other words, I only needed to compare the view position values from the file names with the 'Frontal/Lateral' column in each CSV file.
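
A comparison like this can be sketched with the standard `csv` module. This is not the exact script I ran, just a minimal version that assumes the 'Path' and 'Frontal/Lateral' columns mentioned above and file names that end in `_frontal` or `_lateral`. The function name `find_view_mismatches` and the sample rows are made up for illustration.

```python
import csv
import io


def find_view_mismatches(csv_file):
    """Return (path, column_value) pairs where the file name and the column disagree."""
    mismatches = []
    for row in csv.DictReader(csv_file):
        # "dir/view1_frontal.jpg" -> "view1_frontal" -> "frontal"
        stem = row["Path"].rsplit("/", 1)[-1].rsplit(".", 1)[0]
        view_in_name = stem.rsplit("_", 1)[-1].lower()
        view_in_column = row["Frontal/Lateral"].strip().lower()
        if view_in_name != view_in_column:
            mismatches.append((row["Path"], row["Frontal/Lateral"]))
    return mismatches


# Made-up sample rows; the third row is deliberately inconsistent.
SAMPLE = """Path,Frontal/Lateral
CheXpert-v1.0/valid/patient00001/study1/view1_frontal.jpg,Frontal
CheXpert-v1.0/valid/patient00001/study1/view2_lateral.jpg,Lateral
CheXpert-v1.0/valid/patient00002/study1/view1_frontal.jpg,Lateral
"""
mismatches = find_view_mismatches(io.StringIO(SAMPLE))
print(mismatches)
```

For a real file, the same function can be called on `open("valid.csv", newline="")`. An empty list means the file names and the column agree everywhere, which says nothing yet about whether both agree with the actual image.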

The two images show the results: the top one is for train.csv and the bottom one is for valid.csv. As you can see, fortunately, there are no mislabeled values in the CheXpert dataset! (Of course, it would be better to double-check the images against the column values one by one. However, that is not easy for us, since we do not have enough time.)

And thanks for the great discussion! We must also check the MIMIC dataset in the same way!
