
sybil's Issues

SCORE meaning

I am a beginner. I got the prediction scores below and would like to know what each score represents:
Prediction(scores=[[0.0005993253234549188, 0.0032729617776346745, 0.005461805129414722, 0.010187651898493539, 0.01629524016091344, 0.027880663449820757], [0.002614143679095454, 0.006778430365650433, 0.01608498481612357, 0.02285970665741425, 0.027484834813507393, 0.0459973147267193]])
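To show what I am looking at, here is a small sketch of how I am reading the returned object. I am assuming each inner list corresponds to one serie and each value to one follow-up year (1 through 6), but please correct me if that is wrong:

from sybil import Serie, Sybil

model = Sybil("sybil_ensemble")
serie_1 = Serie(["/path/to/exam1/slice1.dcm"])   # placeholder paths
serie_2 = Serie(["/path/to/exam2/slice1.dcm"])

prediction = model.predict([serie_1, serie_2])
for serie_idx, yearly_scores in enumerate(prediction.scores):
    for year, score in enumerate(yearly_scores, start=1):
        print(f"serie {serie_idx}: risk score for year {year} = {score:.4f}")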
Thanks!

Metadata format for dataloader

Hello,

I intend to use a JSON file for loading my data. In the loading part there is a lack of information about sample_metadata. I looked through nlst.py under the loading folder, but if you could share a sample_metadata.json file with some random numbers showing the expected format, I would appreciate it.

Best Regards

question about the bounding box annotations

Hi,

My team and I at BWH (@fedorov) are interested in using your expert bounding box annotations, and are currently trying to display them in 3DSlicer.

Here I've chosen a particular annotation to display:

"1.2.840.113654.2.55.56880820416622279507487194633461962174": {
	"1.2.840.113654.2.55.133415818681717538807299664091449063309": [
		{
			"x": 0.43682755859375,
			"y": 0.2967157421875,
			"width": 0.30124578125,
			"height": 0.20271802734375
		}
	],

After converting the top left, bottom right, and center coordinates to mm space, we have the following in Slicer:

[screenshot: the converted bounding box displayed over the CT in 3D Slicer]

However, this bounding box almost looks too large for the nodule in question.

We wanted to make sure that the above annotation is correct, and that the bounding box corresponds to the original DICOM data. Did the experts use any guidelines to define the bounding boxes around the nodules (for instance, a set margin of 10 voxels around the nodule)?
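For reference, this is roughly the conversion we are applying. It is only a sketch: it assumes x/y/width/height are fractions of the image width and height, and an axial slice with identity direction cosines.

import pydicom

ds = pydicom.dcmread("slice.dcm")  # the slice the annotation refers to (placeholder path)
rows, cols = int(ds.Rows), int(ds.Columns)
row_spacing, col_spacing = (float(v) for v in ds.PixelSpacing)  # [row, column] spacing in mm
origin = [float(v) for v in ds.ImagePositionPatient]

ann = {"x": 0.43682755859375, "y": 0.2967157421875,
       "width": 0.30124578125, "height": 0.20271802734375}

# pixel-space corners, assuming x runs along columns and y along rows
x0, y0 = ann["x"] * cols, ann["y"] * rows
x1, y1 = (ann["x"] + ann["width"]) * cols, (ann["y"] + ann["height"]) * rows

# patient-space (mm), assuming an axial slice with identity direction cosines
top_left_mm = (origin[0] + x0 * col_spacing, origin[1] + y0 * row_spacing, origin[2])
bottom_right_mm = (origin[0] + x1 * col_spacing, origin[1] + y1 * row_spacing, origin[2])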

Thank you!

Deepa

Project dependencies

To make it easier to install Sybil, the dependencies should be listed in the setup.cfg file so that they are installed automatically when missing.
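For example, something along these lines in setup.cfg (the package names below are illustrative assumptions, not the project's actual pinned requirements):

[options]
install_requires =
    torch
    torchvision
    numpy
    pydicom
    scikit-learn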

ValueError: high is out of bounds for int32

Error message

ValueError: high is out of bounds for int32

Solution

This happens because np.random.randint defaults to a 32-bit integer type on some platforms (notably Windows), so the upper bound 2**32 - 1 is out of range. Modify line 132 of sybil/serie.py from:
sample = {"seed": np.random.randint(0, 2**32 - 1)}
to
sample = {"seed": np.random.randint(0, 2**32 - 1, dtype=np.int64)}

Get coordinates or visualize the most activated voxels from Sybil’s attention scores

This is a great tool to estimate lung cancer risk. In the appendix to the paper describing Sybil, it is mentioned that the authors measured Sybil's ability to localize a cancerous nodule on an LDCT. To do this, the most activated voxels from Sybil's attention scores were selected. Would it be possible to indicate which section of the code lists which voxels were the most activated in the input image, or was an external tool used?
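In case it helps clarify what I mean, here is a generic sketch of picking the most activated voxel. It is not tied to Sybil's internals (which is exactly what I am asking about) and assumes the attention weights are already available as an array over the resampled volume:

import numpy as np

# placeholder for the real attention volume over the resampled CT (depth, height, width)
attention = np.random.rand(200, 256, 256)

# coordinates of the most activated voxel
z, y, x = np.unravel_index(np.argmax(attention), attention.shape)
print(f"most activated voxel at slice={z}, row={y}, col={x}")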

Nifti support

Hi,
Thank you for publishing this tool. I have code extending support for NIfTI files at https://github.com/tom1193/Sybil/tree/nifti-wrapper. I tested on the Ardila test set using dcm2niix to generate NIfTI files. Scan-level risk scores differ noticeably (<0.1) between the two formats, while the differences in aggregate AUC on the test set are not as bad (<0.01). I'm happy to submit a PR if you would like to add this feature and think it's "close enough". I am also open to suggestions that could account for these differences in risk scores.

Path not found for calibrator model when using "sybil_base"

Nice project. "sybil_ensemble" works fine; however, there seems to be a problem locating the calibrator for "sybil_base".

I am also a little confused by the calibrator concept. I noticed your comment about sklearn.calibration.CalibratedClassifierCV, but I was wondering how to create a calibrator when training Sybil on my own data. I would be happy if you could explain the role and creation of a calibrator in more detail.
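To make my question concrete, this is the kind of thing I imagine doing. It is only a sketch, using isotonic regression as a stand-in rather than your shipped CalibratedClassifierCV objects; raw_scores and labels are placeholders for held-out validation predictions and follow-up outcomes:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# placeholders: raw_scores[i, y] = uncalibrated risk for patient i within year y + 1,
# labels[i, y] = 1 if cancer was diagnosed within year y + 1, else 0
raw_scores = np.random.rand(1000, 6)
labels = (np.random.rand(1000, 6) < raw_scores).astype(int)

calibrators = {}
for year in range(6):
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(raw_scores[:, year], labels[:, year])
    calibrators[f"Year{year + 1}"] = iso  # keyed like the released calibrator dict

calibrated_year6 = calibrators["Year6"].predict(raw_scores[:, 5])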

I tried to use your preprocessing methods and pretrained model but it didn't work. May I check with you?

Hello,
I tried to use your preprocessing methods and pretrained model but it didn't work on my dataset. May I check with you about 3 questions?

(1) I used DCMTK with: dcmj2pnm +on2 --min-max-window --set-window -600 1500 pathToDCM pathToPNG16
Did I use DCMTK with the same command line as yours?

(2) I studied augmentations.py and followed its methods to convert a group of PNG16 images into a tensor ([mean, std] = [128.1722, 87.1849] for normalization; TorchIO for interpolation, but I changed the voxel size to 1.5 * 1.5 * 1.5 mm; the tensor [min, max] was about [-1.4701, 1.4547]; the tensor fed into the model had shape (C, T, H, W)).
[figure: sample slices from the interpolated tensor]
The subplots in this figure are some slices from the interpolated tensor (plotted with plt.imshow(slice_of_tensor, cmap='gray')).

Did I convert the PNG16 images into a tensor in the correct way?
Do you think it's a bad idea to change the voxel size to 1.5 * 1.5 * 1.5 mm (you use 1.4 * 1.4 * 2.5 mm)?
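To be explicit, my pipeline looks roughly like this. It is only a sketch: the file list and the affine are placeholders, and the target spacing is my own choice rather than something taken from your code.

import numpy as np
import torch
import torchio as tio
from PIL import Image

# read the PNG16 slices (placeholder paths) and stack them into a volume
slice_paths = ["slice_000.png", "slice_001.png"]
slices = [np.array(Image.open(p), dtype=np.float32) for p in slice_paths]
volume = np.stack(slices, axis=-1)                  # (H, W, D)
volume = (volume - 128.1722) / 87.1849              # normalize with the stated mean/std

# TorchIO expects a 4D (C, W, H, D) tensor, so transpose rows/columns accordingly
tensor = torch.from_numpy(volume).permute(1, 0, 2).unsqueeze(0).contiguous()
image = tio.ScalarImage(tensor=tensor, affine=np.eye(4))  # the affine is a placeholder

resampled = tio.Resample((1.5, 1.5, 1.5))(image)    # my choice; the paper uses 1.4 * 1.4 * 2.5 mm
model_input = resampled.data                        # then rearranged to (C, T, H, W) for the model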

(3) I load your pretrained model's encoder back into a standard r3d_18 and replace its last fc layer so that it can be trained on my 5-class dataset.

import torch
import torchvision

resnet3d = torchvision.models.video.r3d_18(pretrained=True)
path = "/path/to/65fd1f04cb4c5847d86a9ed8ba31ac1aepoch=10.ckpt"
checkpoint = torch.load(path, map_location="cpu")

# the layer names in your pretrained model differ from the standard r3d_18, so I map them back
state_dict = {("layer" + k[20:]): v for k, v in checkpoint["state_dict"].items()}
state_dict["stem.0.weight"] = state_dict.pop("layer0.0.weight")
state_dict["stem.1.weight"] = state_dict.pop("layer0.1.weight")
state_dict["stem.1.bias"] = state_dict.pop("layer0.1.bias")
state_dict["stem.1.running_mean"] = state_dict.pop("layer0.1.running_mean")
state_dict["stem.1.running_var"] = state_dict.pop("layer0.1.running_var")
state_dict["stem.1.num_batches_tracked"] = state_dict.pop("layer0.1.num_batches_tracked")

# keep only the keys that exist in the standard r3d_18 and load them
model_dict_copy = resnet3d.state_dict()
pretrained_dict = {k: v for k, v in state_dict.items() if k in model_dict_copy}
model_dict_copy.update(pretrained_dict)
resnet3d.load_state_dict(model_dict_copy)

# replace the final fc layer for my 5-class dataset
resnet3d.fc = torch.nn.Linear(resnet3d.fc.in_features, 5)

Your pretrained model has other layers after the encoder but I am not sure whether I should use them.
Do you think I am using your pretrained model correctly?

I froze the encoder and trained only the fc layer because my dataset is small, but the training accuracy stays very low (about 40% for 5-class classification).
Do you think I made any mistakes?
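For completeness, this is roughly how I freeze the encoder (using the resnet3d object from above; the optimizer settings are my own choice):

# freeze the pretrained encoder and train only the new fc layer
for param in resnet3d.parameters():
    param.requires_grad = False
for param in resnet3d.fc.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(resnet3d.fc.parameters(), lr=1e-3)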
Thank you very much.

AttributeError: 'CalibratedClassifier' object has no attribute 'estimator'

Hi, I am facing an error while I am trying to run the README example:
from sybil import Serie, Sybil

# Load a trained model
model = Sybil("sybil_base")

# Get risk scores
serie = Serie([dicom_path_1, dicom_path_2, ...])
scores = model.predict([serie])

# You can also evaluate by providing labels
serie = Serie([dicom_path_1, dicom_path_2, ...], label=1)
results = model.evaluate([serie])

[screenshot of the error traceback]

Then I tried to install the packages from requirements.txt, but it would not let me install the pinned version of scikit-learn.
I was wondering if you have any solution that could help.
Thanks in advance.

Same scores for two different images

Hello,

We are running into an issue when running Sybil. We are using the code given in the README and trying to get scores for some test DICOMs that you have provided in your data folder (Sybil demo_data). The exact DICOMs are:

1-2da413541bb2518fb0f8c583900999ef
194-89970e1e7ba1759f86babb310a2c04e9

For some reason, we get the same scores for these two different DICOMs. We tested whether this was the case with both Sybil_1 and Sybil_2. Sybil_1 and Sybil_2 produce different numbers from each other, but Sybil_1 returns identical scores for DICOM 1 and DICOM 2, and the same is true of Sybil_2.

Specifically, the scores from Sybil_1 for both DICOMs are:
[0.005670641576314818, 0.016728911619303625, 0.040977454787905605, 0.05335478429725567, 0.06768990118864318, 0.10217879263786658]

Scores for Sybil_2 for both dicoms are:
[0.007401745617755098, 0.01943424256123729, 0.0336564680065982, 0.046328010497170294, 0.057618836662999294, 0.08531938437897854]

We looked to see if the issue persisted with an external dcm dataset found here: https://www.kaggle.com/datasets/ymirsky/medical-deepfakes-lung-cancer?resource=download&select=labels_exp1.csv

Same issue. We used CT_Scans/EXP1_blind/1003/0.dcm and CT_Scans/EXP1_blind/1546/159.dcm. The scores we got for both of these from Sybil_1 are:

[0.011891088336936041, 0.025743208030524028, 0.05334339990528849, 0.05963512647876064, 0.07540808448822184, 0.10650834286905617]

Here is the code that we are using:

from sybil import Serie, Sybil
model = Sybil("sybil_2")
serie = Serie(['Test.dcm'])
scores = model.predict([serie])
print(scores)

We have also attached the two DICOM images (in a zip file) from the Sybil demo data we used:
test_dicoms.zip

Please assist us, thank you!

(preprocessing) When the data volumes were resampled to 0.7*0.7*2.5 mm3 voxels, is there a standard width and height of pixels for all the volumes?

Hello. I am studying your paper and your code and I have a question about your preprocessing.

In your paper's Supplementary Methods, it is mentioned that the data volumes were resampled to 0.7 * 0.7 * 2.5 mm3 voxels and that the width and height were then resized to 256 * 256 pixels.
When the data volumes are resampled to 0.7 * 0.7 * 2.5 mm3 voxels, is there a standard width and height in pixels for all the volumes? (The volumes' depth is 200 slices, as mentioned in the paper.)

(If different data volumes end up with different pixel widths and heights after being resampled to 0.7 * 0.7 * 2.5 mm3 voxels, then resizing them all to 256 * 256 pixels seems to undo the resampling, because the voxels of different volumes will again have different physical sizes after the resizing.)
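As a concrete illustration of my concern (the numbers here are made up):

# two hypothetical scans with the same 512-pixel width but different pixel spacing
extent_a = 512 * 0.6        # 307.2 mm field of view
extent_b = 512 * 0.8        # 409.6 mm field of view

width_a = extent_a / 0.7    # ~439 px after resampling to 0.7 mm
width_b = extent_b / 0.7    # ~585 px after resampling to 0.7 mm

spacing_a = extent_a / 256  # 1.2 mm/px after resizing to 256 px
spacing_b = extent_b / 256  # 1.6 mm/px after resizing to 256 px
print(spacing_a, spacing_b)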

Thank you very much.

Streamline individual inference

Right now, inference on a single exam requires the user to write their own Python script. There should be a simple command-line utility for running inference. This would help anyone who does not wish to use Docker/ark for whatever reason.
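A rough sketch of what such a utility could look like (the flag names and the glob pattern are illustrative, not an existing interface):

import argparse
import glob

from sybil import Serie, Sybil

def main():
    parser = argparse.ArgumentParser(description="Run Sybil on a single exam")
    parser.add_argument("dicom_dir", help="directory containing the DICOM slices of one series")
    parser.add_argument("--model", default="sybil_ensemble", help="model name to load")
    args = parser.parse_args()

    paths = sorted(glob.glob(f"{args.dicom_dir}/*.dcm"))
    model = Sybil(args.model)
    scores = model.predict([Serie(paths)])
    print(scores)

if __name__ == "__main__":
    main()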

Attribution for NLST dataset

Hi, it's great to see you all leveraging the National Lung Screening Trial dataset. Can you please add the dataset citation to your README.md in order to adhere to TCIA's Data Usage and Citation Policy?

National Lung Screening Trial Research Team. (2013). Data from the National Lung Screening Trial (NLST) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.HMQ8-J677

Also, if you'd like to increase awareness of your data and code you might consider submitting a proposal to publish the annotations as described in https://www.cancerimagingarchive.net/analysis-results/ or adding your data to https://zenodo.org/communities/nci-idc/about (which would also result in us linking to your dataset on the NLST page).

Best,
Justin

Dataset split

Hi,
firstly, I want to express my appreciation for the valuable tool you've shared with the community!
In the Data Sharing Statement of your work it says that the data splits, along with expert radiologist annotations, trained models, and code are available in this repo. However, I cannot find the data split, and it seems that the files uploaded here are empty.
Could you please provide the missing data split?
Thank you

ink still wet, code already broken

Probably broken by changes in scikit-learn. Too bad there isn't a requirements.txt or a Dockerfile with a hint of what version of scikit-learn was originally used...

scores = model.predict([serie])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mlewis/anaconda3/envs/sybil/lib/python3.9/site-packages/sybil/model.py", line 289, in predict
    calib_scores = self._calibrate(scores).tolist()
  File "/home/mlewis/anaconda3/envs/sybil/lib/python3.9/site-packages/sybil/model.py", line 222, in _calibrate
    probs = self.calibrator["Year{}".format(YEAR + 1)].predict_proba(probs)[
  File "/home/mlewis/anaconda3/envs/sybil/lib/python3.9/site-packages/sklearn/calibration.py", line 500, in predict_proba
    proba = calibrated_classifier.predict_proba(X)
  File "/home/mlewis/anaconda3/envs/sybil/lib/python3.9/site-packages/sklearn/calibration.py", line 791, in predict_proba
    pred_method, method_name = _get_prediction_method(self.estimator)
AttributeError: '_CalibratedClassifier' object has no attribute 'estimator'
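For what it's worth, a workaround sketch under the assumption that the calibrator pickles were written with scikit-learn < 1.2 (which still used base_estimator) and are being loaded with a newer release. Downgrading scikit-learn is probably the simpler fix; model.calibrator and calibrated_classifiers_ below are taken from the traceback and scikit-learn internals, so treat this as a sketch rather than a supported API:

from sybil import Serie, Sybil

model = Sybil("sybil_ensemble")

# copy the old attribute name onto the one expected by recent scikit-learn
for calibrated_cv in model.calibrator.values():
    for cc in calibrated_cv.calibrated_classifiers_:
        if not hasattr(cc, "estimator") and hasattr(cc, "base_estimator"):
            cc.estimator = cc.base_estimator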
