cxrail-dev's People

Contributors

chrstnkgn, hoon-hoon-tiger, jieonh, juppak, kdg1993, seoulsky-field, yisakk


cxrail-dev's Issues

Features: Make Random Augment & Asymmetric loss tunable

What

Make Random Augment & Asymmetric loss tunable

Why

To achieve a higher score, we need to tune sensitive hyperparameters.
Far more things could be tuned, but given the restrictions on time and resources,
it is better to focus on high-leverage hyperparameters (those with large potential impact that cover various situations).

According to the papers (Random Augment & Asymmetric loss), these methods have sensitive hyperparameters.
In other words, they have hyperparameters with potential.
(E.g. augmentation strength is sensitive to data size and model complexity.)
Also, a loss function and augmentation are needed in every training run, regardless of the data or model.

In conclusion, tuning augmentation and loss satisfies both potential and coverage.

How

  • Make Random Augment tunable
  • Make Asymmetric loss tunable
  • Simple test (small subset of data) with wandb
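To make the dependence on these hyperparameters concrete, here is a minimal scalar sketch of the asymmetric loss with its sensitive hyperparameters (gamma_pos, gamma_neg, clip) exposed as tunable arguments. It follows the formulas in the ASL paper, but it is an illustration only, not our actual implementation (which operates on tensors):

```python
import math

def asymmetric_loss(p, y, gamma_pos=1.0, gamma_neg=4.0, clip=0.05):
    """Asymmetric loss for a single label.

    p: predicted probability of the positive class, y: target in {0, 1}.
    gamma_pos / gamma_neg: focusing parameters (tunable).
    clip: probability margin for negatives (tunable).
    """
    eps = 1e-8
    if y == 1:
        # down-weight easy positives by (1 - p)^gamma_pos
        return -((1.0 - p) ** gamma_pos) * math.log(p + eps)
    # asymmetric clipping: shift the negative's probability down by `clip`
    p_m = max(p - clip, 0.0)
    return -(p_m ** gamma_neg) * math.log(1.0 - p_m + eps)
```

With gamma_pos = gamma_neg = 0 and clip = 0 this reduces to plain binary cross-entropy, which is a handy sanity check when wiring the hyperparameters into the search space.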

Features: Replace current train metric logger

What

  • Replace the current train metric tracker with the current inference metric tracker.

Why

  • We currently use different metric trackers to do the same work in the trainer and in inference. Unifying them is important for consistency and reduces confusion.

How

  • Use the inference metric tracker in the trainer.

Features: Add a model soup function

What

Make our code support the model soup method.

Why

I think model soup is one of the best generalization methods. If we support it, we can provide users more reliability and flexibility in their experiments.

How

We can reference two papers: "Model Soups: averaging weights of multiple fine-tuned models
improves accuracy without increasing inference time"
and "Model Soups improve performance of dermoscopic skin cancer classifiers".
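The core of the method is just weight averaging. A minimal sketch with plain Python dicts standing in for PyTorch state dicts (real code would average torch tensors, e.g. with torch.stack(...).mean(0)):

```python
def uniform_soup(state_dicts):
    """Average a list of model state dicts elementwise (uniform model soup).

    Each state dict maps parameter names to lists of floats; with PyTorch
    models you would average tensors instead.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    n = len(state_dicts)
    soup = {}
    for k in state_dicts[0]:
        # elementwise mean across all ingredient models
        soup[k] = [sum(vals) / n for vals in zip(*(sd[k] for sd in state_dicts))]
    return soup
```

A "greedy soup" variant would add ingredients one at a time and keep each only if the validation score improves, as described in the first paper.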

Add log analysis tool

Summary: Implement a tool to collect and summarize log files, assuming the user runs experiments via hydra multi-run

To Do

  • Handle sub-experiments (by hydra multi-run) using dictionary type
  • Class type
  • Provide summary DataFrame
  • Show the best score and its tuned hyperparameters
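The summary step could look roughly like this. The sketch uses plain dicts and a list of rows instead of a pandas DataFrame, and the `results` layout (sub-experiment name mapped to score and params) is an assumption about what the collector would produce:

```python
def summarize_multirun(results):
    """Summarize hydra multi-run sub-experiments.

    `results` maps a sub-experiment name (one per multi-run job) to a dict
    holding its final score and tuned hyperparameters. Returns all rows
    sorted by score (best first) plus the best entry.
    """
    rows = [
        {"experiment": name, "score": r["score"], **r.get("params", {})}
        for name, r in results.items()
    ]
    rows.sort(key=lambda row: row["score"], reverse=True)
    best = rows[0] if rows else None
    return rows, best
```

Converting `rows` to a DataFrame for display would be a one-liner once pandas is in the picture.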

Features: CLI logging using rich

What

We discussed this problem in #19.
Change the current CLI logging to use the "rich" library.
Current: we just call print() to report the validation score, loss, epoch, and batch_id at specific batch numbers, with no progress bar.

Why

When observing the CLI output during training, we run into inconveniences such as "when will the training finish?" and "I want simpler CLI logging!".
So, based on these experiences, we decided to change the CLI reporting!

It is really important to provide correct, convenient, and easy-to-read CLI logging,
and I think the "rich" library meets these conditions!

How

We can reference the official rich documentation and examples.
rich is also used in lightning-hydra-template, so that can be referenced too.
I plan to use both the "progress bar" and "status" features appropriately.

  • Apply progress in single gpu, without RayTune
  • Support progress in single gpu, with RayTune
  • Support progress in multi gpu, with RayTune
  • Modularize with class (ex. RichProgressBar)

Features: EDA for CheXpert data

What

  • EDA for the CheXpert data
  • Especially the target label distribution and pathological aspect

Why

  • Inspired by the creative suggestion from @chrstnkgn and the good motivation of @seoulsky-field in #10, I made up my mind to explore the CheXpert data set further
  • Also, the excellent replies from @jieonh about the rank-2 CheXpert leaderboard and from @chrstnkgn about the CheXpert datasheet made me more curious about the labeling system and uncertain about the similarity of the train and validation distributions

How

  • Explore the target label (especially the uncertain label)
  • Analyze target class converting
  • Pathological Hierarchy
  • Image aspect (future plan)
  • (New task) Add test set analysis

Features: Support more models, losses, optimizers

What

  • Support more models, losses, optimizers

Why

  • While discussing experiments, we decided to use three models: ResNet50, DenseNet121, and Swin Transformer.
  • I don't think supporting more models, losses, and optimizers is difficult work.

How

  • Support Swin Transformer.
  • Implement and support focal loss.
  • Support RMSprop optimizer.
  • Support AdamW optimizer.
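For the focal loss item, here is a minimal scalar sketch of the binary focal loss from the RetinaNet paper, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). Our real implementation would operate on tensors; this is just the formula made runnable:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Scalar binary focal loss.

    p: predicted probability of the positive class, y: target in {0, 1}.
    gamma: focusing parameter; alpha: class-balancing weight.
    """
    eps = 1e-8
    if y == 1:
        p_t, alpha_t = p, alpha
    else:
        p_t, alpha_t = 1.0 - p, 1.0 - alpha
    # (1 - p_t)^gamma down-weights well-classified (easy) examples
    return -alpha_t * ((1.0 - p_t) ** gamma) * math.log(p_t + eps)
```

Setting gamma = 0 recovers alpha-weighted cross-entropy, which is a useful sanity check.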

Features: Change the execution method and config structure of ray

What

Instead of executing ray by changing the mode config, put it in the hparams_search config and override it.
(Refer to the structure of the lightning-hydra-template)

Why

  • I think this will simplify not only the config structure but also the parameter selection code in the trainval function of train.py
  • This structure will make it possible to apply hyperparameter tuning tools other than ray (e.g. optuna, wandb sweep, etc.)

How

  • Remove mode config and create hparams_search config
  • Modify parameter selection structure on train.py and main.py
  • Modify ray.yaml, default.yaml config to fit the structure
  • Apply other hyperparameter tuning tools (For future. Not a priority)

Add hyperparameter choice structure

Summary: Ideas for a structural design that gives users the freedom to either set hyperparameters to a fixed value (by hydra config) or tune them (by Ray Tune)
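One possible shape for this: each config entry is either a plain value (fixed) or a small search-space spec handed to the tuner. The dict format below ({"choice": ...} / {"uniform": ...}) is hypothetical, standing in for hydra-configured Ray Tune spaces:

```python
import random

def resolve_hparam(spec, rng=random):
    """Resolve one hyperparameter from a config entry.

    A plain value stays fixed (set by hydra config); a dict describes a
    search space to sample from (a stand-in for handing the spec to a
    tuner such as Ray Tune).
    """
    if not isinstance(spec, dict):
        return spec  # fixed value
    if "choice" in spec:
        return rng.choice(spec["choice"])
    if "uniform" in spec:
        lo, hi = spec["uniform"]
        return rng.uniform(lo, hi)
    raise ValueError(f"unknown search spec: {spec}")
```

The same resolution function can serve both modes, so the training code never needs to know whether a value was fixed or tuned.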

Features: Add CheXpert train csv made by CheXbert

What

Add a train.csv made by CheXbert.


Why

The CheXpert dataset ships with a CSV made by the CheXpert labeler, but other options don't exist yet.
Adding one can be helpful for building a more varied CheXpert benchmark.

How

AIMI provides train_cheXbert.csv and train_visualCheXbert.csv.

  • Download both CheXbert CSVs.
  • Analyze the difference between visualCheXbert and CheXbert.
  • Train and compare the results.

Add experiment logging analysis script

A script for basic analysis of hydra multirun + ray tune log data is needed

To do

  • Show the best trial's learning progress
  • Get the best trial's configuration
  • Compare scores (at the hydra multirun level)

Feature: Customize CLI reporter

What

Customize CLI reporter to print output at appropriate intervals. (per epoch, etc.)

Why

  • The default CLI reporter prints output too frequently, making it difficult to check the results.
  • Previously, I made a custom reporter (reporting at the end of each trial) to solve that problem, but its reporting cycle was inconveniently long.

It is not a priority, and when I analyzed it last time, it was harder than expected to change the output to print on a cycle rather than per ray Trial. But I still think it's worth looking into, since the need has come up several times.

How

  • Choose appropriate reporting intervals
  • Analyze the CLI reporter class
  • Customize reporter
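Whatever reporter class we end up customizing, the throttling logic itself is small. A stand-alone sketch (the class name and interface are illustrative, not ray's API):

```python
class IntervalReporter:
    """Report only on every `interval`-th call (e.g. once per epoch)
    instead of every batch or every ray Trial.

    This isolates the throttling logic we would plug into a customized
    reporter; a time-based variant would compare timestamps instead.
    """

    def __init__(self, interval):
        self.interval = interval
        self._count = 0

    def should_report(self):
        """Return True on every `interval`-th call."""
        self._count += 1
        return self._count % self.interval == 0
```

The training loop would then guard its print/report call with `if reporter.should_report(): ...`.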

Hotfix: Conditional training cannot use transform.py

What

If we use the conditional_train option "conditional", it doesn't work (the first error comes from transform.py).

Why

While applying custom_metrics.py to both train.py and conditional_train.py, I found that conditional_train.py doesn't work well.
Because conditional_train.py has not been updated since @juppak's changes, some of its code no longer works.
Also, the error in transform.py can be fixed by revising hydra_cfg to hydra_cfg.Dataset.

(Screenshot, 2023-01-18: transform.py error traceback)


How

  • This also relates to issue #81, so I'll report it to @chrstnkgn.
  • Revise conditional_train.py to work well

Discussion: Analysis of Hydra Multi-Run Operation

What

For more sophisticated benchmark experiments, there is a need to understand how Hydra actually operates multi-runs.
(Whether it finishes each trial and runs the next one in a row, or runs several trials (through an override method or something) and then returns all the results at once, etc.)

Why

Currently, several problems arise from our lack of understanding of Hydra's multi-run operation.
For instance,

  • Custom logging is being recorded repeatedly as many times as the number of hydra multi-runs.
  • It is unclear how to build end-to-end pipelines that execute train -> inference at once in a multi-run experiment.

How

  • Hydra multi-run code analysis
  • Modify custom logging method (if necessary)

Features: Create an inference python file

What

While discussing how to load model weights from a directory for model soups, we noticed that we should create an inference (benchmark) python file.

Why

Many benchmark repositories on GitHub have an inference (or benchmark) python file. Also, some of the functions we plan to support need an inference file.

How

We can reference the validation function that already exists in train.py.
We'll also reference benchmark repositories from NeurIPS, MICCAI, etc.
Any opinions welcome!

Features: Subdivide Hydra log directory

What

  • Subdividing Hydra logging directory
  • The results are likely to be as follows.
    • single run: logs/train/run/{custom_exp_name}/2023-01-05_05-57-24
    • multi run: logs/train/multirun/{custom_exp_name}/{multirun-trial-number}/2023-01-05_05-57-24

Why

Currently, the outermost logging folder is named with a timestamp, which makes runs inconvenient to distinguish.

How

  • Separate the multirun / run directories
  • Add a custom experiment name
  • Simplify the multirun subdir (override_dirname -> number)
  • Check for conflicts or errors due to changes

Features: WandB as an Option

What

  • Set WandB as an optional logger
  • Log CLI outputs as a log file when no logger is chosen

Why

  • Considering the released version of our repo, I figured there might be users who have no experience with WandB
  • This means it might be inappropriate to set WandB as the default logger

How

  • Make WandB optional
  • Make a simple logging process for the users who did not choose any logging option

Features: Apply simple early-stop in the train code

What

  • Apply simple early-stop in the train code

Why

  • Motivated by the discussion with @kdg1993 (#30), we concluded that applying a simple early-stop as the trial terminator is an appropriate solution for the initial implementation
  • I think this is an urgent issue to resolve if we are planning to conduct the experiment before this week ends, as our trial terminator for ray is not working quite well

How

  • Implement a simple early-stop at the end of our train loop
  • Add a Hydra option to set the patience step for the early stop
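The early-stop logic itself could be as small as this sketch (names are illustrative, and `patience` would come from the new Hydra option):

```python
class EarlyStopper:
    """Stop training when the monitored score has not improved for
    `patience` consecutive checks (higher score = better)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_steps = 0

    def step(self, score):
        """Record one validation score; return True when training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience
```

At the end of each validation pass the train loop would call `if stopper.step(val_auc): break`.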

Hotfix: Ray related and Working directory problem

What


  • We now have a simpler structure for ray tune thanks to @jieonh 's work (#27) and conditional training & label smoothing thanks to @juppak (#31), but we have several new bugs to fix.

Why


  1. The logging directory is not created in the wandb.init() step
  2. Conditional training is logged together in WandB with the actual training process

How

  • Hotfix needed
    • I suppose the structure of initializing wandb should be modified
    • After fixing the bug, I plan to explain what caused the problem and what I changed, for a better understanding among all members

Discussion: How to save pytorch model weights in each sub-experiment

What

  • Discuss how to save pytorch model weights in each sub experiment in hydra multirun + no ray tune setting
  • I guess we need to discuss the thing we need to log, log file names, and logging structure maybe

Why

  • There are 4 different cases ( [hydra multirun on/off] X [ray tune on/off] ) when we do experiments with our custom code set
  • I figured out that under the hydra multirun + no ray tune setting, only one best model is saved (not sure about other environments, but that is the case for me), like below

./logs
└── 2022-12-08_02-12-25
├── best_saved.pth
├── epochs=2,loss=BCE,mode=default,model=mobilenetv3_small_050,num_samples=2,optimizer=adam
│ └── main_Tuner.log
├── epochs=2,loss=BCE,mode=default,model=tinynet_e,num_samples=2,optimizer=adam
│ └── main_Tuner.log
├── epochs=2,loss=Multi_Soft_Margin,mode=default,model=mobilenetv3_small_050,num_samples=2,optimizer=adam
│ └── main_Tuner.log
├── epochs=2,loss=Multi_Soft_Margin,mode=default,model=tinynet_e,num_samples=2,optimizer=adam
│ └── main_Tuner.log
└── multirun.yaml

How

  • #14
  • Apply to the code

Features: Implement augmentation

What

Implement user-friendly & basic image augmentation

Why

  • Augmentation is one of the important parts of DL, but our code set does not offer many options so far
    • The current augmentation, based on the CheXpert leaderboard, is quite a good choice, but we need more to cover MIMIC, BRAX, and the others
  • As a benchmark test bed, persuasive augmentation options will reduce the user's experimental burden
  • Good augmentation can contribute to better performance

How

  • Search implementable strategies
  • torchvision auto-augmentation Implementation
  • torchvision auto-augmentation code test
  • albumentations auto-augmentation Implementation (future work)
  • albumentations auto-augmentation code test (future work)
  • RandAugment implementation
  • RandAugment code test
  • Make augmentation result visualization code

Hotfix: Minor bugs and working directory problem

What

  • CLI reporter not being printed in hydra multirun settings (from the second run)
  • Change the working directory for hydra multirun without Ray

Why

CLI Reporter

  • Not a MAJOR problem, but it hinders monitoring which was quite annoying

Working directory

  • Refer to: #13

How

  • Check if there were any conflicts regarding Ray reporter and hydra settings
  • Change the working directory for single run
  • Change the working directory for hydra multi-run

Hotfix: Inspect all of codes

What

  • Inspect and revise all of train, utils, inference codes.

Why

Some code has hard-coded default settings such as num_classes = 5; these should be checked before experiments, new functions, or hyperparameter tuning.
Moreover, if any file has not had pre-commit applied, it will be applied in this issue.
In addition, if it is more reasonable to use the name "utils" rather than "custom_utils", that can be changed in this issue.

Note: this issue is not about whether everything currently works. Its purpose is "checking the codes", not "making every option work".

How

  • Inspect & Revise trainer
  • Inspect & Revise inference
  • Inspect & Revise custom_utils

Features: Configure Hydra config directory and files

What

  • Make the Hydra config format neater and more straightforward

Why

  • As the project proceeds, the parameters that we need to handle are getting broader and more complicated
  • We have discussed how to change our config format several times during meetings

How

(Edited) I found it necessary to divide the work into small portions for the team members' understanding and to follow the flow of fast-merging branches. Therefore, I will comment my work on this issue piece by piece and publish pull requests based on those comments.

Hotfix: Error while using both mode=raytune & logging=wandb

Intro

  • At first, I am not certain whether this mode combination (using raytune & wandb simultaneously) simply isn't implemented yet.
    I'm really sorry if I misunderstood the progress of this work.
  • Secondly, it probably occurred due to the specific environment of my docker container.

Circumstance

  • python main.py model=tinynet_e epochs=3 num_samples=2 Dataset.train_size=0.2 logging=wandb mode=raytune
  • Branch : develop (commit 6b6f63d)
  • Code change : No

Error

mode=raytune
working dir: /home/CheXpert_code/kdg/CXRAIL-dev
[2022-12-21 01:57:23,674][ray.tune.tune][INFO] - Initializing Ray automatically.For cluster usage or custom Ray initialization, call ray.init(...) before tune.run.
2022-12-21 01:57:26,066 INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Error executing job with overrides: ['model=tinynet_e', 'epochs=3', 'num_samples=2', 'Dataset.train_size=0.2', 'logging=wandb', 'mode=raytune']
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/ray/tune/tuner.py", line 272, in fit
return self._local_tuner.fit()
File "/usr/local/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 420, in fit
analysis = self._fit_internal(trainable, param_space)
File "/usr/local/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 532, in _fit_internal
analysis = run(
File "/usr/local/lib/python3.8/site-packages/ray/tune/tune.py", line 626, in run
callbacks = _create_default_callbacks(
File "/usr/local/lib/python3.8/site-packages/ray/tune/utils/callback.py", line 105, in _create_default_callbacks
callbacks.append(TBXLoggerCallback())
File "/usr/local/lib/python3.8/site-packages/ray/tune/logger/tensorboardx.py", line 165, in init
from tensorboardX import SummaryWriter
File "/usr/local/lib/python3.8/site-packages/tensorboardX/init.py", line 5, in
from .torchvis import TorchVis
File "/usr/local/lib/python3.8/site-packages/tensorboardX/torchvis.py", line 10, in
from .writer import SummaryWriter
File "/usr/local/lib/python3.8/site-packages/tensorboardX/writer.py", line 16, in
from .comet_utils import CometLogger
File "/usr/local/lib/python3.8/site-packages/tensorboardX/comet_utils.py", line 7, in
from .summary import _clean_tag
File "/usr/local/lib/python3.8/site-packages/tensorboardX/summary.py", line 12, in
from .proto.summary_pb2 import Summary
File "/usr/local/lib/python3.8/site-packages/tensorboardX/proto/summary_pb2.py", line 16, in
from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
File "/usr/local/lib/python3.8/site-packages/tensorboardX/proto/tensor_pb2.py", line 16, in
from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
File "/usr/local/lib/python3.8/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 36, in
_descriptor.FieldDescriptor(
File "/usr/local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 560, in new
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "main.py", line 94, in main
raytune(hydra_cfg)
File "main.py", line 61, in raytune
analysis = tuner.fit()
File "/usr/local/lib/python3.8/site-packages/ray/tune/tuner.py", line 274, in fit
raise TuneError(
ray.tune.error.TuneError: The Ray Tune run failed. Please inspect the previous error messages for a cause. After fixing the issue, you can restart the run from scratch or continue this run. To continue this run, you can use tuner = Tuner.restore("/home/CheXpert_code/kdg/CXRAIL-dev/logs/2022-12-21_01-57-23/Dataset.train_size=0.2,epochs=3,logging=wandb,mode=raytune,model=tinynet_e,num_samples=2/trainval_2022-12-21_01-57-23").

Suspected reason

  • Python version and dependency conflict

Related to

Hotfix: Too long file name raises OSError

What

The command is

python main.py --multirun model=resnet,densenet
logging=wandb project_name='aug_efficacy'
logging.setup.name='augmentation_efficacy_test'
conditional_train=none
Dataset.augmentation_mode="auto","random","custom"
hparams_search=raytune
hparams_search.tune_config.num_samples=10
hparams_search.tune_config.scheduler.grace_period=100000
hparams_search.param_space.lr.lower=1e-5
hparams_search.param_space.batch_size.categories=[32,64]

The raised error is

[2023-01-03 05:39:43,143][HYDRA] Launching 6 jobs locally
[2023-01-03 05:39:43,143][HYDRA] #0 : model=resnet logging=wandb project_name=aug_efficacy logging.setup.name=augmentation_efficacy_test conditional_train=none Dataset.augmentation_mode=auto hparams_search=raytune hparams_search.tune_config.num_samples=10 hparams_search.tune_config.scheduler.grace_period=100000 hparams_search.param_space.lr.lower=1e-05
Traceback (most recent call last):
File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
self._accessor.mkdir(self, mode)
OSError: [Errno 36] File name too long: 'logs/train/2023-01-03_05-39-41/Dataset.augmentation_mode=auto,conditional_train=none,hparams_search.param_space.lr.lower=1e-05,hparams_search.tune_config.num_samples=10,hparams_search.tune_config.scheduler.grace_period=100000,hparams_search=raytune,logging.setup.name=augmentation_efficacy_test,logging=wandb,model=resnet,project_name=aug_efficacy'

Why

The overly long override-based directory name raises an OSError during mkdir.

How

Need to find a solution by discussion
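One candidate solution for the discussion: keep a truncated prefix of the override dirname and append a short hash of the full string, so names stay unique but under the filesystem limit (255 bytes on most Linux filesystems). A sketch, with function name and limit chosen by me:

```python
import hashlib

def shorten_dirname(name, max_len=100):
    """Return `name` unchanged if short enough; otherwise truncate it and
    append an 8-char SHA-1 digest of the full string so that distinct
    override combinations still map to distinct directories."""
    if len(name) <= max_len:
        return name
    digest = hashlib.sha1(name.encode()).hexdigest()[:8]
    # prefix keeps the name human-readable, digest keeps it unique
    return f"{name[:max_len - 9]}-{digest}"
```

The full override string could still be written into a file inside the directory for lookup.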

Features: Apply WandB logger in Ray with the same format as default running

What


  • Apply WandB logger in Ray with the same format as default running

Why


  • It is hard to customize logging configs and outputs using WandbLoggerCallback from ray
  • We would want to simplify logging outputs in ray-WandB to make them in line with those from default running (w/o ray tune)

How


  • Try customizing setup_wandb function instead of using WandbLoggerCallback from Ray

Features: Do EDA MIMIC-CXR

What

Do EDA (Exploratory Data Analysis) on MIMIC-CXR!

Why

It's necessary for applying MIMIC-CXR in our code.
In particular, we discussed the differences between the CheXpert CSV and the MIMIC-CXR CSV.
In the EDA, I will focus on the AP/PA distinction!

How

I'll upload an EDA notebook to the notebook directory.
The notebook will mainly cover the AP/PA distinction and the labels.

Features: Asking for help to add new policies to convert CheXpert target class in our custom Dataset Class

What

  • Add more policy options based on statistical or intuitive aspects of missing and label converting (Not based on domain knowledge or score)

Why

While I've looked around the target class distribution of CheXpert CSV data, I found an interesting possibility for data handling.
The figure below is a snapshot of target distribution by my personal exploration of CheXpert.

image

Meanwhile, our current custom Dataset class converts as follows (not sure, but I guess this way of converting is based on score):

Nan -> 0
-1 -> 1 ( if the target is 'Edema' or 'Atelectasis' )
-1 -> 0 ( if the target is neither 'Edema' nor 'Atelectasis')

  • In my opinion, converting Nan to 0 is acceptable because 'nothing' often means False (0). Thus, the thing is converting the '-1'
  • In the train set, 11 of the 14 disease columns have more 1 labels than 0 labels. Thus, converting -1 to 1 also makes sense to me
  • In line with distribution-based thinking, converting -1 by random sampling from the total set of 0 and 1 could also be an interesting approach

Likewise, I think there are many ways to apply statistical or intuitive aspects of handling missing values in the traditional ML field. So, I want to discuss it and carefully ask for help to make this idea possible to use in our custom codes

FYI, I include the distribution of valid set just for sharing knowledge but I'm afraid that considering the validation set distribution might be connected to the data leakage issue. Probably everyone knows already but mentioned it just for reminding 😄

How

  • Any kind of interesting idea can be an option
  • My simple idea now is to consider the major class for converting candidate or random sampling
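To make the discussion concrete, here is a sketch of what a policy-aware converter could look like. The "default" branch mirrors the current Dataset class behavior described above; the "ones" and "random" policies are the proposed additions, and all names here are hypothetical:

```python
import random

def convert_label(value, column, policy="default", pos_prior=0.5, rng=random):
    """Convert a raw CheXpert label (1, 0, -1, or None for blank).

    policy="default": NaN -> 0, and -1 -> 1 only for 'Edema'/'Atelectasis'
    (the current behavior). policy="ones": every -1 -> 1. policy="random":
    resample -1 from a Bernoulli with the column's positive prior.
    """
    if value is None:  # NaN: 'nothing' usually means negative
        return 0
    if value != -1:
        return int(value)
    if policy == "default":
        return 1 if column in ("Edema", "Atelectasis") else 0
    if policy == "ones":
        return 1
    if policy == "random":
        return 1 if rng.random() < pos_prior else 0
    raise ValueError(f"unknown policy: {policy}")
```

The policy and per-column priors could then be exposed as Dataset config options in hydra.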

Hotfix: Change the order of train_size in the preprocessing sequence

What

Changing the order
from: restrict train_size by sampling -> frontal or lateral restriction -> enhancement
to: frontal or lateral restriction -> enhancement -> restrict train_size by sampling

Why

So far, the training data size restriction has been done at an early stage of data preprocessing.
However, the current process can return fewer samples than the given integer or float (thanks for noticing, @seoulsky-field).
For example, if you set train_size to 100 and use_frontal to True, the codeset samples 100 rows and then selects the frontal images,
so it returns <= 100 images.
To avoid this, I checked which dataset options affect the number of samples and
figured out that use_frontal & enhancement (upsampling) can reduce or increase the number.

While analyzing the effects of these processing options,
I figured out that enhancement is quite complicated and might return a result far from what the user expected.
Currently, the enhancement accepts multiple target columns and n_times (the amount of upsampling).
Since the enhancement treats each target column independently (i.e. it does not consider co-occurrence),
it duplicates rows more than the given n_times due to the inherent traits of the multi-label problem.

Here is a really simple example of the enhancing sequence in our codeset.
original (3A, 4B) -> enhancing 'A' 2-times (6A, 6B) -> enhancing 'B' 2-times (8A, 10B), i.e. more than 2-times of both 'A' and 'B'.

original    added by enhancing 'A'    added by enhancing 'B'
A B         A B                       A B
1 0         1 0                       1 1
1 1         1 1                       1 1
1 1         1 1                       0 1
0 1                                   0 1
0 1

It is difficult to determine which way of enhancing (upsampling) is right, but we should definitely be aware of this behavior.

How

  • Code change
  • Test the length of returning dataset (length of self.df)
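The duplication effect above can be reproduced in a few lines. The `enhance` helper below mimics the described behavior (each target upsampled independently from the original rows); it is a sketch for the length test, not the actual codeset:

```python
def enhance(rows, targets, n_times):
    """Upsample each target column independently: for every target, append
    (n_times - 1) extra copies of the rows that are positive for it in the
    ORIGINAL data, ignoring co-occurrence with other targets."""
    out = list(rows)
    for t in targets:
        positives = [r for r in rows if r[t] == 1]
        out.extend(positives * (n_times - 1))
    return out

# the 2-label example from above: 3 'A' positives, 4 'B' positives
rows = [{"A": 1, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1},
        {"A": 0, "B": 1}, {"A": 0, "B": 1}]
enhanced = enhance(rows, ["A", "B"], n_times=2)
# 'A' count becomes 8 (> 2x3) and 'B' count becomes 10 (> 2x4),
# because rows positive for both labels are duplicated twice
```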

Discussion: Alternatives for ASHAscheduler in Ray

What

  • Find alternatives for ASHA scheduler
    • Possible integrations that I can think of now are:
    1. Apply early stopping with a certain patience
    2. Find another scheduler that ray provides that better fits our need

Why

  • While developing and enhancing this project, we have repeatedly discussed the scheduler that terminates the training process, and the main issue was that the ASHA scheduler does not fit our purpose, as it is an algorithm that works well in a multi-processing environment.
  • If we have firmly decided to stick with ray tune, then we need to seek algorithms that better fulfill our needs and terminate the process at the appropriate timing

How

  • There seem to be several options that we can consider according to the ray docs (https://docs.ray.io/en/latest/tune/api_docs/schedulers.html), but if any of them does not seem appropriate, then it might be better to just go with early stopping
  • I don't think that it is the part that I can decide alone, so I kindly ask you to freely discuss and provide various opinions here! 🙏

Feature: Refine Hyperparameter Tuning

What

Overall parameter tuning is required when finalizing the benchmark design. In order to provide detailed, optimized tuning results for each task, as the retina benchmark does, the current hyperparameter tuning structure needs to be refined.

Why

If hyperparameter tuning is going to be performed throughout the code, in addition to the current basic config tuning (lr, batch_size, etc.), some parts of the current structure need to change.
The following areas might be considered:

  1. Parameters that are included only in specific cases

    • ex) gamma_neg, gamma_pos in AsymmetricLoss
  2. Some tuning results might vary depending on the combination

    • ex) best learning rate for each model architecture
      • DenseNet : 1e-4, ResNet: 1e-5
  3. Currently, all parameters are included in the ray tune config's param_space, but this part needs to be divided in more detail.
    ex)

    • gamma_neg, gamma_pos -> AsymmetricLoss config
    • lr, weight_decay, betas, eps -> Optimizer config
    • batch_size, seed -> Experimental setting config

ref: retina_benchmark

How

  • Include ASL configs in searchspace -> To work only when using ASLoss

(The part below is still in the process of planning)

  • [python code] Modify the hyperparameter selecting structure (in train.py: trainval; main.py: default, raytune)
  • [hydra yaml config] Refine the search space structure

Features: WandB logging part as a Hydra option

What

To Add

  • Change WandB logging part as a Hydra option to:
    • make WandB logger able to be turned on/off
    • automatically assign experiment name when running the script
    • Add option to use WandB when running the experiment without Ray

Why

  • Initially, I added a WandB logging option supported by Ray to keep track of the experimental results
  • Then found out that it would be nicer (in terms of both convenience and code clarity) to make WandB optional regardless of the usage of Ray

How

  • WandB option into Hydra, default setting: ON
  • WandB option when not using ray
  • Come up with WandB convention (How to set the project name?) -> Any opinion or discussion would be appreciated

Discussion&Hotfix: RayTune values are always random.

What

  • Seed fixing is not applied in RayTune.

Why

  • I thought seed fixing also worked in RayTune. However, as the two result images below showed, the seed is not fixed in RayTune. (No options changed, no code changed.)
    (two result screenshots, 2023-01-05)
  • From the perspective of reproducibility, I think we should fix the seed in RayTune.

How

  • Will be decided by discussion.
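If the discussion lands on per-trial seeding, one common pattern is to derive a deterministic seed from a base seed plus the trial index and re-seed inside the trainable, since Ray workers do not inherit the driver process's seed. A pure-Python sketch (function names are illustrative; real code would also seed torch and numpy):

```python
import random

def run_trial(config, trial_index, base_seed=12345):
    """Sketch of per-trial seeding for reproducible tuning: each trial
    gets its own deterministic seed, so re-running trial k reproduces
    trial k exactly. `rng.uniform` stands in for search-space sampling."""
    seed = base_seed + trial_index
    rng = random.Random(seed)  # re-seed inside the worker, not the driver
    sampled_lr = rng.uniform(1e-5, 1e-3)
    return sampled_lr
```

The same base seed then makes an entire tuning run repeatable trial by trial.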

Features: Ray result analysis tool

What

Create a tool to extract information about best results among several trials of Ray (Tuner.fit())

Why

Simplicity of organizing experimental results

  • As the number of trials increases, it is difficult to analyze them all, and ultimately the reason a user uses a hyperparameter tuning tool is to find the best result.

How

+) This work is almost complete and will be merged with #35 without creating a separate branch since it is an issue directly related to #35

Features: Metrics and CLI in inference.py file

What

  • Apply rich progress bar in inference.py file
  • Append more metrics in inference.py file

Why

In train.py we use rich for CLI reporting, so I think it looks good to use rich in inference.py too.
Also, while AUROC is generally the metric used in medical tasks, we thought it would be good to support more details and more metrics.

How

  • Apply a rich progress bar in the inference.py file.
  • Compute FPR, TPR, and the best threshold.
  • Plot the ROC curve with details.
  • Implement more metrics (e.g. AUPRC, F1-score, accuracy, etc.).
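For the FPR/TPR/best-threshold item, here is a tiny pure-Python sketch that picks the threshold maximizing Youden's J statistic (J = TPR - FPR); in practice we would likely compute the curve with sklearn.metrics.roc_curve instead:

```python
def best_threshold(scores, labels):
    """Return (threshold, tpr, fpr) maximizing Youden's J = TPR - FPR.

    scores: predicted probabilities; labels: 0/1 ground truth.
    Assumes both classes are present in `labels`.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best, best_j = (0.0, 0.0, 0.0), float("-inf")
    for t in sorted(set(scores)):
        # classify as positive when score >= t
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr, fpr = tp / pos, fp / neg
        if tpr - fpr > best_j:
            best_j, best = tpr - fpr, (t, tpr, fpr)
    return best
```

The same per-threshold sweep also yields the points needed for the ROC curve plot.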

Hotfix: Changing num_samples in default config doesn't work

What

Changing num_samples in the default config doesn't work, but changing hparams_search.tune_config.num_samples does work.

Changing the default config [the case which is not working]
(screenshot)

Changing the hparams_search config [the case which is working]
(screenshot)


Why

Since the default config should be the file with the highest authority, this should be fixed.

How

Need help to fix

Hotfix: Reorganize conditional_train code

What

Reorganize conditional train code, and simplify its config

Why

  • We are aware that we should modify the code for conditional training somehow, but it has been delayed as it was not a priority.
  • However, as we are preparing for our first release, neater (and well-working) code is needed.
  • This issue will cover the following parts:
    1. (conditional_train.py) Merge the train and trainval functions - this format is no longer necessary since we are not using ray tune for conditional learning, and merging will resolve some errors caused by the complicated format.
    2. (conditional_train.yaml) Modify its format - as this is not a major option that users will always take into account, it would be better to place this config somewhere at a lower level. Trimming some unnecessary configs is also needed.

How

  • conditional_train.py
    • merge train and trainval
    • remove unnecessary codes
  • conditional_train.yaml
    • Modify format

Features: Implement Optuna instead of Ray Tune

What

Implement Optuna instead of Ray Tune

Why

Ray is definitely a good hyperparameter tuning tool, but many problems have come up so far when using ray tune and hydra together. Also, if we only need simple tuning, a lighter tool such as Optuna may be better than the more advanced ray. Therefore, I think it is worth applying Optuna instead of Ray Tune and comparing the two pipelines.

How

  • Implement Optuna
  • Compare two pipelines in terms of complexity, convenience, etc.
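If we go with Optuna, one low-friction route is hydra's Optuna sweeper plugin, which keeps tuning inside our existing hydra configs. A sketch of such a config (parameter names are illustrative, not taken from our repo; assumes `pip install hydra-optuna-sweeper`):

```yaml
# config/hparams_search/optuna.yaml (sketch)
defaults:
  - override /hydra/sweeper: optuna

hydra:
  sweeper:
    direction: maximize   # maximize val AUROC
    n_trials: 20
    sampler:
      _target_: optuna.samplers.TPESampler
      seed: 12345
    params:               # search space in the plugin's syntax
      optimizer.lr: interval(1e-5, 1e-2)
      Dataset.batch_size: choice(16, 32, 64)
```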

Hotfix: Raytune + wandb logging is not working

What

I tried to test single run + raytune + wandb logging, but found that wandb logging is not working.

My command is

python main.py model=resnet logging=wandb project_name='kdg_dev_test' logging.setup.name='autoaug_ray_test' conditional_train=none Dataset.auto_augmentation=True hparams_search=raytune hparams_search.tune_config.num_samples=10

I also tried

  • Assigning both : project_name & logging.setup.project
  • Assigning alone : logging.setup.project

However, I got the same result from all trials.

While trying to find the cause, I found a "-" in run_config of raytune.yaml.

I guessed it was a typo, so I removed it and tried again. Then a harder error occurred:

SUCCESS 12345 SEED FIXING
hyperparameter search: raytune
working dir: /home/CheXpert_code/kdg/CXRAIL-dev
[2022-12-28 04:48:27,817][ray.tune.tune][INFO] - Initializing Ray automatically.For cluster usage or custom Ray initialization, call ray.init(...) before tune.run.
2022-12-28 04:48:31,236 INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Error executing job with overrides: ['model=resnet', 'logging=wandb', 'project_name=kdg_dev_test', 'logging.setup.name=autoaug_ray_test', 'conditional_train=none', 'Dataset.auto_augmentation=True', 'hparams_search=raytune', 'hparams_search.tune_config.num_samples=10']
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/ray/tune/tuner.py", line 272, in fit
return self._local_tuner.fit()
File "/usr/local/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 420, in fit
analysis = self._fit_internal(trainable, param_space)
File "/usr/local/lib/python3.8/site-packages/ray/tune/impl/tuner_internal.py", line 532, in _fit_internal
analysis = run(
File "/usr/local/lib/python3.8/site-packages/ray/tune/tune.py", line 626, in run
callbacks = _create_default_callbacks(
File "/usr/local/lib/python3.8/site-packages/ray/tune/utils/callback.py", line 59, in _create_default_callbacks
has_trial_progress_callback = any(
TypeError: 'WandbLoggerCallback' object is not iterable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 85, in main
raytune(hydra_cfg)
File "main.py", line 63, in raytune
analysis = tuner.fit()
File "/usr/local/lib/python3.8/site-packages/ray/tune/tuner.py", line 274, in fit
raise TuneError(
ray.tune.error.TuneError: The Ray Tune run failed. Please inspect the previous error messages for a cause. After fixing the issue, you can restart the run from scratch or continue this run. To continue this run, you can use tuner = Tuner.restore("/home/CheXpert_code/kdg/CXRAIL-dev/logs/2022-12-28_04-48-27/Dataset.auto_augmentation=True,conditional_train=none,hparams_search.tune_config.num_samples=10,hparams_search=raytune,logging.setup.name=autoaug_ray_test,logging=wandb,model=resnet,project_name=kdg_dev_test/trainval_2022-12-28_04-48-27").
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Since the instantiation of run_config looks clear and correct to me, it is hard to see why the WandbLoggerCallback has the wrong type.

If anyone knows this type of error or has experienced it, please help me overcome it.

Summary:

  • Single run + raytune + wandb logging is not working in my environment
  • I suspect a typo in raytune.yaml
  • Unknown type error by wandbloggercallback occurred

How
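A guess at both symptoms, sketched but not verified: ray's _create_default_callbacks iterates over the callbacks argument, so RunConfig expects a *list* of callbacks. The "-" in raytune.yaml is therefore probably not a typo: removing it passes a bare WandbLoggerCallback object, which triggers the 'object is not iterable' error. The original silent-logging problem likely lies elsewhere (e.g. in the callback's project/name fields). A restored fragment might look like:

```yaml
# run_config section of raytune.yaml (sketch; the exact module path of
# WandbLoggerCallback depends on the ray version):
run_config:
  _target_: ray.air.RunConfig
  callbacks:
    - _target_: ray.air.integrations.wandb.WandbLoggerCallback  # "-" keeps this a list
      project: ${project_name}
```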

Features: Append More Options on CXR dataloader

What

  • The more I looked at previous work on CheXpert, such as Issue #9, the more I saw that some options need to be added:
    1. Label Smoothing
    2. Conditional Training

Why

  • The rank-2 paper (https://arxiv.org/abs/1911.06475) uses conditional training to address the fact that diagnoses are often conditioned on their parent labels, and uses label smoothing to handle the uncertain labels in the dataset.
  • The rank-1 paper (https://arxiv.org/abs/2012.03173) also uses label smoothing. (not yet sure about conditional training)

How

  • Implement a label-smoothing option in the CheXpert dataloader
  • Implement a conditional-training option in the CheXpert dataloader

Comment

  • Implementing the 'Conditional Training' option will probably touch the train & valid parts of our code. 😨😱
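The label-smoothing side can be sketched as below (a hypothetical helper, not our actual dataloader code; the interval bounds follow the U-ones + LSR setup described in arXiv:1911.06475 and would be made configurable):

```python
# Sketch of a label-smoothing transform for uncertain CheXpert labels.
# Uncertain labels are encoded as -1; U-ones + LSR replaces them with a
# random soft target drawn from [low, high].
import random

def smooth_labels(labels, low=0.55, high=0.85, rng=random):
    """Map uncertain labels (-1) to a random target in [low, high]."""
    return [rng.uniform(low, high) if y == -1 else float(y) for y in labels]
```

The dataloader would apply this per-sample when the label-smoothing option is enabled, leaving certain labels (0/1) untouched.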

Features: Inference logging advancement

What

  • Organize test results into a csv file

ref: timm benchmark result
(https://github.com/rwightman/pytorch-image-models/blob/main/results/benchmark-infer-amp-nchw-pt111-cu113-rtx3090.csv)

  • Save important information other than the test score (model, dataset, optimizer, etc.)

Why

Currently, only the AUROC scores on the test dataset are logged, but the inference results need to be organized well for benchmark experiments.

How

  • Extract the necessary information from the hydra logs of the training run to be inferenced
  • Organize the inference result storage path
  • Save the results as a csv file
  • Add other metrics (if possible; it might need to be an independent issue)
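The csv step can be sketched as below (the column names are hypothetical; the real ones would be extracted from the hydra config of each run):

```python
# Sketch of appending one inference run to a timm-style benchmark csv.
import csv
import os

def append_result(path, row, fieldnames=("model", "dataset", "optimizer", "auroc")):
    """Append one result row, writing the header only for a new file."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```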

Discussion: Better ways to improve team-wide understanding of MIMIC datasets

What

  • Discuss important things about MIMIC dataset that the whole team should know
  • Propose formats to analyze and share important points about MIMIC

Why

  • MIMIC has a more complicated data structure than CheXpert
  • It was one of the key issues of last week's meeting
  • Team-wide common reference can improve the efficiency of conversation in meeting

How

  • My simple suggestion is to make a notebook file for EDA MIMIC. Any further suggestions would be very helpful and appreciated!

Features: Multi-GPU training for non-ray setting

What

Implement multi-GPU training for non-ray tune setting

Why

While ray has a built-in parallelization system via its tune resources options,
hydra multirun has no built-in parallel GPU training.

Thus, we need to implement a parallel GPU setting for hydra multirun + non-ray-tune runs, especially for large-scale experiments.

How

I plan to use pytorch's nn.parallel.DistributedDataParallel after reading the references below.

To Do

  • nn.DataParallel implementation
  • Deprecate nn.DataParallel and implement nn.parallel.DistributedDataParallel

Features: Apply torch.amp

What

Apply torch.amp to do experiments faster!

Why

AMP uses both float16 and float32, which can make code execution faster.
So I think it can make our experiments more efficient and more convenient.

How

We can reference PyTorch Image Models (https://github.com/rwightman/pytorch-image-models).
However, it uses apex amp, which is no longer needed now that torch.amp exists.
So I will reference both PyTorch Image Models and the PyTorch documentation.
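A minimal sketch of the standard torch amp training step (model, optimizer, and criterion are placeholders, not our actual code). With the scaler and autocast disabled, the same code runs unchanged on CPU, so the amp flag can be a simple config option:

```python
# Sketch of a mixed-precision training step with torch.amp.
import torch

def train_step(model, optimizer, criterion, x, y, scaler, use_amp):
    optimizer.zero_grad()
    # autocast runs the forward pass in reduced precision when enabled
    with torch.autocast(device_type=x.device.type, enabled=use_amp):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()  # scaling is a no-op when scaler is disabled
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# scaler = torch.cuda.amp.GradScaler(enabled=use_amp) would be created once
# outside the loop, with use_amp = torch.cuda.is_available() and cfg.use_amp.
```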

Features: Brief instructions on how to merge MIMIC csv files

What

Write brief instructions on how to merge the MIMIC csv files

Why

  • Unlike the CheXpert dataset, MIMIC has 3 different types of csv files (metadata, split, disease info)
  • The csv files seem to have an ERD-based structure
  • I found a small difference between the foreign keys of the metadata and disease-info files, which is not critical but good to know about
  • One might think this should be included in the EDA. I agree, but after digging into this issue I found that it might be out of the scope of a conventional Kaggle-style EDA
  • I hope this issue helps unify the whole team's MIMIC disease-information dataset and lessen the burden of EDA

How

  • Make a notebook file and upload it (Make a new branch and merge it to tutorials branch & in the tutorials/MIMIC directory)
  • Merge to the tutorials branch
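The join structure the notebook would explain can be sketched as below. In practice we would use pandas.merge on the real csvs; this stdlib version only illustrates the join, and it simplifies the split file, which is actually keyed per dicom_id rather than per study:

```python
# Sketch of joining the three MIMIC-CXR csv files (metadata, split,
# disease labels) on their shared keys (subject_id, study_id).
def merge_mimic(metadata_rows, split_rows, label_rows):
    key = lambda r: (r["subject_id"], r["study_id"])
    splits = {key(r): r["split"] for r in split_rows}
    labels = {key(r): r for r in label_rows}
    merged = []
    for row in metadata_rows:
        k = key(row)
        if k in labels:  # inner join: drop studies without disease labels
            merged.append({**row, **labels[k], "split": splits.get(k)})
    return merged
```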

Features: Implement training data size selection option

What

Implement training data size selection option for our custom dataset class

Why

  • Experimentally, it helps to find how much training data is needed before the metrics saturate
  • It is also helpful for debugging because it reduces code running time

How

  • Given that the total training data size is not small, and given the difficulty of stratified sampling of multilabel targets,
    random sampling is the first implementation strategy
  • Expected input is an integer (a number of samples) or a float in the range 0~1 (a ratio of the total size)
  • Implementation
  • Code test
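The int-or-float option can be sketched as below (the function and parameter names are hypothetical, not from our dataset class):

```python
# Sketch of the training-size selection option: an int picks that many
# samples, a float in (0, 1] picks that fraction of the dataset.
import random

def subsample_indices(n_total, train_size, seed=12345):
    """Return sorted random indices; seeded for reproducibility."""
    if isinstance(train_size, float):
        if not 0.0 < train_size <= 1.0:
            raise ValueError("float train_size must be in (0, 1]")
        k = max(1, int(n_total * train_size))
    else:
        k = min(train_size, n_total)  # clip to dataset size
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), k))
```

The dataset class would slice its dataframe with these indices before building samples.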

Features: Support saliency method: Grad-CAM

What

  • Add a function which shows the saliency maps.
  • First, we support Grad-CAM because it's a representative method.
  • Save saliency maps with wandb option.

Why

  • From the perspective of aiding medical doctors, Grad-CAM is nowadays one of the most used methods.
  • Although saliency maps are unfortunately not a verified method for medical tasks from an XAI perspective, many companies still provide them to medical doctors, so I decided to provide saliency maps for users who are medical doctors and researchers.
  • Among the many saliency methods, I chose Grad-CAM first, referring to the paper "Benchmarking saliency methods for chest X-ray interpretation".

How

  • Implement to support Grad-CAM.
  • Save saliency maps in local.
  • Save saliency maps in wandb.
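A minimal Grad-CAM sketch using forward/backward hooks (the model and target layer here are placeholders; in our code the layer would likely come from the model config):

```python
# Grad-CAM sketch: global-average-pooled gradients weight the target
# layer's activations, then ReLU + per-image normalization to [0, 1].
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, x, class_idx):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        score = model(x)[:, class_idx].sum()  # logit of the target class
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted activations
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return cam  # (N, H, W) in [0, 1], to be upsampled onto the input image
```

The returned map would then be resized to the input resolution and logged locally or to wandb as an overlay.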
