
bdl-benchmarks's People

Contributors

filangelos, nband


bdl-benchmarks's Issues

Plots showing results in Colab notebook display metrics on different scales.

I could be wrong, but this plot, for example,

[screenshot of plot]

is clearly suspect, since the naive method gives an AUC of around 0 rather than 50. I believe the issue is that the results dictionary reports metrics on a [0, 1] scale, while the baseline models report them on a [0, 100] scale. I can't actually find where in the code the baseline results are scaled by 100, or else I would happily make a pull request. In general, I think it would be easier to keep everything on a [0, 1] scale.
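
For anyone working around this locally in the meantime, here is a minimal sketch of rescaling both sources onto [0, 1] before plotting; the dictionary layout below is an illustrative assumption, not the repo's actual data structure:

```python
# Hedged sketch: put both metric sources on the same [0, 1] scale before
# plotting. The dict-of-lists layout is an assumption for illustration.
def rescale(metrics, scale):
  return {name: [v / scale for v in values] for name, values in metrics.items()}

results = {"auc": [0.52, 0.61, 0.70]}    # already on [0, 1]
baselines = {"auc": [50.0, 55.0, 63.0]}  # reported on [0, 100]
results = rescale(results, 1.0)
baselines = rescale(baselines, 100.0)
```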

[question] What does "random" refer to in the diagrams?

I was wondering what the random baseline actually refers to in the tables/diagrams.
Also, any idea why the ensembles perform worse than MC Dropout? I think the Deep Ensembles paper showed them performing better than MC Dropout (please correct me if I'm wrong).

Thanks!

Replicating the results of the leaderboard

I've been trying to replicate the results of your leaderboard, but I found a number of things confusing (based on the "medium" data in the linked Colab):

  1. The leaderboard is based on the "realworld" level, but the Colab is based on the "medium" level. Do you have ready-made medium results?
  2. Using a VGG-16 model (the one found in mc_dropout/model) and training it, I found the results below.

For deterministic:
[accuracy plot, with the deterministic run in pink]

And for mc_dropout:
[accuracy plot]
With numbers (first is mc_dropout, second is deterministic):
[table of metrics]

In your paper, mc_dropout outperformed the deterministic approach by quite a bit. I didn't expect the deterministic approach to perform so badly; these results seem somewhat more sensible, but not to this extent. Can you find the reason for this discrepancy?

  3. The AUC results behave weirdly. For mc_dropout:

[AUC plot]

Here is a Colab to replicate the above.
I also recommend updating your linked Colab with the proper required packages, as in its current form it does not run.

Interpretation of the predictive uncertainty for decision making

Hello,
In classification, is there any way to interpret the obtained predictive uncertainty? After computing the predictive uncertainty, is there a way to calculate a threshold or cutoff value (as in the diagram you show here: https://camo.githubusercontent.com/e78af0e93f0ea7cc80e38f7b9273486bbf6f37f6/687474703a2f2f7777772e63732e6f782e61632e756b2f70656f706c652f616e67656c6f732e66696c6f732f6173736574732f62646c2d62656e63686d61726b732f646961676e6f7369732e706e67) such that if the predictive variance is above that value we can say the model is uncertain, and below it the model is certain about its prediction?
Uncertain if (predictive variance >= threshold) || Certain if (predictive variance < threshold)
How can this threshold be computed?
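
One possible convention (my assumption, not something the benchmark prescribes) is to derive the cutoff from a referral rate, i.e. flag a fixed fraction of the most uncertain cases, rather than pick an absolute value:

```python
import numpy as np

# Hedged sketch: compute the cutoff from a referral fraction. `uncertainty`
# holds one predictive variance per case; the 50% referral rate below is
# only an example, not a benchmark-mandated value.
def referral_threshold(uncertainty, fraction_referred=0.5):
  """Cutoff such that `fraction_referred` of cases fall at or above it."""
  return np.quantile(uncertainty, 1.0 - fraction_referred)

uncertainty = np.array([0.01, 0.20, 0.05, 0.40])
tau = referral_threshold(uncertainty, fraction_referred=0.5)
is_uncertain = uncertainty >= tau  # refer these cases to an expert
```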
Thanks!

TF 2.0 full release breaks image preprocessing

Hi OATML,

(My apologies in advance as preview mode was indicating some formatting issues that I'm not able to fix.)

TF 2.0 breaks image preprocessing for bdl-benchmarks. The problem appears to be that TF 2.0 changed how it works with Python's local symbol table relative to TF 2.0 Beta.

This means that transforms.Compose is no longer able to properly compose a transformation for TF's dataset.map function. The information that the compose function requires is no longer present in the local table:

TF 2.0 Beta provides:

output of locals(): {
'nargs': 1,
'f': <bdlb.diabetic_retinopathy_diagnosis.benchmark.DiabeticRetinopathyDiagnosisBecnhmark._preprocessors..Parse object at 0x7f6a6314bcf8>,
'inspect': <module 'inspect' from '/usr/local/lib/python3.6/inspect.py'>,
'x': {'image': <tf.Tensor 'args_0:0' shape=(None, 256, 256, 3) dtype=uint8>,
'label': <tf.Tensor 'args_1:0' shape=(None,) dtype=int64>,
'name': <tf.Tensor 'args_2:0' shape=(None,) dtype=string>},
'self': <bdlb.core.transforms.Compose object at 0x7f6a5c1c7240>}

While TF 2.0 provides only:

output of locals(): {
'caller_fn_scope': <tensorflow.python.autograph.core.function_wrappers.FunctionScope object at 0x7efaf43c0080>,
'kwargs': None, 'args': (),
'options': <tensorflow.python.autograph.core.converter.ConversionOptions object at 0x7efaf43c02b0>,
'f': }


A simple proposed fix to get things working again is to stop relying on locals() to discern information about the class being passed to the composition, and instead define explicit class function signatures, so that we rely solely on how Python can introspect itself (i.e. use only the inspect module). Note, of course, that this solution is not robust to potential changes in the preprocessing functions that may be composed.

First, we create a unique function signature for the CastX() corner case in bdl-benchmarks/bdlb/diabetic_retinopathy_diagnosis/benchmark.py:

```python
def __call__(self, x, y_nochange):
  # Cast only the image; pass the label through unchanged. The parameter is
  # deliberately named `y_nochange` so that the composer below does not
  # route it through the standard `y` branch.
  return tf.cast(x, self.dtype), y_nochange
```

Then we use explicit signatures instead of locals() to compose, in bdl-benchmarks/bdlb/core/transforms.py:

```python
def __call__(self, x):
  import inspect

  y = None               # fixes the original NameError: y was never initialised
  returned_pair = False  # did the most recent transform return (x, y)?

  for f in self.trans:
    returned_pair = False
    nargs = len(inspect.signature(f).parameters)

    if nargs == 2 and "y" in inspect.signature(f).parameters:
      # Transform with an explicit (x, y) signature.
      x, y = f(x, y)
      returned_pair = True
    elif nargs == 1:
      # Plain transform mapping x to x.
      x = f(x)
    else:
      # Two arguments but no parameter named `y` (e.g. CastX): unpack x.
      x, y = f(x[0], x[1])
      returned_pair = True

  # If the last function in the composition returned two values, return
  # both; otherwise return only x.
  if returned_pair:
    return x, y
  return x
```
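
To sanity-check the approach end to end, here is a hedged, self-contained sketch; MiniCompose, parse, and the toy dataset below are illustrative stand-ins for the repo's actual classes, not its API:

```python
import inspect

import tensorflow as tf


class MiniCompose:
  """Stripped-down stand-in for bdlb.core.transforms.Compose."""

  def __init__(self, trans):
    self.trans = trans

  def __call__(self, x):
    y = None
    returned_pair = False
    for f in self.trans:
      returned_pair = False
      nargs = len(inspect.signature(f).parameters)
      if nargs == 2 and "y" in inspect.signature(f).parameters:
        x, y = f(x, y)
        returned_pair = True
      elif nargs == 1:
        x = f(x)
      else:
        # Two arguments but no parameter named `y` (e.g. CastX): unpack x.
        x, y = f(x[0], x[1])
        returned_pair = True
    return (x, y) if returned_pair else x


def parse(x):
  # Illustrative one-argument transform: split the feature dict into
  # (image, label).
  return x["image"], x["label"]


class CastX:

  def __init__(self, dtype):
    self.dtype = dtype

  def __call__(self, x, y_nochange):
    return tf.cast(x, self.dtype), y_nochange


preprocess = MiniCompose([parse, CastX(tf.float32)])
dataset = tf.data.Dataset.from_tensor_slices({
    "image": tf.zeros([4, 8, 8, 3], tf.uint8),
    "label": tf.zeros([4], tf.int64),
}).map(preprocess)
```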

Error while importing data on Colab

Hello Sir,

I want to run your baseline code for the diabetic-retinopathy-diagnosis benchmark. The Google Colab does not support the keyword "URL", and when I changed "URL" to "Homepage" to access the data, I got the following error while running the diabetic_retinopathy_diagnosis.ipynb file:

TypeError: ['https://www.kaggle.com/c/diabetic-retinopathy-detection/data'] has type list, but expected one of: bytes, unicode. Here is the attached image of the error:
[screenshot of the error]

Can you please help me resolve this particular error?
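
For what it's worth, a guess at the cause (an assumption, not verified): the newer tensorflow_datasets homepage field expects a single string, whereas the older urls field took a list, so passing the old list value straight through would produce exactly this TypeError:

```python
# Hedged sketch of the suspected type mismatch: `homepage` expects a plain
# string, but the old `urls` value is a list of strings.
urls = ["https://www.kaggle.com/c/diabetic-retinopathy-detection/data"]
homepage = urls[0]  # unwrap to a single string before passing it on
```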

PyTorch DataLoaders

Hi,

I'm looking forward to contributing a couple of our benchmarks to this repo, but I'm not seeing PyTorch dataloaders (or really any PyTorch support). Are there still plans to add PyTorch data loader support for the segmentation tasks?

Additionally, is there a timeline for making public the other benchmarks in the pre-alpha branch?

Thanks,
Wesley

Problem with tensorflow datasets

Hi,

After downloading the diabetic retinopathy diagnosis data and extracting the files using the download_and_prepare utility in benchmark.py, as suggested in the instructions, the "prepare" part of this procedure gives me errors. To avoid downloading the data again, I'm running:

python3 -u -c "from bdlb.diabetic_retinopathy_diagnosis.benchmark import DiabeticRetinopathyDiagnosisBecnhmark; DiabeticRetinopathyDiagnosisBecnhmark._prepare()"

The error happens on line 401 of benchmark.py (dtask.download_and_prepare()) and traces back to python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py, line 970:

for key, record in utils.tqdm(generator, unit=" examples", total=split_info.num_examples, leave=False):

and the final error message is ValueError: too many values to unpack (expected 2). It seems that the object from the generator is of the form {'name': '58_right', 'image': <_io.BytesIO object at 0x7f7b9a2a28f0>, 'label': 0}, with three fields, while the for-loop is trying to split it into two ("key" and "record"). Do you have any idea whether something has already gone wrong in the extraction part, or whether the issue might be in the version of tensorflow_datasets, for example? Somebody mentioned using tensorflow_datasets==1.2.0 in another issue, but that didn't seem to solve this.
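
If it helps with debugging, my reading of the error (an assumption, not verified against the repo) is that newer tensorflow_datasets expects the example generator to yield (key, example) pairs, so a generator yielding bare example dicts fails exactly like this. A minimal adapter sketch:

```python
# Hedged sketch: adapt a generator of bare example dicts into the
# (key, example) pairs that newer tensorflow_datasets unpacks in its
# `for key, record in ...` loop. Using `name` as the key is an assumption
# based on the printed example.
def wrap_with_keys(generator):
  for example in generator:
    yield example["name"], example

examples = [{"name": "58_right", "image": b"...", "label": 0}]
for key, record in wrap_with_keys(examples):
  print(key, record["label"])  # -> 58_right 0
```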
