rcps's Introduction

Paper

Distribution-Free, Risk-Controlling Prediction Sets

@article{bates-rcps,
  title={Distribution-Free, Risk-Controlling Prediction Sets},
  author={Bates, Stephen and Angelopoulos, Anastasios N and Lei, Lihua and Malik, Jitendra and Jordan, Michael I},
  journal={arXiv preprint arXiv:2101.02703},
  year={2021}
}

Basic Overview

For general information about RCPS, see our blog post. This repository contains the code we used for the experiments in the RCPS paper. Each experiment lives in its own, appropriately named folder. The core directory contains code common to all of our experiments, including the implementations of the concentration bounds and the choice of lambda hat. The repository is still a work in progress; we will keep updating the code to make it more user-friendly and to remove clutter left over from development. If you have trouble reproducing our results, please email [email protected].
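As a rough illustration of what "concentration bounds and choice of lambda hat" means in practice, here is a minimal sketch that is not taken from the core directory. It assumes losses bounded in [0, 1] that do not increase as lambda grows, uses a plain Hoeffding upper confidence bound, and picks the smallest grid value of lambda such that the bound stays below alpha for that value and every larger one. The names hoeffding_ucb and choose_lambda_hat, the grid search, and the fallback to the largest grid value are all assumptions made for this example.

```python
import numpy as np


def hoeffding_ucb(loss_table, delta):
    """Hoeffding upper confidence bound on the risk, one value per lambda.

    loss_table has shape (n_calibration_points, n_lambdas), entries in [0, 1].
    """
    n = loss_table.shape[0]
    return loss_table.mean(axis=0) + np.sqrt(np.log(1.0 / delta) / (2.0 * n))


def choose_lambda_hat(loss_table, lambdas, alpha, delta):
    """Smallest lambda whose bound, and that of every larger lambda, is < alpha.

    lambdas must be sorted in ascending order and match the table's columns.
    """
    below = hoeffding_ucb(loss_table, delta) < alpha
    # safe[j] is True when below[k] holds for every k >= j (a suffix "all").
    safe = np.flip(np.logical_and.accumulate(np.flip(below)))
    if not safe.any():
        # Assumption for this sketch: fall back to the most conservative value.
        return lambdas[-1]
    return lambdas[np.argmax(safe)]  # argmax returns the first True index
```

With a finer grid or a tighter bound (for example the WSR bound discussed in the issues below), the same scan applies; only the function that produces the per-lambda upper bound changes.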

Getting Started

We store some large files in the repository with git-lfs, which you may need to install and configure first (from here). After installing git-lfs, clone this repository. Then create the rcps conda environment by running the following line:

conda create --name rcps --file ./requirements.txt 

Each experiment requires different datasets. For the ./imagenet and ./hierarchical_imagenet experiments, you will need to point the scripts towards the val directory of your local copy of the ImageNet dataset. Similarly, for ./coco, you need to point the scripts towards your local copy of the 2017 version of MS COCO, available here. For the ./polyps and ./proteins examples, a bit more work must be done.

Polyp data

We used data from five different datasets: HyperKvasir-SEG, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and ETIS-LaribPolypDB. Download each of these datasets and unzip them into the folder ./polyps/PraNet/data/TestDataset/{datasetname}. Then run the script ./polyps/PraNet/process_all_data.py, which should store the outputs of the tumor prediction model in the proper directory so you can run our experiments.
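Before running process_all_data.py, it can help to confirm that all five datasets were unzipped where the script expects them. The following is a small sketch, not part of the repository. It assumes that each {datasetname} folder is named exactly like the dataset in the list above, which may not match the real layout; adjust POLYP_DATASETS if your folder names differ. The helper name missing_polyp_datasets is hypothetical.

```python
from pathlib import Path

# Assumption: folder names equal the dataset names given in the text.
POLYP_DATASETS = [
    "HyperKvasir-SEG",
    "CVC-ClinicDB",
    "Kvasir-SEG",
    "CVC-ColonDB",
    "ETIS-LaribPolypDB",
]


def missing_polyp_datasets(root="./polyps/PraNet/data/TestDataset",
                           names=POLYP_DATASETS):
    """Return the dataset names whose folder is absent or empty under root."""
    root = Path(root)
    missing = []
    for name in names:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_polyp_datasets():
        print(f"missing or empty: {name}")
```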

Protein data

For the AlphaFoldv1 experiments in ./proteins, point the scripts to the AlphaFold CASP-13 test set, available here.

License

MIT License

rcps's People

Contributors

aangelopoulos, anastasiosna, currywhite30, stephenbates19

rcps's Issues

Missing weighted_loss()

In line 34 of risk_histogram.py in the imagenet directory, there is a reference to weighted_loss(), which is not defined.

Can you tell me where to find it, or update the repo with its definition?

Thanks!

Applications to financial industry or NLP

I read your fantastic paper, A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification, and wanted to ask about applying it in industry. There seem to be applications in multiple markets and the potential looks impressive, but I'm still trying to wrap my head around implementation and integration. Figure 6 in Section 2.3 does a great job of explaining what the method does, and in Section 2.4, Conformalizing Bayes, you talk about priors.

As for implementation, the conformal_classification GitHub repo mostly shows examples of image classification and image processing, but I believe many of your videos mention applications to the stock market and risk assessment. Can you clarify? Also, if this isn't the place for these questions, let me know where would be appropriate.

I'm researching using something like your method for my work in NLP and for hobby algorithmic financial risk assessment. Your great paper answered many of my questions and raised a few more.

Some questions about the implementation of the Waudby-Smith–Ramdas (WSR) bound

If I understand correctly, bounds.WSR_mu_plus() implements the Waudby-Smith–Ramdas (WSR) bound.

rcps/core/bounds.py

Lines 126 to 137 in b400457

def WSR_mu_plus(x, delta, maxiters): # this one is different.
    n = x.shape[0]
    muhat = (np.cumsum(x) + 0.5) / (1 + np.array(range(1,n+1)))
    sigma2hat = (np.cumsum((x - muhat)**2) + 0.25) / (1 + np.array(range(1,n+1)))
    sigma2hat[1:] = sigma2hat[:-1]
    sigma2hat[0] = 0.25
    nu = np.minimum(np.sqrt(2 * np.log( 1 / delta ) / n / sigma2hat), 1)
    def _Kn(mu):
        return np.max(np.cumsum(np.log(1 - nu * (x - mu)))) + np.log(delta)
    if _Kn(1) < 0:
        return 1
    return brentq(_Kn, 1e-10, 1-1e-10, maxiter=maxiters)

I have some (seemingly trivial) questions regarding the said function. Any form of guidance is appreciated.

  1. In the subroutine _Kn(), how does one get the + np.log(delta) term from Proposition 5?
  2. Why do we check whether _Kn(1) is negative, and, if it is, subsequently return a WSR bound of 1?
  3. When invoking scipy.optimize.brentq(), why do we use a “smaller” search interval (i.e., between 1e-10 and 1-1e-10) instead of between 0 and 1?
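For anyone who wants to experiment with these three questions, here is a self-contained sketch that restates the quoted function with its imports. It is one reader's interpretation, not the authors' answer, and the names wsr_log_wealth and wsr_upper_bound are made up for this example. The reading it encodes is: (1) a candidate mean mu is rejected once the running maximum of the wealth process exceeds 1/delta, which on the log scale is max log-wealth + log(delta) > 0; (2) the expression is nondecreasing in mu, so if it is still negative at mu = 1 no value in [0, 1] is rejected and the trivial bound 1 is returned; (3) with nu capped at 1 and x in [0, 1], the factor 1 - nu * (x - mu) can reach 0 at mu = 0, so staying slightly inside the interval plausibly avoids log(0).

```python
import numpy as np
from scipy.optimize import brentq


def wsr_log_wealth(x, mu, delta):
    """Running max of the log-wealth at candidate mean mu, plus log(delta)."""
    n = x.shape[0]
    t = np.arange(1, n + 1)
    muhat = (np.cumsum(x) + 0.5) / (1 + t)
    sigma2hat = (np.cumsum((x - muhat) ** 2) + 0.25) / (1 + t)
    # Shift by one step so the bet at time i only uses data before time i.
    sigma2hat[1:] = sigma2hat[:-1]
    sigma2hat[0] = 0.25
    nu = np.minimum(np.sqrt(2 * np.log(1 / delta) / n / sigma2hat), 1)
    return np.max(np.cumsum(np.log(1 - nu * (x - mu)))) + np.log(delta)


def wsr_upper_bound(x, delta, maxiter=1000):
    """Upper confidence bound for the mean of x, with x valued in [0, 1]."""
    def f(mu):
        return wsr_log_wealth(x, mu, delta)

    if f(1) < 0:
        return 1.0  # even mu = 1 is not rejected, so the bound is trivial
    # Search strictly inside (0, 1) to keep every log argument positive.
    return brentq(f, 1e-10, 1 - 1e-10, maxiter=maxiter)


if __name__ == "__main__":
    x = (np.arange(1000) % 10 == 0).astype(float)  # losses with mean 0.1
    print(wsr_upper_bound(x, delta=0.1))
```

Evaluating wsr_log_wealth on a grid of mu values is a quick way to see the monotonicity that makes the root-finding step well posed.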
