
badge's Issues

Minor MLP typo

Hey Jordan, lm2 isn't actually used in this MLP implementation:

badge/run.py

Line 191 in b170600

self.lm2 = nn.Linear(embSize, embSize)

This seems like a minor typo since the current functionality is consistent with the paper (two-layer perceptron: self.lm1 + self.linear), but figured I'd point it out.
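
For concreteness, a minimal sketch of what this describes (the names lm1, lm2, linear, and embSize follow the snippet; the rest of the class is an assumption, not the repository's code):

import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, dim, embSize, nClasses):
        super().__init__()
        self.lm1 = nn.Linear(dim, embSize)
        self.lm2 = nn.Linear(embSize, embSize)   # defined but never called in forward()
        self.linear = nn.Linear(embSize, nClasses)

    def forward(self, x):
        emb = F.relu(self.lm1(x))        # only lm1 and linear are used,
        return self.linear(emb), emb     # i.e. a two-layer perceptron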

Thanks

Large embedding vector, large number of classes?

Hi authors,
Thanks for publishing this fantastic tool. I have a question about the approach itself: when my last-layer dimension is really large and I have a large number of classes to cover, the gradient vector's dimension (embDim * nLab) will be on the order of 10^4 or more. Do you think BADGE is an efficient solution in this case?
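
For a rough sense of scale (the numbers below are illustrative assumptions, not taken from the issue):

# BADGE's gradient embedding has dimension embDim * nLab per unlabeled point.
embDim = 512     # assumed width of the penultimate layer
nLab = 1000      # assumed number of classes
print(embDim * nLab)  # 512000 -- each candidate point gets a ~5e5-dimensional vector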

license

Hi,
Could you add a license to this project? (If you want the most permissive option, the MIT license is probably what you're looking for.)

Entropy sampling issue

There is an error in your implementation of entropy sampling.

The following example shows the result produced by the implemented entropy sampling.

import torch

probs = torch.tensor([[0.1, 0.2, 0.7], [0.0, 0.1, 0.9], [0.0, 0.5, 0.5]])

log_probs = torch.log(probs)

U = (probs*log_probs).sum(1)

print(log_probs)
print(U)
---------
-- Print Result -- 
tensor([[-2.3026, -1.6094, -0.3567],
        [   -inf, -2.3026, -0.1054],
        [   -inf, -0.6931, -0.6931]])
tensor([-0.8018,     nan,     nan])
------------

If this issue is not addressed, the results of your implemented entropy sampling will inevitably be much worse than those of margin sampling.

This can be solved simply by adding the following code:

log_probs = torch.log(probs)

# zero out the infinite entries so that 0 * log(0) contributes 0 to the sum
log_probs[log_probs == float("-inf")] = 0
log_probs[log_probs == float("inf")] = 0
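
Put together, a self-contained version of the corrected computation looks like this (only the -inf masking is from the suggestion above; the rest just repeats the example):

import torch

probs = torch.tensor([[0.1, 0.2, 0.7], [0.0, 0.1, 0.9], [0.0, 0.5, 0.5]])

log_probs = torch.log(probs)
log_probs[log_probs == float("-inf")] = 0   # 0 * log(0) now contributes 0
U = (probs * log_probs).sum(1)              # negative entropy per example

print(U)  # tensor([-0.8018, -0.3251, -0.6931]) -- no more nan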

No pseudo/hypothetical labels found in the code

Hi Jordan Ash,

In your paper, the gradient embedding is from the loss between the network output and the hypothetical labels (inferred from the network output).

However, in your code, I didn't find anything about pseudo/hypothetical labels.

In the file badge_sampling.py, it seems that you directly use the true labels to guide your selection. If so, this would be an unfair comparison.

    gradEmbedding = self.get_grad_embedding(self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled]).numpy()

I'm not sure if I'm missing something. Could you show how you use the hypothetical labels in your code?
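
For reference, the construction described in the paper (hypothetical label = argmax of the network output, gradient taken with respect to the last linear layer) can be sketched as follows; the function and variable names here are illustrative, not the repository's:

import torch.nn.functional as F

def badge_grad_embedding(logits, emb):
    # logits: (N, nLab) network outputs; emb: (N, embDim) penultimate activations
    probs = F.softmax(logits, dim=1)
    y_hat = probs.argmax(dim=1)                        # hypothetical (pseudo) labels
    one_hot = F.one_hot(y_hat, probs.size(1)).float()
    # cross-entropy gradient w.r.t. the last-layer weights: (p - onehot(y_hat)) outer emb
    return ((probs - one_hot).unsqueeze(2) * emb.unsqueeze(1)).flatten(1)

No ground-truth labels enter this computation.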

Thanks,
Rui

Different results

Hi authors,
It's great that you published the source code, but the default hyper-parameters do not seem to be correctly tuned. Could you please share the hyper-parameters used to produce the graph below (taken from your paper)?
I have also attached a graph showing the results I got with the hyper-parameters from your source code, using the command
python run.py --model vgg --nQuery 100 --data CIFAR10 --alg badge

Reported in the paper: [accuracy plot omitted]
Run with the default code (the x-axis is the number of data points divided by 100, the query batch size): [accuracy plot omitted]
The gap is nearly 10% even with more data.

Different loss function

Hi! Great work and wonderful code, thanks!
Have you tried loss functions that are not based on cross-entropy? I am thinking of the loss functions used in unsupervised anomaly-detection methods for filtering out anomalous data points. One example of such a loss function is the difference between student and teacher intermediate layers. Am I right that, if the method only depends on the gradient, we could skip the hallucination part and use the derivative directly?
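
Roughly what I have in mind, as a sketch (the names are illustrative and this is only an assumption about how it could work, not the repository's code):

def grad_embedding_from_loss(model, last_layer, loss_fn, x):
    # loss_fn is any differentiable, label-free loss, e.g. a student/teacher
    # feature-matching distance; no hallucinated label is needed.
    model.zero_grad()
    loss = loss_fn(model(x))
    loss.backward()
    return last_layer.weight.grad.detach().flatten()
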
Have a nice day
Tobi

Utility of calculating gram matrix in k-means++ method

Hi Jordan Ash,

First of all, thank you for sharing such clear and concise code. It has been a delight to go through the code step by step and understand it.
I was curious about the last steps of the k-means++ code, which calculate a Gram matrix and then its eigenvalues. However, I saw no use of these values anywhere during sampling. I wanted to know whether this is intentional or whether something is missing from the k-means++ code.
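
For context, the k-means++ seeding used for selection only needs distances to the already-chosen points; a minimal sketch (illustrative, not the repository's code):

import numpy as np

def kmeans_pp_seed(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]           # first center: uniform at random
    d2 = ((X - X[centers[0]]) ** 2).sum(1)          # squared distance to nearest center
    for _ in range(k - 1):
        nxt = int(rng.choice(len(X), p=d2 / d2.sum()))   # sample proportionally to D^2
        centers.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(1))
    return centers

Nothing in this loop needs a Gram matrix or its eigenvalues, which is why I was wondering about those last steps.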

Thanks,
Shwetank

Got different accuracy results

Thank you for sharing the source code for this great work.

I am trying to replicate some of your baseline results.
However, the result I got from the baseline (LeastConfidence) is much better than the one reported in your paper.
Here is my accuracy plot (it reaches 40% test accuracy much earlier than the accuracy reported in the paper).

Query size: 100
Dataset: CIFAR-10
Network: Resnet
Strategy: LeastConfidence
[accuracy plot omitted]

Should I do something to replicate your results?

Typo in run.py

badge/run.py

Line 56 in b170600

if 'CIFAR' in opts.data: args.lamb = 1e-2

The 'args.lamb' should be 'opts.lamb'.
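
Presumably the intended line is:

if 'CIFAR' in opts.data: opts.lamb = 1e-2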
