decile-team / cords

Reduce end-to-end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude, using coresets and data selection.

Home Page: https://cords.readthedocs.io/en/latest/

License: MIT License

Languages: Python 45.44%, Jupyter Notebook 54.56%
Topics: compute-efficient-ml, deep-learning, energy, energy-requirements, machine-learning, speedups-training


cords's Issues

Questions about accuracy logging

Hello! Thanks for your great work.

I'm currently working on this code and I want to ask a question about accuracy logging.

cords/train.py

Lines 530 to 541 in ff629ff

for batch_idx, (inputs, targets) in enumerate(testloader):
    # print(batch_idx)
    inputs, targets = inputs.to(self.configdata['train_args']['device']), targets.to(self.configdata['train_args']['device'], non_blocking=True)
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    tst_loss += loss.item()
    tst_losses.append(tst_loss)
    if "tst_acc" in print_args:
        _, predicted = outputs.max(1)
        tst_total += targets.size(0)
        tst_correct += predicted.eq(targets).sum().item()
        tst_acc.append(tst_correct/tst_total)

In line 541 of train.py, tst_acc contains cumulative accuracies over the input batches. For example, if the loader contains 4500 examples and the batch size is 1000, then tst_acc holds 5 accuracies per evaluation (the first element of tst_acc is the accuracy over the first 1000 examples).
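For illustration, here is a minimal sketch (not code from the repository) of the running-accuracy behaviour described above, using hypothetical per-batch correct counts:

batch_sizes = [1000, 1000, 1000, 1000, 500]   # 4500 test examples, batch size 1000
batch_correct = [900, 910, 905, 890, 470]     # hypothetical per-batch correct counts

tst_correct, tst_total, tst_acc = 0, 0, []
for n, c in zip(batch_sizes, batch_correct):
    tst_correct += c
    tst_total += n
    tst_acc.append(tst_correct / tst_total)   # running accuracy over the batches seen so far

print(tst_acc)        # five cumulative accuracies, one per batch
print(tst_acc[-1])    # accuracy over the full 4500-example test set
print(max(tst_acc))   # the maximum of the running accuracies (see line 633 below)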

cords/train.py

Lines 631 to 633 in ff629ff

if "tst_loss" in print_args:
if "tst_acc" in print_args:
print("Test Data Loss and Accuracy: ", tst_loss, np.array(tst_acc).max())

In line 633, the code prints the best value in tst_acc. As a result, the best accuracies reported for different algorithms and seeds might be values evaluated on different subsets of the test samples.

Is this what you intended? In my experience, evaluating all algorithms on the identical, full test dataset is the convention.
In addition, are the test accuracies reported in the GRAD-MATCH paper the best values as above, or the last test accuracy?

Best,
Jang-Hyun

Typo in cords_cifar10_glister_train.ipynb

There is a typo in the cords_cifar10_glister_train.ipynb notebook:
https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/cords_cifar10_glister_train.ipynb

glister_trn.configdata.train_args.print_every = 1
glister_trn.configdata.train_args.device = 'cuda'
glister_trn.configdata.dss_args.fraction = fraction

instead of

glister_trn.cfg.train_args.print_every = 1
glister_trn.cfg.train_args.device = 'cuda'
glister_trn.cfg.dss_args.fraction = fraction

Evaluation on ImageNet

Hello, thanks for a very interesting and useful project.

Would you mind providing an evaluation method for ImageNet?
I tried adding a loader for ImageNet to custom_dataset.py, but failed due to a GPU memory issue during subset selection.
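For reference, this is roughly the kind of loader I tried to add (a hypothetical sketch; the function name and directory layout are my own, not part of custom_dataset.py):

import os
import torchvision
import torchvision.transforms as transforms

def load_imagenet(data_dir):
    # Standard ImageNet preprocessing; assumes the usual train/ and val/ subfolders.
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    trainset = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'), transform)
    valset = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'val'), transform)
    return trainset, valset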

Many thanks!

Synthetic data experiments and tutorials

  1. Perform subset selection experiments on synthetic data, with a detailed visualization of the selected subsets and the test performance of models trained on these subsets (a starting-point sketch follows below).
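A minimal sketch of the kind of setup this task could start from (assumed, not existing CORDS code; a random subset stands in for an actual selection strategy):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification

# Generate a simple 2-D, two-class synthetic dataset.
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)

# Placeholder selection: a random 10% subset (replace with a CORDS strategy).
fraction = 0.1
subset_idx = np.random.choice(len(X), size=int(fraction * len(X)), replace=False)

plt.scatter(X[:, 0], X[:, 1], c=y, s=5, alpha=0.2, label='full data')
plt.scatter(X[subset_idx, 0], X[subset_idx, 1], c=y[subset_idx], s=25,
            edgecolors='k', label='selected subset')
plt.legend()
plt.title('Selected subset overlaid on the full synthetic dataset')
plt.show()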

[Bug] Got weight with same value when running examples.

Hi, I tested the example with supervised learning and the GLISTER strategy.
https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/python_notebooks/CORDS_SL_CIFAR10_Custom_Train.ipynb
But when I print the weights from the train loader, they are all 1.0. I believe that with the GLISTER strategy we should get different weights.

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1.], device='cuda:0')

Is this a bug, or is it expected behaviour?
Thanks.

Models and Examples for Tabular Data

Hi, I'm interested in using CORDS for tabular data, and I noticed planned work on this page that has not been realized yet. Are there any plans regarding this? How could I contribute?
Thanks!
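As a possible starting point for such a contribution, here is a minimal sketch of a tabular model (the class name and interface are hypothetical, not part of the CORDS model zoo):

import torch.nn as nn

class TabularMLP(nn.Module):
    # Simple MLP for tabular classification, usable as a drop-in PyTorch model.
    def __init__(self, num_features, num_classes, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)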

Refactor the folders in the repo

  • Add a folder called benchmarks that holds all the results/benchmarks for the various cases. We should remove the results from the main README and point to that folder instead. Also, add the notebooks needed to reproduce the benchmark results.
  • Rename notebooks to tutorials. Add different tutorials based on use cases (NLP, vision, SSL, hyperparameter tuning, NAS, etc.).

Noisy Label experiments

  1. Perform a detailed analysis of the performance of the subset selection strategies in the presence of noisy labels (see the sketch below for one way to inject label noise).
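A minimal sketch of one way to inject symmetric label noise before running the strategies (assumed setup, not existing CORDS code):

import numpy as np

def add_symmetric_label_noise(labels, num_classes, noise_rate, seed=0):
    # Flip roughly `noise_rate` of the labels to a uniformly random *different* class.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip_mask = rng.random(len(labels)) < noise_rate
    random_labels = rng.integers(0, num_classes, size=len(labels))
    same = random_labels == labels
    random_labels[same] = (random_labels[same] + 1) % num_classes
    labels[flip_mask] = random_labels[flip_mask]
    return labels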

Possible bug calculating "trn_loss" and "tst_loss"

Hello,

I have noticed a potential bug in the calculation of trn_loss and tst_loss. The trn_loss is currently computed on the entire training dataset using train_eval_loader, whose batch size is 20 times larger than that of trainloader. Consequently, when accumulating trn_loss with train_eval_loader, it is necessary to use the batch size of train_eval_loader rather than the batch size of trainloader.

Likewise, when calculating tst_loss, we should use the batch size of test_eval_loader instead of the batch size of testloader.

trn_loss += (loss.item() * trainloader.batch_size)

tst_loss += (loss.item() * testloader.batch_size)
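A possible correction (my suggestion, not the repository's code) is to scale each batch loss by the number of samples that actually produced it, which is loader-independent and also handles a smaller final batch:

import torch

def dataset_loss(model, loader, criterion, device):
    model.eval()
    total_loss, total_samples = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            loss = criterion(model(inputs), targets)      # assumes mean reduction
            total_loss += loss.item() * inputs.size(0)    # scale by the actual batch size
            total_samples += inputs.size(0)
    return total_loss / total_samples                     # per-sample average loss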

Detectron2 usage

Hi, how can I add sampling hooks to Detectron2 using this library?
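A rough skeleton of how one might start (my assumption only; CORDS does not ship a Detectron2 integration that I know of, and the selection call here is left as a placeholder):

from detectron2.engine import HookBase

class SubsetSelectionHook(HookBase):
    # Periodically re-select a training subset and rebuild the data loader.
    def __init__(self, select_every):
        self.select_every = select_every

    def after_step(self):
        it = self.trainer.iter
        if it > 0 and it % self.select_every == 0:
            # Placeholder: run a CORDS selection strategy here and rebuild
            # self.trainer.data_loader from the selected indices.
            pass

# trainer = DefaultTrainer(cfg)
# trainer.register_hooks([SubsetSelectionHook(select_every=5000)])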

Update Documentation

  1. Update readthedocs with new documentation and release the latest version of CORDS

Implement CRUST Algorithm

  1. Implement the CRUST strategy in the supervised learning setting.
  2. Create the CRUST data loader class, building it on top of the adaptive_dataloader class.

Logistic Regression support for Gradmatch

The Logistic Regression model throws errors when we do backpropagation. The fix for this is perhaps to set freeze=False in the forward function of utils/models/logreg_net.py.
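A rough illustration of the pattern in question (my guess at the intent, not the actual utils/models/logreg_net.py code): when freeze=True the logits are computed under torch.no_grad(), which blocks backpropagation, so defaulting to freeze=False keeps the graph intact.

import torch
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, x, freeze=False):
        if freeze:
            # No graph is built here, so calling loss.backward() through
            # these outputs raises an error.
            with torch.no_grad():
                return self.linear(x)
        return self.linear(x)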

Gradmatch Data subset selection method making training slow

I tried to run some experiments as follows:

  • Ran full CIFAR-10 without any subset selection method to train ResNet-50, which took around 32m 31s.
  • Ran GradMatch CIFAR-10 subset selection with a 0.1 fraction, which took longer than full CIFAR-10, i.e. 22h 48m 40s.
  • Ran GradMatch CIFAR-10 subset selection with a 0.3 fraction, which took longer than the 0.1-fraction GradMatch run.

I am using CIFAR-10 images scaled to 224x224 resolution, with the ResNet-50 architecture defined accordingly.
Can you let me know how to speed up experiments 2 and 3? In general, a subset selection method should make the whole training process faster, right?

For the GRAD-MATCH method, the weights associated with each data point in X (subset of the training set)

  1. For the GRAD-MATCH method, there are weights associated with each data point in X (the subset of the training set). Do the weights have a physical significance? For example, if a weight's value is higher, does the corresponding selected data point contribute more to the residual?
  2. During the iteration, the selected index is already in the selected indices, so the iteration breaks. Why does this happen?
    Thanks @krishnatejakk

Segmentation fault (core dumped)

Hi,

I was trying to deploy CORDS selection in my training, but this error popped up: Segmentation fault (core dumped).

I imitated code from https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/python_notebooks/CORDS_SL_CIFAR10_Custom_Train.ipynb.

So basically I put my training and testing loaders into GLISTERDataLoader, and swapped this part into my code:

for _, (inputs, targets, weights) in enumerate(dataloader):
    inputs = inputs.to(device)
    targets = targets.to(device, non_blocking=True)
    weights = weights.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    losses = criterion_nored(outputs, targets)
    loss = torch.dot(losses, weights/(weights.sum()))
    loss.backward()

Before this modification my code was running fine, so I believe there is an error inside CORDS. My dataset is CIFAR-10.

Thanks

Documentation Improvement

  1. Hyperparameter tuning results and the colab tutorial

  2. Make sure all the tutorial links are working

  3. One document explaining all configurable parameters

  4. New main page in documentation with the current results for CORDS

  5. Replace the models doc in the documentation with a list of all available models

  6. Go over the documentation and make sure everything is okay

Inquiry about performance of gradmatch

Hello, I ran some experiments with GradMatch and Random-Online, and found that the two actually reach similar performance (around 93) after 300 epochs. Is there something important to note for reproducing the results? Thanks for your help!

Can't install on macOS

I'm trying to install CORDS on macOS, but there is a dependency conflict. It looks like there is no torchtext==0.10.1 build for Mac. Is there any tutorial for making it work on macOS?
