decile-team / cords

Reduce end-to-end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude, using coresets and data selection.

Home Page: https://cords.readthedocs.io/en/latest/

License: MIT License

Languages: Python 45.44%, Jupyter Notebook 54.56%
Topics: compute-efficient-ml, deep-learning, energy, energy-requirements, machine-learning, speedups-training


cords's Issues

Questions about accuracy logging

Hello! Thanks for your great work.

I'm currently working on this code and I want to ask a question about accuracy logging.

cords/train.py

Lines 530 to 541 in ff629ff

for batch_idx, (inputs, targets) in enumerate(testloader):
    # print(batch_idx)
    inputs, targets = inputs.to(self.configdata['train_args']['device']), targets.to(self.configdata['train_args']['device'], non_blocking=True)
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    tst_loss += loss.item()
    tst_losses.append(tst_loss)
    if "tst_acc" in print_args:
        _, predicted = outputs.max(1)
        tst_total += targets.size(0)
        tst_correct += predicted.eq(targets).sum().item()
        tst_acc.append(tst_correct/tst_total)

In line 541 of train.py, tst_acc contains cumulative accuracies over the input batches. For example, if the loader contains 4500 examples and the batch size is 1000, then tst_acc holds 5 accuracies per evaluation (the first element of tst_acc is the accuracy over the first 1000 examples).
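For illustration, here is a minimal sketch (not code from the repository) of the running-accuracy behaviour described above, using hypothetical per-batch correct counts:

batch_sizes = [1000, 1000, 1000, 1000, 500]   # 4500 test examples, batch size 1000
batch_correct = [900, 910, 905, 890, 470]     # hypothetical per-batch correct counts

tst_correct, tst_total, tst_acc = 0, 0, []
for n, c in zip(batch_sizes, batch_correct):
    tst_correct += c
    tst_total += n
    tst_acc.append(tst_correct / tst_total)   # running accuracy over the batches seen so far

print(tst_acc)        # five cumulative accuracies, one per batch
print(tst_acc[-1])    # accuracy over the full 4500-example test set
print(max(tst_acc))   # the maximum of the running accuracies (see line 633 below)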

cords/train.py

Lines 631 to 633 in ff629ff

if "tst_loss" in print_args:
if "tst_acc" in print_args:
print("Test Data Loss and Accuracy: ", tst_loss, np.array(tst_acc).max())

In line 633, the code prints the best value in tst_acc. As a result, the best accuracies reported for different algorithms and seeds might be values evaluated on different subsets of the test samples.

Is this what you intended? In my experience, evaluating all algorithms on the identical, full test dataset is the convention.
In addition, are the test accuracies reported in the GRAD-MATCH paper the best values as above, or the last test accuracy?

Best,
Jang-Hyun

Typo in cords_cifar10_glister_train.ipynb

There is a typo in the cords_cifar10_glister_train.ipynb notebook:
https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/cords_cifar10_glister_train.ipynb

glister_trn.configdata.train_args.print_every = 1
glister_trn.configdata.train_args.device = 'cuda'
glister_trn.configdata.dss_args.fraction = fraction

instead of

glister_trn.cfg.train_args.print_every = 1
glister_trn.cfg.train_args.device = 'cuda'
glister_trn.cfg.dss_args.fraction = fraction

Evaluation on ImageNet

Hello, thanks for a very interesting and useful project.

Would you mind providing an evaluation method for ImageNet?
I tried adding a loader for ImageNet to custom_dataset.py, but failed due to a GPU memory issue during subset selection.
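For reference, this is roughly the kind of loader I tried to add (a hypothetical sketch; the function name and directory layout are my own, not part of custom_dataset.py):

import os
import torchvision
import torchvision.transforms as transforms

def load_imagenet(data_dir):
    # Standard ImageNet preprocessing; assumes the usual train/ and val/ subfolders.
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    trainset = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'), transform)
    valset = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'val'), transform)
    return trainset, valset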

Many thanks!

Synthetic data experiments and tutorials

  1. Perform subset selection experiments on synthetic data, with a detailed visualization of the selected subsets and the test performance of models trained on these subsets (a starting-point sketch follows below).
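A minimal sketch of the kind of setup this task could start from (assumed, not existing CORDS code; a random subset stands in for an actual selection strategy):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification

# Generate a simple 2-D, two-class synthetic dataset.
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)

# Placeholder selection: a random 10% subset (replace with a CORDS strategy).
fraction = 0.1
subset_idx = np.random.choice(len(X), size=int(fraction * len(X)), replace=False)

plt.scatter(X[:, 0], X[:, 1], c=y, s=5, alpha=0.2, label='full data')
plt.scatter(X[subset_idx, 0], X[subset_idx, 1], c=y[subset_idx], s=25,
            edgecolors='k', label='selected subset')
plt.legend()
plt.title('Selected subset overlaid on the full synthetic dataset')
plt.show()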

[Bug] Got weight with same value when running examples.

Hi, I tested the example with supervised learning and the GLISTER strategy.
https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/python_notebooks/CORDS_SL_CIFAR10_Custom_Train.ipynb
But when I print the weights from the train loader, they are all 1.0. I believe that with the GLISTER strategy we should get different weights.

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1.], device='cuda:0')

Is this a bug, or is it expected behaviour?
Thanks.

Models and Examples for Tabular Data

Hi, I'm interested in using CORDS for tabular data, and I noticed planned work on this page that has not been realized yet. Are there any plans regarding this? How could I contribute?
Thanks!
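As a possible starting point for such a contribution, here is a minimal sketch of a tabular model (the class name and interface are hypothetical, not part of the CORDS model zoo):

import torch.nn as nn

class TabularMLP(nn.Module):
    # Simple MLP for tabular classification, usable as a drop-in PyTorch model.
    def __init__(self, num_features, num_classes, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)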

Refactor the folders in the repo

  • Add a folder called benchmarks that holds all the results/benchmarks for the various cases. We should remove the results from the main README and point to that folder instead. Also, add the notebooks needed to reproduce the benchmark results.
  • Rename notebooks to tutorials. Add different tutorials based on use cases (NLP, vision, SSL, hyperparameter tuning, NAS, etc.).

Noisy Label experiments

  1. Perform a detailed analysis of the performance of the subset selection strategies in the presence of noisy labels (see the sketch below for one way to inject label noise).
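A minimal sketch of one way to inject symmetric label noise before running the strategies (assumed setup, not existing CORDS code):

import numpy as np

def add_symmetric_label_noise(labels, num_classes, noise_rate, seed=0):
    # Flip roughly `noise_rate` of the labels to a uniformly random *different* class.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip_mask = rng.random(len(labels)) < noise_rate
    random_labels = rng.integers(0, num_classes, size=len(labels))
    same = random_labels == labels
    random_labels[same] = (random_labels[same] + 1) % num_classes
    labels[flip_mask] = random_labels[flip_mask]
    return labels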

Possible bug calculating "trn_loss" and "tst_loss"

Hello,

I have noticed a potential bug in the calculation of trn_loss and tst_loss. The trn_loss is currently computed on the entire training dataset using train_eval_loader, whose batch size is 20 times larger than that of trainloader. Consequently, when accumulating trn_loss with train_eval_loader, it is necessary to use the batch size of train_eval_loader rather than the batch size of trainloader.

Likewise, when calculating tst_loss, we should use the batch size of test_eval_loader instead of the batch size of testloader.

trn_loss += (loss.item() * trainloader.batch_size)

tst_loss += (loss.item() * testloader.batch_size)
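A possible correction (my suggestion, not the repository's code) is to scale each batch loss by the number of samples that actually produced it, which is loader-independent and also handles a smaller final batch:

import torch

def dataset_loss(model, loader, criterion, device):
    model.eval()
    total_loss, total_samples = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            loss = criterion(model(inputs), targets)      # assumes mean reduction
            total_loss += loss.item() * inputs.size(0)    # scale by the actual batch size
            total_samples += inputs.size(0)
    return total_loss / total_samples                     # per-sample average loss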

Detectron2 usage

Hi, how can I add sampling hooks to Detectron2 using this library?
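A rough skeleton of how one might start (my assumption only; CORDS does not ship a Detectron2 integration that I know of, and the selection call here is left as a placeholder):

from detectron2.engine import HookBase

class SubsetSelectionHook(HookBase):
    # Periodically re-select a training subset and rebuild the data loader.
    def __init__(self, select_every):
        self.select_every = select_every

    def after_step(self):
        it = self.trainer.iter
        if it > 0 and it % self.select_every == 0:
            # Placeholder: run a CORDS selection strategy here and rebuild
            # self.trainer.data_loader from the selected indices.
            pass

# trainer = DefaultTrainer(cfg)
# trainer.register_hooks([SubsetSelectionHook(select_every=5000)])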

Update Documentation

  1. Update readthedocs with new documentation and release the latest version of CORDS

Implement CRUST Algorithm

  1. Implement the CRUST strategy in the supervised learning setting.
  2. Create the CRUST data loader class, building it on top of the adaptive_dataloader class.

Logistic Regression support for Gradmatch

The Logistic Regression model throws errors when we do backpropagation. The fix for this is perhaps to set freeze=False in the forward function of utils/models/logreg_net.py.
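A rough illustration of the pattern in question (my guess at the intent, not the actual utils/models/logreg_net.py code): when freeze=True the logits are computed under torch.no_grad(), which blocks backpropagation, so defaulting to freeze=False keeps the graph intact.

import torch
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, x, freeze=False):
        if freeze:
            # No graph is built here, so calling loss.backward() through
            # these outputs raises an error.
            with torch.no_grad():
                return self.linear(x)
        return self.linear(x)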

Gradmatch Data subset selection method making training slow

I tried to run some experiments as follows:

  • Ran full CIFAR-10 without any subset selection method to train ResNet-50, which took around 32m 31s.
  • Ran GradMatch CIFAR-10 subset selection with a 0.1 fraction, which took longer than full CIFAR-10, i.e. 22h 48m 40s.
  • Ran GradMatch CIFAR-10 subset selection with a 0.3 fraction, which took longer than the 0.1-fraction GradMatch run.

I am using CIFAR-10 images scaled to 224x224 resolution, with the ResNet-50 architecture defined accordingly.
Can you let me know how to speed up experiments 2 and 3? In general, a subset selection method should make the whole training process faster, right?

For the GRAD-MATCH method, the weights associated with each data point in X (subset of the training set)

  1. For the GRAD-MATCH method, there are weights associated with each data point in X (the subset of the training set). Do the weights have a physical significance? For example, if a weight's value is higher, does the corresponding selected data point contribute more to the residual?
  2. During the iteration, the selected index is already in the selected indices, so the iteration breaks. Why does this happen?
    Thanks @krishnatejakk

Segmentation fault (core dumped)

Hi,

I was trying to deploy CORDS selection in my training, but this error popped up: Segmentation fault (core dumped).

I imitated code from https://github.com/decile-team/cords/blob/main/examples/SL/image_classification/python_notebooks/CORDS_SL_CIFAR10_Custom_Train.ipynb.

So basically I put my training and testing loaders into GLISTERDataLoader, and swapped this part into my code:

for _, (inputs, targets, weights) in enumerate(dataloader):
    inputs = inputs.to(device)
    targets = targets.to(device, non_blocking=True)
    weights = weights.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    losses = criterion_nored(outputs, targets)
    loss = torch.dot(losses, weights/(weights.sum()))
    loss.backward()

Before this modification my code was running fine, so I believe there is an error inside CORDS. My dataset is CIFAR-10.

Thanks

Documentation Improvement

  1. Hyperparameter tuning results and the colab tutorial

  2. Make sure all the tutorial links are working

  3. One document explaining all configurable parameters

  4. New main page in documentation with the current results for CORDS

  5. Replace the models doc in the documentation with a list of all available models

  6. Go over the documentation and make sure everything is okay

Inquiry about performance of gradmatch

Hello, I ran some experiments with GradMatch and Random-Online, and found that the two actually reach similar performance (around 93) after 300 epochs. Is there something important to note for reproducing the results? Thanks for your help!

Can't install on macOS

I'm trying to install CORDS on macOS, but there is a dependency conflict. It looks like there is no torchtext==0.10.1 build for Mac. Is there any tutorial for making it work on macOS?
