wilds's People

Contributors

b-akshay, bearnshaw, etiennedavid, henrikmarklund, keawang, kohpangwei, michiyasunaga, rlphilli, ssagawa, teetone, weihua916

wilds's Issues

Cannot fetch 'ogb-molpcba' dataset due to missing arg

dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')

Results in the following error:

--------------------------------------------------------------------
TypeError                          Traceback (most recent call last)
<ipython-input-2-c369817b9157> in <module>
----> 1 dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')

~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/get_dataset.py in get_dataset(dataset, version, **dataset_kwargs)
     51     elif dataset == 'ogb-molpcba':
     52         from wilds.datasets.ogbmolpcba_dataset import OGBPCBADataset
---> 53         return OGBPCBADataset(version=version, **dataset_kwargs)
     54 
     55     elif dataset == 'poverty':

~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/datasets/ogbmolpcba_dataset.py in __init__(self, version, root_dir, download, split_scheme)
     88             download_url('https://snap.stanford.edu/ogb/data/misc/ogbg_molpcba/scaffold_group.npy', os.path.join(self.ogb_dataset.root, 'raw'))
     89         self._metadata_array = torch.from_numpy(np.load(metadata_file_path)).reshape(-1,1).long()
---> 90         self._collate = PyGCollater(follow_batch=[])
     91 
     92         self._metric = Evaluator('ogbg-molpcba')

TypeError: __init__() missing 1 required positional argument: 'exclude_keys'

Versions:

wilds 1.1.0
torch_geometric 1.7.0
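For anyone hitting this before an upstream fix, a version-tolerant construction of the collater might look like the sketch below. This is untested and assumes the PyG 1.x import path that wilds 1.1.0 uses; the helper name is made up for illustration.

import inspect
from torch_geometric.data.dataloader import Collater as PyGCollater  # PyG 1.x path

def make_collater():
    # Pass exclude_keys only if this torch_geometric version requires it.
    params = inspect.signature(PyGCollater.__init__).parameters
    if 'exclude_keys' in params:
        return PyGCollater(follow_batch=[], exclude_keys=[])
    return PyGCollater(follow_batch=[])

collate_fn = make_collater()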

[Question] Easily accessible pre-trained models

Hi, is there any way to easily access pretrained models for quick evaluation?

For instance, something like the following:

| Algorithm | Model    | Parameters |
|-----------|----------|------------|
| ERM       | ResNet50 | Weights50  |
| ...       | ...      | ...        |

n_groups_per_batch does not work for Poverty

Hi WILDS Team,
I am currently working with the WILDS repository, specifically with the poverty dataset. I've run the script run_expt.py in the examples folder with the argument --n_groups_per_batch=3 (or a different number). However, per batch, I get samples from more than 3 different groups. Am I using this argument incorrectly? I understood --n_groups_per_batch to be the number of different environments from which samples appear in one batch.

The command line reads:
python examples/run_expt.py --dataset poverty --algorithm ERM --root_dir data --n_epochs=200 --seed=0 --log_every=200 --batch_size=64 --n_groups_per_batch=2 --progress_bar True

The output when I use the n_groups variable defined in IRM.py:
n groups: 13
groups: tensor([ 3, 5, 7, 9, 10, 11, 13, 14, 16, 19, 20, 21, 22], device='cuda:0')
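For reference, a minimal way to check how many distinct groups land in a single batch might look like the sketch below; the grouping field ('country') and the loader arguments are assumptions, not necessarily the exact setup used in the paper.

from wilds import get_dataset
from wilds.common.grouper import CombinatorialGrouper
from wilds.common.data_loaders import get_train_loader

# Assumes the poverty data is already downloaded under 'data'.
dataset = get_dataset(dataset='poverty', root_dir='data', download=False)
train_data = dataset.get_subset('train')
grouper = CombinatorialGrouper(dataset, ['country'])

loader = get_train_loader('group', train_data, batch_size=64,
                          grouper=grouper, n_groups_per_batch=2)
x, y, metadata = next(iter(loader))
print(grouper.metadata_to_group(metadata).unique())  # expect at most 2 distinct groups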

In addition, is the flag --uniform_over_groups valid for Poverty, given that the samples are not uniformly distributed over the different environments used in the training split?

Thanks in advance for your help.

Niels

pre-trained SwAV model weights for Camelyon17

Hi,

  1. Are pre-trained SwAV model weights for Camelyon17 publicly shared?
    I refer to this file used in the fine-tuning step: "--pretrained_model_path pretrained/checkpoints/ckp-55.pth"

  2. Also, may I know which commands were used for the SwAV-Camelyon17 results in Table 2. of the paper [1]? I can find three sets of commands (camelyon17_swav55_ermaugment_seed, camelyon17_swav55_ermaugment_val_seed, camelyon17_swav55_ermaugment_train_seed) at this link: https://worksheets.codalab.org/worksheets/0xb148346a5e4f4ce9b7cfc35c6dcedd63.

    I am not sure which ones were used as I get slightly different results when I calculate the results using the logs.

Thanks!

[1] Extending the WILDS benchmark for unsupervised adaptation

Figuring out the log files

I am trying to understand the log output.
After running a training command, for instance python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data, I get a log folder with many files. What is the difference between test_algo.csv and test_eval.csv? I have seen that they are related to two loggers:

datasets[split]['eval_logger'] = BatchLogger(
            os.path.join(config.log_dir, f'{split}_eval.csv'), mode=mode, use_wandb=(config.use_wandb and verbose))
datasets[split]['algo_logger'] = BatchLogger(
            os.path.join(config.log_dir, f'{split}_algo.csv'), mode=mode, use_wandb=(config.use_wandb and verbose))

What is the difference between the algo and eval logs?

What are the random seeds used in the paper

Could you please share the random seeds used for the experiments in the paper? I think using the same set of random seeds for our own experiments would allow a fairer comparison of results.

Downloading FMoW dataset

Hello authors,
Thanks for releasing the code for your paper. I love your work.

The download gets stuck partway through when calling wilds.get_dataset to download the FMoW dataset.
Could you look into this?

Thank you in advance.

How do I access data from only one group?

Hello, Thanks for the fantastic library!

I have two questions:

  1. Is there any way I can get a per-group dataloader in wilds? This would help with, for instance, training a separate model for each group of data (see the sketch after this list).
  2. Can I change the split of data for each dataset? My application requires 50% of the data for each group/domain for testing.
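Regarding the first question, a rough sketch of one way to build a per-group loader on top of the metadata is below. Assumptions: camelyon17 with 'hospital' as the grouping field, and a plain torch Subset/DataLoader rather than the WILDS train loaders.

import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Subset
from wilds import get_dataset
from wilds.common.grouper import CombinatorialGrouper

dataset = get_dataset(dataset='camelyon17', root_dir='data', download=False)
train_data = dataset.get_subset('train', transform=transforms.ToTensor())

# Map every training example to its group id, then keep only one group's indices.
grouper = CombinatorialGrouper(dataset, ['hospital'])
groups = grouper.metadata_to_group(train_data.metadata_array)
target_group = 0
idx = torch.nonzero(groups == target_group, as_tuple=True)[0].tolist()

loader = DataLoader(Subset(train_data, idx), batch_size=32, shuffle=True)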

Thanks!

Question about the creation of WILDS-FMoW subset

Hi,

In your paper, you have mentioned that you have used a subset of FMoW. However, in the rgb_metadata.csv file provided, you analyse the entire fmow dataset and I couldn't find where in the code you are creating the subset (sampling from the rgb_metadata.csv file). I have also looked at the parameter frac which was equal to 1.0 in the config file as well as the worksheet (https://worksheets.codalab.org/rest/bundles/0x20182ee424504e4a916fe88c91afd5a2/contents/blob/log.txt). Therefore, I would greatly appreciate it if you could kindly let me know how you created the subset.

Thank you.

Sara A. Al-Emadi

Waterbirds give 0 worst-group accuracy

Training on the Waterbirds dataset out-of-the-box gives 0 worst-group accuracy. Digging deeper, I noticed that all the predictions immediately collapse to the 0 label. Any advice would be helpful. Thanks in advance.

Unable to retrieve CodaLab experiment outputs

Hello,

I am trying to download the trained models using the link provided (CodaLab).

When clicking on any of the iWildCAM v2.0 (or any other dataset) experiment results in CodaLab, I get a page with the command line to run (to train the model myself) and a loading logo underneath it.

It seems like it is trying to load something, but I have had this page open for hours and it still isn't showing anything. When I click the 'download' button on the left, it leads me to an error page.
Is there a way I can get the results, like the best_model.pth file that the CodaLab page describes?

Thank you!

Data loader for PovertyMap is very slow

Hi -

Ran into a bit of an issue with data loading for the PovertyMap dataset: loading a single minibatch of 128 examples takes about 5-6 seconds. This is not a huge deal, but it is slow enough to make me curious whether there's a faster way of doing this.

Digging into the code a bit, it looks like the slowdown is mostly due to the array copy on line 239 of poverty_dataset.py

img = self.imgs[idx].copy()

FWIW it looks like this is a known issue for memory-mapped numpy arrays on Linux systems (https://stackoverflow.com/questions/42864320/numpy-memmap-performance-issues).

I'm not sure if there are any recommendations for getting around this, or if there's another way the data could be loaded in? Or let me know if I'm totally off-base here. Thanks!
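Not an official recommendation, but one workaround that might be worth trying is to read the memory-mapped array fully into RAM once, so that __getitem__ no longer pays the per-item memmap copy cost. The file name below is an assumption, and this of course requires enough memory for the whole array.

import numpy as np

imgs = np.load('data/poverty_v1.1/landsat_poverty_imgs.npy', mmap_mode='r')
imgs = np.asarray(imgs)  # one sequential read into RAM instead of many random memmap reads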

Label Description

Thanks for sharing the dataset. I could not find the label description in the code; could you point me to it?

Error loading the ogb-molpcba dataset

In ogbmolpcba_dataset.py, line 96 (and similarly line 98), a PyGCollater object is initialized without passing a required positional argument (dataset), which raises an error when calling the get_dataset function for the molecule dataset. I think this can be fixed by replacing the call with
self._collate = PyGCollater(self.ogb_dataset, follow_batch=[], exclude_keys=[])
Or is this how it is supposed to be and I am missing something?

Model loaded from a .pth predicts only zeros

Hello!

I downloaded your trained model for the Camelyon17 dataset from CodaLab (ERM, seed 0). I have installed all packages according to your README and load the model as follows:

path = "/best_model.pth"
state = torch.load(path)['algorithm']

state_dict = {}
 
for key in list(state.keys()):
    state_dict[key.replace('model.', '')] = state[key]

model.load_state_dict(state_dict)

model.eval()

I initialize the dataset I use for testing the model as follows:

import datasets_load  # from wilds package
dataset = datasets_load.Dataset('camelyon17', 32, '/data', 0.75, False)

For the prediction I used the following piece of code:

from wilds.common.data_loaders import get_eval_loader

test_data = dataset.test_set
test_loader = get_eval_loader('standard', test_data, batch_size=32)

with torch.no_grad():
    for x, y_true, metadata in test_loader:
          y_pred = model(x)
          labels = y_true
          _, predicted = torch.max(y_pred, 1)
          # print statements to check the output
          print("Labels: ", labels)
          print("Predicted: ", predicted)
          print("Correct: ", (predicted == labels).sum().item())

So far so good. When I run the code, the labels are printed (which are all 1 at the beginning, because shuffle=False), along with the predictions, which always consist of 0 values.

Labels:  tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Predicted:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Correct:  0

I would appreciate any advice or assistance. Many thanks in advance.
Tim
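For what it's worth, a common cause of constant predictions with a bare state dict is that the network was never constructed with the architecture the checkpoint expects. A sketch of loading the weights into the default Camelyon17 architecture is below; the DenseNet-121 / 2-class assumption should be double-checked against the WILDS config.

import torch
import torchvision

model = torchvision.models.densenet121(num_classes=2)  # assumed architecture of the checkpoint

state = torch.load('/best_model.pth', map_location='cpu')['algorithm']
state_dict = {k.replace('model.', ''): v for k, v in state.items()}
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing:', missing)        # sanity-check that the keys actually matched
print('unexpected:', unexpected)

model.eval()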

Dataset Split Size

I noticed that some of the datasets have been updated, e.g. iwildcam. Where can I find the latest information about the dataset split sizes?

Failure to download the ogb-molpcba dataset caused by the torch_geometric version

I ran python wilds/wilds/download_datasets.py --root_dir data --datasets ogb-molpcba and got the following error message:

Traceback (most recent call last):
  File "wilds/wilds/download_datasets.py", line 34, in <module>
    main()
  File "wilds/wilds/download_datasets.py", line 27, in main
    wilds.get_dataset(
  File "..../wilds/get_dataset.py", line 52, in get_dataset
    from wilds.datasets.ogbmolpcba_dataset import OGBPCBADataset
  File "..../wilds/datasets/ogbmolpcba_dataset.py", line 7, in <module>
    from torch_geometric.data.dataloader import Collater as PyGCollater
ModuleNotFoundError: No module named 'torch_geometric.data.dataloader'

I found that this is caused by torch_geometric renaming the module and moving the class.
I fixed it by changing `from torch_geometric.data.dataloader import Collater as PyGCollater` to `from torch_geometric.loader.dataloader import Collater as PyGCollater`, and then the data downloaded successfully.

I guess you could check the dependency and fix this. My torch_geometric version is 2.0.2.
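A version-tolerant import along these lines might avoid pinning to a single module path (a sketch, untested across versions):

try:
    # torch_geometric >= 2.0 moved the Collater here
    from torch_geometric.loader.dataloader import Collater as PyGCollater
except ModuleNotFoundError:
    # older torch_geometric versions
    from torch_geometric.data.dataloader import Collater as PyGCollater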

By the way, this benchmark is very useful. It would be nice to have a TensorFlow version; I am looking forward to it.

Could you provide the trained weights?

Hello,

I am training BERT+ERM on the Amazon dataset, but it is very time-consuming. Would it be possible to provide the best trained parameters to users? (Just as BERT provides pretrained weights, maybe you could add another folder under examples containing all the weights for users.) It would save users about a week (and the compute). Thank you!

Map for adding cross validation training and evaluation

Hello and thank you for this amazing package.

Instead of using replicates, I would be interested in adding a cross validation training and evaluation scheme based on the domain metadata.

Say a dataset has domains A, B, and C. I would like to:

  • train on 70% of the data sampled from A and B, and evaluate in-distribution on the remaining 30% from A and B and out-of-distribution on C.
  • train on 70% of the data sampled from B and C, and evaluate in-distribution on the remaining 30% from B and C and out-of-distribution on A.
  • train on 70% of the data sampled from C and A, and evaluate in-distribution on the remaining 30% from C and A and out-of-distribution on B.

Finally, average the in-distribution and out-of-distribution metrics to obtain the final performance.

Here the 70-30 split is arbitrary and should be modifiable.

I am just starting to explore the package, having only replicated the ERM result on the camelyon17 dataset.

It seems that the grouper object might be a good starting point for implementing the above procedure, but I am still lacking a high-level overview of the code. How would you do this?
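Not an official answer, but a rough sketch of how such a leave-one-domain-out rotation could be built on top of the metadata array is below. Assumptions: a single domain field such as camelyon17's 'hospital', and index tensors that you then wrap in your own subsets and loaders.

import torch
from wilds import get_dataset
from wilds.common.grouper import CombinatorialGrouper

dataset = get_dataset(dataset='camelyon17', root_dir='data', download=False)
grouper = CombinatorialGrouper(dataset, ['hospital'])
groups = grouper.metadata_to_group(dataset.metadata_array)

def make_fold(held_out_group, train_frac=0.7, seed=0):
    """Return (train_idx, id_eval_idx, ood_eval_idx) for one fold."""
    g = torch.Generator().manual_seed(seed)
    in_dist = torch.nonzero(groups != held_out_group, as_tuple=True)[0]
    ood_idx = torch.nonzero(groups == held_out_group, as_tuple=True)[0]
    perm = in_dist[torch.randperm(len(in_dist), generator=g)]
    n_train = int(train_frac * len(perm))
    return perm[:n_train], perm[n_train:], ood_idx

for held_out in groups.unique().tolist():
    train_idx, id_idx, ood_idx = make_fold(held_out)
    # wrap each index tensor with torch.utils.data.Subset(dataset, idx.tolist()),
    # train and evaluate, then average the ID and OOD metrics over the folds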

camelyon17 split scheme: in-dist

I am not able to run Camelyon17 with --split_scheme in-dist (I'm assuming this corresponds to the setting with ID val data).

Any pointers on how to run this, or in general how to run camelyon with the ID val data?

Thank you for the help!

Unable to Train ERM model with civilcomments

Hi,

I am having trouble running the code with the command
python3 wilds/examples/run_expt.py --dataset civilcomments --algorithm ERM --root_dir data --download
Everything is stuck: no error is reported, and neither the GPU nor the CPU is being used.

If I press Ctrl+C, it shows the traceback in the attached screenshot.

The same thing didn't happen when I tried to run the same script but with groupDRO.

Any pointers would be very helpful. Thank you for your amazing, well-developed code!

Understanding the prediction_dir format for leaderboard submission

I wonder if the log folder used during training is the prediction_dir described in Get Started: Evaluating trained models.

I tried to reproduce the ERM result on a subset of camelyon with the following command:

python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data --frac 0.1 --log_dir log_erm_01

Training goes well.

But my file camelyon17_split:id_val_seed:0_epoch is empty.

Then I ran the following command:
python examples/evaluate.py log_erm_01 erm_01_output --root-dir data --dataset camelyon17

And I got this:

Traceback (most recent call last):
  File "examples/evaluate.py", line 282, in <module>
    main()
  File "examples/evaluate.py", line 244, in main
    evaluate_benchmark(
  File "examples/evaluate.py", line 136, in evaluate_benchmark
    predictions_file = get_prediction_file(
  File "examples/evaluate.py", line 89, in get_prediction_file
    raise FileNotFoundError(
FileNotFoundError: Could not find CSV or pth prediction file that starts with camelyon17_split:id_val_seed:0.

So my question is whether the log folder is the prediction_dir described in Get Started?

Support for faster model training

May I check if there's any plan to support: 1) multi-gpu parallel training; 2) fp16; 3) gradient accumulation. These functions would allow us to train models much faster and with larger batch size (especially for large models like BERT).

`assert` error in new wilds version with FMoW

Hello, I am using the new version of WILDS and getting the error:

... wilds/common/utils.py", line 86, in avg_over_groups
    assert v.numel()==g.numel()

Any ideas? It may be a bug on my end; if I track it down, I'll update here.

Calculation of OOD within the paper

Hello everyone,

First of all, I would like to thank you for making the paper and code for "WILDS: A Benchmark of in-the-Wild Distribution Shifts" publicly available. What caught my interest when reading the paper was the in-distribution (ID) and out-of-distribution (OOD) performance evaluated using empirical risk minimization (Table 1, page 20). My question is how the ID and OOD numbers were calculated. Did you use the softmax with temperature scaling from the paper "Enhancing the reliability of out-of-distribution image detection in neural networks"? If not, can you give a reference for the way you tackled this problem?

Thank you in advance for your kind reply.

Error with example fMOW command: incorrect value of "unlabeled_n_groups_per_batch"

Hello,
If I directly run this command suggested in the README:
python examples/run_expt.py --dataset fmow --algorithm DANN --unlabeled_split test_unlabeled --root_dir data

I get the following exception:

Traceback (most recent call last):
  File "/mnt/beegfs/bulk/mirror/jyf6/datasets/wilds/examples/run_expt.py", line 491, in <module>
    main()
  File "/mnt/beegfs/bulk/mirror/jyf6/datasets/wilds/examples/run_expt.py", line 454, in main
    train(
  File "/mnt/beegfs/bulk/mirror/jyf6/datasets/wilds/examples/train.py", line 114, in train
    run_epoch(algorithm, datasets['train'], general_logger, epoch, config, train=True, unlabeled_dataset=unlabeled_dataset)
  File "/mnt/beegfs/bulk/mirror/jyf6/datasets/wilds/examples/train.py", line 38, in run_epoch
    unlabeled_data_iterator = InfiniteDataIterator(unlabeled_dataset['loader'])
  File "/mnt/beegfs/bulk/mirror/jyf6/datasets/wilds/examples/utils.py", line 393, in __init__
    self.iter = iter(self.data_loader)
  File "/home/fs01/jyf6/miniconda3/envs/ponds/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "/home/fs01/jyf6/miniconda3/envs/ponds/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/fs01/jyf6/miniconda3/envs/ponds/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1085, in __init__
    self._reset(loader, first_iter=True)
  File "/home/fs01/jyf6/miniconda3/envs/ponds/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1118, in _reset
    self._try_put_index()
  File "/home/fs01/jyf6/miniconda3/envs/ponds/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1352, in _try_put_index
    index = self._next_index()
  File "/home/fs01/jyf6/miniconda3/envs/ponds/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 624, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/mnt/beegfs/bulk/mirror/jyf6/datasets/wilds/wilds/common/data_loaders.py", line 131, in __iter__
    groups_for_batch = np.random.choice(
  File "mtrand.pyx", line 984, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

I think this occurs because there are only 2 unique years in the test_unlabeled split, but unlabeled_n_groups_per_batch is set to 8, so it tries to sample 8 years without replacement.

I was able to fix this by changing the argument unlabeled_n_groups_per_batch to 2, here: https://github.com/p-lambda/wilds/blob/main/examples/configs/datasets.py#L220
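Alternatively, if run_expt.py exposes that config key as a command-line flag (I have not verified this), overriding it directly might avoid editing the source, e.g.:

python examples/run_expt.py --dataset fmow --algorithm DANN --unlabeled_split test_unlabeled --root_dir data --unlabeled_n_groups_per_batch 2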

It would be great if this can be fixed. Thank you so much for releasing these wonderful datasets and baseline algorithms!

Oracle results for UDA tasks

Hi,

Could you please share the Oracle results, i.e. training on the labeled target domain, for the Camelyon17 and iWildCam datasets in "Extending the WILDS benchmark for unsupervised adaptation"? Oracle results for the other datasets would be appreciated too.

If Oracle results are not available, could you please share the commands that can be used to obtain them?

Thanks.

Issue in OOD data distribution when Grouper is set to "regions" for FMoW

Hi,

I am trying to change the groupby from "year" to "region". I have followed the instructions in the README and am currently using the following command:
python3 wilds/examples/run_expt.py --dataset fmow --algorithm ERM --groupby_fields region --root_dir wilds_fmow/

However, the issue is that the training dataset is not being separated into distinct regions in an ID vs. OOD manner; that is, all regions are included in ID as well as OOD. Here is a screenshot of the output:
[screenshot of the split summary]

Therefore, I was wondering whether this is a bug in the code or whether I am missing something.

Thanks
Sara A. Al-Emadi

Obtaining (full) model predictions for trained models

Hello and thank you for all your work on this important project!

I am wondering if it's possible to share the full predictions of the trained models on the val/test data.

If I understand correctly, this is similar to the output files
{dataset}_split:{split}_seed:{seed}_epoch:{epoch}.csv (e.g. this file for Rxrx1), but where the information in the csv is not only the argmax class prediction, but the entire logits vectors (e.g. in this case a vector in R^1139).

I think this would be useful as it will allow people (myself included) to evaluate trained models using a variety of custom metrics, but without actually downloading the data and doing the evaluation (which could be prohibitive for some of the larger datasets).

Thanks!
Gal

fmow and Pandas 2.0.0 datetime conversion

I'm getting an error when initializing the "fmow" dataset; the conversion of the timestamp column to datetime with Pandas fails with:

ValueError: time data "2011-02-07T02:48:56.643Z" doesn't match format "%Y-%m-%dT%H:%M:%S%z", at position 92. You might want to try:
- passing format if your strings have a consistent format;
- passing format='ISO8601' if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing format='mixed', and the format will be inferred for each element individually. You might want to use dayfirst alongside this.

I noticed I was using Pandas 2.0.0 (presumably the most recent version) and when I reverted to Pandas 1.5.3, the issue seemed to go away. I'm guessing the datetime formatting was changed in version 2 and it might be good to update WILDS to still work with the new version. Thanks!
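For anyone hitting this before a fix lands, the error message itself points at a possible patch: pass an explicit format to pd.to_datetime where fmow_dataset.py parses the timestamps. The column name below is an assumption, and the snippet is untested.

import pandas as pd

metadata = pd.read_csv('data/fmow_v1.1/rgb_metadata.csv')
# Pandas 2.x is stricter about mixed ISO 8601 strings; 'ISO8601' (or 'mixed') relaxes this.
metadata['timestamp'] = pd.to_datetime(metadata['timestamp'], format='ISO8601')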

run_expt.py: --device argument doesn't set the device

Hey, I'm running Wilds on a p2.8xlarge AWS EC2 instance with 8 K80 GPUs. I noticed that when I try to run run_expt.py and use the --device argument to divide the jobs I'm trying to run between the GPUs, they all end up running on GPU 0. I verified this by the memory usage in nvidia-smi as well as printing the device used by torch using torch.cuda.current_device(). My guess is that the CUDA_VISIBLE_DEVICES environment variable, set here, is set too late and PyTorch just defaults to device 0.

I've worked around this by setting the CUDA_VISIBLE_DEVICES variable manually, before running the script. I just thought I'd let you know I encountered this issue.
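For example (device index chosen arbitrarily):

CUDA_VISIBLE_DEVICES=3 python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data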

Really appreciate the project by the way! Being able to access multiple datasets for domain generalization with the same interface is really useful, and I managed to use run_expt pretty easily to run my own experiments.

Installing via pip seems to miss `torch_scatter` dependency

Hey,

I noticed that the installation via pip install wilds seems to miss the torch_scatter dependency that is also listed in the README. When e.g. trying to do from wilds.datasets.amazon_dataset import AmazonDataset I got

from wilds.datasets.amazon_dataset import AmazonDataset
  File "/Users/deul/Desktop/wilds/wilds/datasets/amazon_dataset.py", line 6, in <module>
    from wilds.common.utils import map_to_id_array
  File "/Users/deul/Desktop/wilds/wilds/common/utils.py", line 1, in <module>
    import torch, torch_scatter
ModuleNotFoundError: No module named 'torch_scatter'

As far as I can see, the solution should be as easy as adding torch_scatter>=2.0.5 to the install_requires attribute in setup.py. In my case, the error was resolved after installing torch_scatter separately.

ModuleNotFoundError: No module named 'transformers'

Hello, in several of your files in examples (e.g. optimizer.py and transforms.py), you import functions from transformers, but the module is not provided in the current version. Could you please upload the module file? Thanks!

releasing smaller subsets of the datasets

Hi, thanks for releasing the benchmark and the datasets. I was wondering if it would be possible to release smaller subsets of the datasets (e.g. similar in size to CIFAR, MNIST, etc.) in order to allow for rapid prototyping? As it stands, it currently takes more than 2 days just to download one dataset, which can also occupy the majority of disk space. This alone could deter people from trying out and exploring the datasets.

Also, it would be nice if you could list, on the info page here on GitHub, the download size of each dataset and its actual size on disk.

Thanks!

algorithm.eval() vs. algorithm.model.eval()

Hi,

Really nice job with this repo! I had a small comment on the use of algorithm.eval() vs. algorithm.model.eval() in the wilds/examples/train.py file that might be useful to others.

I wasn't able to find this in the code, but how does algorithm.eval() differ from algorithm.model.eval()?

I ask because algorithm.model.eval() preserves the grad_fn attribute on the model output, while algorithm.eval() does not. This was unexpected behavior since pytorch's .eval() function doesn't do this. This is important for my use case, since I'm trying to evaluate the gradients when the model is in eval mode. If this does not break behavior elsewhere, I'd suggest switching to algorithm.model.eval().

Happy to explain more if this was confusing!

Poverty Map: Unable to map the image_id to its corresponding wealth_pooled and country domain from the dhs_meta.csv file.

Hi, in the train_mixup(train_loader, epoch, agg) function, for the i-th sample of a batch I got image id = 5863 and domain = 13 with wealth_pooled = -0.8209. Upon looking into the dhs_meta.csv file, I found a matching wealth_pooled value and country domain, but the corresponding image_id is not 5863. Could you please help me map the image_id to its corresponding country and wealth_pooled value? Thanks.

The Waterbirds dataset's link is invalid

Hi,

The Waterbirds dataset with UUID '0x505056d5cdea4e4eaa0e242cbfe2daa4' on CodaLab is currently invalid, returning Error 404 (it cannot be downloaded manually from the link or the page). It would be greatly appreciated if you could kindly fix it.

Thank you very much!

Replicating CivilComments results with standard deviation using the Group DRO (label) algorithm

Dear Team,

I am trying to replicate the leaderboard results for the CivilComments dataset using Group DRO with the label grouping (i.e. group by = 'Y').
The test average accuracy is listed as 90.2 (0.3) and the validation average accuracy as 90.4 (0.4). I do not understand how these values were obtained: for each seed, is the maximum average accuracy over the 5 epochs used, or only the average accuracy from the last (5th) epoch?

I tried taking the average over all 5 seeds using the 5th-epoch average accuracy, but I did not get 90.2 for the test average accuracy. (I used the average value from the test_eval.csv of each of the 5 seeds published in the notebook.)

Could you please help me with how to replicate the results including the standard deviation?
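In case it helps, a minimal sketch of the mean/std computation over seeds is below; the log-directory layout, the 'acc_avg' column name, and the choice of which epoch/row to read are all assumptions (the official protocol may select the epoch differently, e.g. by validation performance).

import pandas as pd

accs = []
for seed in range(5):
    # assumed layout: one log directory per seed, each containing a test_eval.csv
    df = pd.read_csv(f'logs/civilcomments_seed{seed}/test_eval.csv')
    accs.append(df['acc_avg'].iloc[-1])  # placeholder choice: last logged row

s = pd.Series(accs)
print(f'{100 * s.mean():.1f} ({100 * s.std():.1f})')  # prints something like 90.2 (0.3)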

"Corrupt" image in dataset iWildCam version 2.0

Hi and thanks for sharing the code.

I found an issue when training on dataset iWildCam version 2.0. The problem doesn't exist in iWildCam v1.0.

I think that there is at least 1 corrupt image in iWildCam v2.0. When I train on this dataset, the training breaks because it cannot open the image.

The corrupt image is: /iwildcam_v2.0/train/8ad9843e-21bc-11ea-a13a-137349068a90.jpg
I also tried to open this image with Image Viewer on Ubuntu and it didn't work.

Other people on the internet encountered the problem as well:
https://www.kaggle.com/c/iwildcam-2020-fgvc7/discussion/134923

I'm not exactly sure what the best solution is here, but maybe temporarily making "v1.0" the default again would reduce the number of people who stumble over this.

Thanks,
George
