quark0 / darts

Differentiable architecture search for convolutional and recurrent networks

Home Page: https://arxiv.org/abs/1806.09055

License: Apache License 2.0

Python 100.00%

deep-learning automl image-classification language-modeling pytorch convolutional-networks recurrent-networks neural-architecture-search

darts's Issues

loading data error

Hi,

When I run the code on CIFAR10 with CUDA_VISIBLE_DEVICES=0 python train_search.py --unrolled,
I get a strange error:
07/31 05:43:39 PM train 050 1.986851e+00 26.960785 81.188728
07/31 05:52:04 PM train 100 1.871319e+00 29.965965 84.189354
07/31 06:00:27 PM train 150 1.797943e+00 32.926323 85.844368
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fc27522edd8>>
Traceback (most recent call last):
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
    self._shutdown_workers()
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
    self.worker_result_queue.get()
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/data1/yinzheng/anaconda2/envs/py36/lib/python3.6/multiprocessing/reduction.py", line 153, in recvfds
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(bytes_size))
ConnectionResetError: [Errno 104] Connection reset by peer
Can you tell me what causes this error?
Thanks very much

Valid_Queue Size = 50% of Train by default

Is there any intuition for why the training data is split in half, as opposed to the more common practice of holding out 15-30 percent for validation? Does the validation sample have to come from the training set, or can it come from a different distribution, as in a standard train/validation/test split?
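
For reference, a 50/50 split of this kind amounts to two disjoint index sets over the training data. The sketch below is stdlib-only and purely illustrative: `split_indices` is a hypothetical helper (not a function from the repository), 50000 is the CIFAR-10 training set size, and the shuffling step is an assumption of this sketch.

```python
import random

def split_indices(num_train, train_portion=0.5, seed=0):
    """Return two disjoint index lists: one for the weights, one for the architecture."""
    indices = list(range(num_train))
    random.Random(seed).shuffle(indices)      # assumption: shuffle before splitting
    split = int(train_portion * num_train)    # boundary between the two halves
    return indices[:split], indices[split:]

weight_idx, arch_idx = split_indices(50000, train_portion=0.5)
print(len(weight_idx), len(arch_idx))
```

Changing `train_portion` to, say, 0.8 would give the more conventional 80/20 split with the same code path.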

Feel free to close after answering.
Regards.

How to derive the architecture? Running the train_search on cifar10, but the architecture is different

Hi
I have 2 questions about how to derive the final architecture:

1. I used the default search script on CIFAR-10, python train_search.py --unrolled --seed 0, to search for the architecture. The resulting architecture is different from the one provided in the paper (and in genotypes.py of the repo). If I change the seed, the architecture also changes. The paper mentions that the results are obtained from 4 runs. So my questions are: do the 4 runs use the same architecture or 4 different architectures, and how was the architecture illustrated in the paper selected?
2. In my own search run, I found that the probability of the zero op is the highest. However, the paper mentions that the zero op is not used in the final architecture (Sec. 2.4), which is also confirmed in the code. My question is: if the zero op is not used, why add it to the search space? It is really weird, since if we did not exclude the zero op, all the ops would be zero ;-(. Do the authors see the same problem? For example, the alphas for the normal cell are

[[0.1838, 0.0982, 0.081 , 0.1736, 0.1812, 0.0846, 0.091 , 0.1066],
[0.4717, 0.0458, 0.0496, 0.0945, 0.1113, 0.0556, 0.0953, 0.0762],
[0.2946, 0.1425, 0.0855, 0.1768, 0.0837, 0.0735, 0.0731, 0.0704],
[0.3991, 0.0631, 0.0581, 0.1053, 0.1307, 0.0577, 0.1043, 0.0817],
[0.6298, 0.0382, 0.035 , 0.0658, 0.0435, 0.0551, 0.0605, 0.0721],
[0.3526, 0.0974, 0.0693, 0.1346, 0.1245, 0.0697, 0.091 , 0.061 ],
[0.4829, 0.06 , 0.0612, 0.115 , 0.0969, 0.065 , 0.0624, 0.0565],
[0.6591, 0.0303, 0.0282, 0.0558, 0.0578, 0.054 , 0.0581, 0.0568],
[0.7612, 0.0199, 0.0207, 0.0294, 0.0343, 0.0442, 0.0431, 0.0472],
[0.3519, 0.1231, 0.0692, 0.1381, 0.0925, 0.076 , 0.0748, 0.0744],
[0.4767, 0.0781, 0.0679, 0.1216, 0.0679, 0.0701, 0.0548, 0.0629],
[0.6769, 0.032 , 0.0292, 0.0547, 0.0533, 0.0427, 0.0614, 0.0498],
[0.7918, 0.0191, 0.0199, 0.0279, 0.0423, 0.0223, 0.0392, 0.0375],
[0.8325, 0.0153, 0.0158, 0.0199, 0.0284, 0.0255, 0.0313, 0.0313]]

Each row is the probability of ['none', 'max_pool_3x3','avg_pool_3x3','skip_connect','sep_conv_3x3','sep_conv_5x5','dil_conv_3x3','dil_conv_5x5'] for each edge.
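
To make the "exclude the zero op" step concrete, here is a minimal sketch that picks the strongest operation on an edge while ignoring 'none'. It uses the first two rows of the table above; `best_non_zero_op` is a hypothetical helper written for this illustration, not the repository's derivation code.

```python
PRIMITIVES = ['none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect',
              'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5']

def best_non_zero_op(row):
    """Return (op_name, probability) of the strongest op on an edge, skipping 'none'."""
    candidates = [(name, p) for name, p in zip(PRIMITIVES, row) if name != 'none']
    return max(candidates, key=lambda item: item[1])

edge_0 = [0.1838, 0.0982, 0.081, 0.1736, 0.1812, 0.0846, 0.091, 0.1066]
edge_1 = [0.4717, 0.0458, 0.0496, 0.0945, 0.1113, 0.0556, 0.0953, 0.0762]

print(best_non_zero_op(edge_0))
print(best_non_zero_op(edge_1))
```

Even though 'none' has the largest probability on both edges, the selected op is the best of the remaining seven.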

Models trained using code in table 3

In table 1, you have results for running NASNet/AmoebaNet models using your code (marked with +). Did you do similar experiments for Table 3 (Imagenet)?

If not, is running train_imagenet.py with --arch=AmoebaNet enough to get me ~26/27% error as reported?

Thanks :)

about _compute_unrolled_model

Hello,

Thanks for sharing. Could I ask a question about _compute_unrolled_model? What does "unroll" mean here? Why did you write this function? It seems to be a backward-like function.
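
For context, the DARTS paper approximates the trained weights by a single virtual gradient step, w' = w - xi * grad_w L_train(w, alpha), and evaluates the validation loss at w'. The toy sketch below shows only that one step on a scalar problem; the loss function and the names `train_loss_grad_w` and `unrolled_weights` are assumptions made for illustration and are not the repository's implementation.

```python
def train_loss_grad_w(w, alpha):
    """Gradient in w of the toy loss L_train(w, alpha) = (w - alpha)**2."""
    return 2.0 * (w - alpha)

def unrolled_weights(w, alpha, xi):
    """One virtual gradient step: w' = w - xi * dL_train/dw."""
    return w - xi * train_loss_grad_w(w, alpha)

w_prime = unrolled_weights(w=1.0, alpha=0.0, xi=0.1)
print(w_prime)
```

Setting `xi` to 0 returns `w` unchanged, which corresponds to the first-order variant described in the paper.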

thanks.

after 2 hours, training stopped with CUDA error: out of memory

Hi, I tried to train on ImageNet with
python train_imagenet.py --data $train_data_set --epochs 50000 --auxiliary
After 2 hours of training I got a CUDA error:
  File "train_imagenet.py", line 152, in main
    valid_acc_top1, valid_acc_top5, valid_obj = infer(valid_queue, model, criterion)
  File "train_imagenet.py", line 214, in infer
    logits, _ = model(input)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ziyan.wyq/darts/cnn/model.py", line 207, in forward
    s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ziyan.wyq/darts/cnn/model.py", line 52, in forward
    h2 = op2(h2)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ziyan.wyq/darts/cnn/operations.py", line 66, in forward
    return self.op(x)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/activation.py", line 46, in forward
    return F.threshold(input, self.threshold, self.value, self.inplace)
  File "/usr/lib64/python2.7/site-packages/torch/nn/functional.py", line 625, in threshold
    return torch._C._nn.threshold(input, threshold, value)
RuntimeError: CUDA error: out of memory

Differences between `model.py` and `model_search.py`

I noticed a couple of discrepancies between cnn/model.py and cnn/model_search.py:

https://github.com/quark0/darts/blob/master/cnn/model_search.py#L18 - extra batchnorm
https://github.com/quark0/darts/blob/master/cnn/model_search.py#L32 - affine=False in stem
no auxiliary head in model_search
no drop_path in model_search

Are there particular reasons for these choices (particularly the first two)? Or are they bugs/arbitrary decisions?

~ Ben

EDIT: Same question applies to different choices in hyperparameters in the search vs. final training (batch size, init_channels, etc)

drop_path_prob = 0.2 or 0.3?

Hi,

The paper says "path dropout of probability 0.3", but in cnn/train.py the default value of drop_path_prob is 0.2. May I ask whether the experiments in the paper use 0.2 or 0.3? And does it affect the results much?

Best Regards,

When PyTorch == 0.3.1, torchvision cannot be >= 0.2.1

$ conda create -n darts -c pytorch python=3.5.5 pytorch=0.3.1 torchvision=0.2.1
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - pytorch=0.3.1
  - torchvision=0.2.1 -> pytorch[version='>=0.4']
Use "conda info <package>" to see the dependencies for each package.

Question about architecture derivation

First, thank you for sharing this great work!

I am unclear about how the top-2 strongest edges are chosen for each node in the CNN case.
From the paper and the code, my understanding is that the two selected edges are those whose highest non-zero operation probability is larger than that of the other edges.

But in your comment (#32 (comment)), the selection seems to depend on the "sum of non-zero operations' probabilities" rather than "the highest among non-zero operations".

For example, if we must choose one edge from the following two, which should be chosen?
(To me, it seems the paper and code select (1), while the GitHub comment says (2).)

(1) {zero: 0.4, conv: 0.5, pool: 0.1}
(2) {zero: 0.3, conv: 0.35, pool: 0.35}

Thanks.
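
The two readings can be compared directly on the example edges. This is a small illustrative sketch, assuming only the probabilities given in the question; `max_non_zero` and `sum_non_zero` are hypothetical names for the two candidate criteria.

```python
edges = {
    1: {'zero': 0.4, 'conv': 0.5, 'pool': 0.1},
    2: {'zero': 0.3, 'conv': 0.35, 'pool': 0.35},
}

def max_non_zero(probs):
    """Edge strength = largest probability among the non-zero ops."""
    return max(p for op, p in probs.items() if op != 'zero')

def sum_non_zero(probs):
    """Edge strength = total probability mass of the non-zero ops."""
    return sum(p for op, p in probs.items() if op != 'zero')

by_max = max(edges, key=lambda e: max_non_zero(edges[e]))
by_sum = max(edges, key=lambda e: sum_non_zero(edges[e]))
print(by_max, by_sum)
```

Under the max criterion edge (1) wins (0.5 vs 0.35); under the sum criterion edge (2) wins (0.7 vs 0.6), so the two rules do disagree on this example.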

trained models

just curious if you might consider releasing weights for the trained imagenet model.

pytorch 0.4 compatibility

TODO

sharing batchnorm modules across nodes inside a rnn cell?

I noticed that the batchnorm layer in DARTCellSearch is initialized once:

darts/rnn/model_search.py, line 14 in d922492
self.bn = nn.BatchNorm1d(nhid, affine=False)

but used multiple times:

darts/rnn/model_search.py, line 18 in d922492
s0 = self.bn(s0)

darts/rnn/model_search.py, line 39 in d922492
s = self.bn(s)

So what is the point of sharing it? Would there be any difference if I did not share it?

Potential discrepancy between code and equation

Thanks for proposing this amazing work!
In cnn/architect.py:60, you write
implicit_grads = self._hessian_vector_product(model_unrolled, vector, input_train, target_train)
Here the parameters of model_unrolled, if I understand correctly, correspond to w' in the paper, and this line calculates equation 7 in the paper. In the function _hessian_vector_product, you seem to add vector directly to the parameters of model_unrolled, which in terms of the equation means w+ = w' + updates. But in equation 7, w+ is the update applied to the original w: w+ = w + updates.
May I ask whether this is the intended behaviour, or did I misunderstand?
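
To see why the base point matters, here is a toy finite-difference sketch of the mixed second-derivative product that equation 7 approximates: (grad_alpha L(w + eps*v) - grad_alpha L(w - eps*v)) / (2*eps). The scalar loss L_train(w, alpha) = alpha * w**2 and the names `grad_alpha` and `mixed_hvp` are assumptions chosen so the exact answer (2*w*v) is known; this is not the repository's `_hessian_vector_product`.

```python
def grad_alpha(w, alpha):
    """dL/dalpha for the toy loss L_train(w, alpha) = alpha * w**2."""
    return w ** 2

def mixed_hvp(w, alpha, v, eps=1e-3):
    """Central finite difference of grad_alpha along direction v, taken around w."""
    plus = grad_alpha(w + eps * v, alpha)
    minus = grad_alpha(w - eps * v, alpha)
    return (plus - minus) / (2 * eps)

around_w = mixed_hvp(w=3.0, alpha=1.0, v=0.5)        # perturb the original weights
around_w_prime = mixed_hvp(w=2.5, alpha=1.0, v=0.5)  # perturb different (e.g. unrolled) weights
print(around_w, around_w_prime)
```

The two calls return different values (about 3.0 and 2.5), so perturbing around w' instead of w changes the quantity being estimated unless the two points coincide.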

BatchNorm after pooling layer?

Hi,

During training, you add one batchnorm layer after the pooling layer, as in https://github.com/quark0/darts/blob/master/cnn/model_search.py#L18. However, when testing the searched architecture, there is no batchnorm after the pooling layer. Would you mind letting me know the reason?

Best Regards,

input_search and target_search are unchanged during iteration

darts/cnn/train_search.py

Line 147 in b6d4fe1

input_search, target_search = next(iter(search_queue))

Hi, thank you for releasing the code! Nice work!
But I found something I can't understand. In the function train in train_search.py, input_search and target_search are unchanged during iteration. They are always the first batch in search_queue.

Is it a small bug, or did you do it on purpose?
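
Whether `next(iter(queue))` really returns the same batch every time depends on how the queue is iterated. The stdlib-only sketch below contrasts three cases: a fixed-order iterable, a loader that reshuffles on each `iter()` call (modeled by the hypothetical `ReshufflingQueue` class, an assumption of this sketch), and a persistent iterator that advances.

```python
import random

class ReshufflingQueue:
    """Hypothetical stand-in for a loader whose sampler reshuffles on every iter()."""
    def __init__(self, batches, seed=0):
        self.batches = list(batches)
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.batches[:]
        self.rng.shuffle(order)   # a new order for each fresh iterator
        return iter(order)

fixed_order = ['b0', 'b1', 'b2', 'b3']

# Fixed order: a fresh iterator always starts at the first batch.
first_each_time = [next(iter(fixed_order)) for _ in range(5)]

# Reshuffling queue: a fresh iterator starts at a randomly chosen batch.
sampled_each_time = [next(iter(ReshufflingQueue(fixed_order, seed=s))) for s in range(5)]

# Persistent iterator: consecutive calls walk through the batches.
it = iter(fixed_order)
advancing = [next(it) for _ in range(3)]

print(first_each_time, sampled_each_time, advancing)
```

So with a sequential sampler the pattern would repeat the first batch, while with a random sampler it behaves like drawing one random batch per step.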

Training results of `train_search.py`?

What's the recommended way to train the results of train_search.py? This is the end of my log.txt:

...
2018-06-27 13:25:46,378 epoch 49 lr 1.023679e-03
2018-06-27 13:25:46,379 genotype = Genotype(normal=[('skip_connect', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 2), ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 0), ('skip_connect', 1)], normal_concat=range(2, 6), reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 2), ('max_pool_3x3', 0), ('dil_conv_5x5', 3), ('skip_connect', 2), ('skip_connect', 3)], reduce_concat=range(2, 6))
...
2018-06-27 13:54:15,198 train_acc 99.704000
...
2018-06-27 13:54:45,268 valid_acc 88.760000

Should I just copy and paste that Genotype into genotypes.py w/ a new name? Or is there some recommended way?

Thanks

Poor PTB test performance?

Hi Hanxiao,

I trained this model https://github.com/quark0/darts/blob/master/rnn/genotypes.py#L33 on PTB. I obtained similar validation performance (val ppl = 59.0), but the test ppl (61.3) is much higher than the reported result (55.7). Do you have any suggestions?

Thanks

Less search cost

Hi, I ran the first-order CNN search on a single GPU (Titan X). It took only 11 hours, much less than the 1.5 GPU days reported in the paper. Amazing! The log is shown below.
By the way, the resulting architectures differ between runs. How did you pick the architecture reported in the paper?

Thank you in advance.
Yukang

07/08 05:51:51 AM train 250 1.791708e-02 99.670070 100.000000
07/08 05:53:27 AM train 300 1.792845e-02 99.667774 100.000000
07/08 05:55:03 AM train 350 1.800602e-02 99.675036 100.000000
07/08 05:56:20 AM train_acc 99.684000
07/08 05:56:20 AM valid 000 4.737676e-01 90.625000 98.437500
07/08 05:56:31 AM valid 050 4.070231e-01 89.430147 99.264706
07/08 05:56:41 AM valid 100 4.179213e-01 89.279084 99.443069
07/08 05:56:51 AM valid 150 4.126024e-01 89.093543 99.503311
07/08 05:56:53 AM valid_acc 89.110000
07/08 05:56:53 AM epoch 49 lr 1.023679e-03
07/08 05:56:53 AM genotype = Genotype(normal=[('skip_connect', 0), ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('skip_connect', 0), ('skip_connect', 1)], normal_concat=[2, 3, 4, 5], reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 3), ('max_pool_3x3', 1), ('skip_connect', 2), ('max_pool_3x3', 0)], reduce_concat=[2, 3, 4, 5])
07/08 05:56:55 AM train 000 7.202163e-03 100.000000 100.000000
07/08 05:58:30 AM train 050 1.960013e-02 99.632353 100.000000
07/08 06:00:05 AM train 100 1.947881e-02 99.659653 100.000000
07/08 06:01:40 AM train 150 1.897141e-02 99.679222 100.000000
07/08 06:03:15 AM train 200 1.902238e-02 99.681281 100.000000
07/08 06:04:51 AM train 250 1.879712e-02 99.688745 100.000000
07/08 06:06:26 AM train 300 1.801273e-02 99.719684 100.000000
07/08 06:08:00 AM train 350 1.759578e-02 99.724003 100.000000
07/08 06:09:17 AM train_acc 99.720000
07/08 06:09:17 AM valid 000 4.404125e-01 92.187500 98.437500
07/08 06:09:28 AM valid 050 4.015100e-01 89.552696 99.356618
07/08 06:09:38 AM valid 100 4.156907e-01 89.217203 99.489480
07/08 06:09:50 AM valid 150 4.085849e-01 89.155629 99.596440
07/08 06:09:51 AM valid_acc 89.200000

Is it ok to copy the 49th epoch's genotype?

In the last epoch of the search, the architecture can still change during training; however, only the architecture at the start of each epoch is printed. So is it right to copy and use the genotype logged for the 49th epoch? Or should we add a print at the end of the search and use that genotype?

train_search on multi-gpus

Hello, quark!
Thanks for your great work. When I tried to run your train_search job on multiple GPUs, the alphas_normal and alphas_reduce Variables caused errors.
The errors are as follows:

  File "/mnt/data-3/data/jiemin.fang/anaconda3/envs/pytorch4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data-3/data/jiemin.fang/darts-maml/cnn/model_search.py", line 111, in forward
    s0, s1 = s1, cell(s0, s1, weights)
  File "/mnt/data-3/data/jiemin.fang/anaconda3/envs/pytorch4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data-3/data/jiemin.fang/darts-maml/cnn/model_search.py", line 54, in forward
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/mnt/data-3/data/jiemin.fang/darts-maml/cnn/model_search.py", line 54, in <genexpr>
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/mnt/data-3/data/jiemin.fang/anaconda3/envs/pytorch4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data-3/data/jiemin.fang/darts-maml/cnn/model_search.py", line 22, in forward
    return sum(w * op(x) for w, op in zip(weights, self._ops))
  File "/mnt/data-3/data/jiemin.fang/darts-maml/cnn/model_search.py", line 22, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:314
To debug the code, I tried removing 'w', which comes from alphas_normal or alphas_reduce, in
return sum(w * op(x) for w, op in zip(weights, self._ops))
I have tried both the 0.3 and 0.4 versions of PyTorch, but the problem persists.
Could you please tell me how to handle multi-GPU training? Have you ever met a similar problem?
Best, and waiting for your reply!

Validation phase code

The code doesn't take the argmax of the operations or the best two edges in the inference phase of the script https://github.com/quark0/darts/blob/master/cnn/train_search.py
How does this correspond to the actual inference? The performance might degrade significantly when the architecture is discretized.

FC/Dense and MLP capabilities

How can we:

Apply this to plain MLP's?
Handle the addition of fully-connected layers (eg. after a CNN)?
Use for single or multivariable regression?

What do stem_multiplier and multiplier do?

Dear author, I'm fascinated by your approach, but I'm a beginner with PyTorch and architecture search.
While reading your source code, I came up with a question:
what do stem_multiplier and multiplier do? I mean on the 63rd line of your model_search.py file.
Are these discussed in your paper or in another paper?
Can you give me any suggestions?
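
One hedged reading, based on the channel triples printed in the train.py log quoted in the "OOM ERROR WITH PYTORCH 0.3.1" issue on this page (108 108 36, 108 144 36, 144 144 36, ...): the stem outputs stem_multiplier x C channels, each cell outputs multiplier x C_curr channels, and C_curr doubles at the two reduction cells. The function below is a hypothetical reconstruction that reproduces those triples with init_channels=36 and layers=20; the values 3 and 4 and the reduction positions are assumptions inferred from the log, not taken from the issue text.

```python
def channel_trace(init_channels, layers, stem_multiplier=3, multiplier=4):
    """Return the (C_prev_prev, C_prev, C_curr) triple seen by each cell."""
    stem_out = stem_multiplier * init_channels        # channels produced by the stem
    c_prev_prev, c_prev, c_curr = stem_out, stem_out, init_channels
    reduction_layers = (layers // 3, 2 * layers // 3)  # assumed reduction-cell positions
    trace = []
    for i in range(layers):
        if i in reduction_layers:
            c_curr *= 2                                # reduction cells double the width
        trace.append((c_prev_prev, c_prev, c_curr))
        # a cell's output is assumed to concatenate `multiplier` nodes of width c_curr
        c_prev_prev, c_prev = c_prev, multiplier * c_curr
    return trace

for triple in channel_trace(init_channels=36, layers=20)[:3]:
    print(*triple)
```

Under these assumptions, stem_multiplier only sets the width of the first feature map, while multiplier determines how wide each cell's output is relative to its internal channel count.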

Some issues about the paper

Hi Hanxiao,

Table 1 in the paper confuses me. There are three issues:
1. AmoebaNet-A, with 3.34 ± 0.06 test error and 3.2M params in the original paper, is trained without cutout.
2. The search cost for NASNet-A differs between the first and the second line (1800 vs 3150). According to the latest version of the original paper, it is 2000 GPU days: 4 days with 500 GPUs.
3. In my view, using GPU days (number of GPUs x days) as a metric is not fair, because running on two GPUs is less than twice as fast as running on one GPU. In other words, running on two GPUs for 1 day is not simply the same as running on one GPU for 2 days, although both are 2 GPU days.

Best
Yukang

How to save and load pytorch model

Hey, can you help me save the final model so that it can be loaded directly on another system without requiring other files?
Since the code only saves the weights, i.e. model.state_dict(), I changed that line to save the complete model rather than only the state_dict, but loading the model still requires all the other custom .py files such as model_search, architecture and genotypes.

Multi-gpu search available？

RuntimeError: cuda runtime error (2) : out of memory at /tmp/pip-ba6igt7v-build/aten/src/THC/generic/THCStorage.cu:58

Hello,
I am currently using an NVIDIA GP104 with the CUDA compiler driver, Python 3.6.5, and PyTorch 0.3.1.

When I tried to run train.py and train_search.py, it gave this error:

RuntimeError: cuda runtime error (2) : out of memory at /tmp/pip-ba6igt7v-build/aten/src/THC/generic/THCStorage.cu:58

Are there any solutions for this error?

Thank you

Only apply softmax to alpha on operations leads to state explosion.

Hi,

I'm currently trying to reproduce your experiment in TensorFlow. What I've found during implementation and debugging is that your current implementation of the continuous relaxation (formula 2 in your paper) leads to state explosion. You apply softmax only over the different operators, not over the different previous states, which makes the last state value much larger than the initial one.

For example, suppose s0 = 1, and every mixed operator is somehow an identity operator. Then s1 will be 1, s2 = s0 + s1 = 2, s3 = s0 + s1 + s2 = 4, ... so the state values grow exponentially. What's worse, the very large hidden state is fed to the next timestep/batch as the initial state, which leads to unboundedly large state values.
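
The growth described in this example can be reproduced in a few lines. This is a sketch of the thought experiment only, assuming every mixed operator acts as the identity and each new state is the plain sum of all previous ones; `state_values` is a hypothetical helper.

```python
def state_values(num_states, s0=1.0):
    """States when each node sums all predecessors and every op is the identity."""
    states = [s0, s0]                 # s1 = identity(s0)
    while len(states) < num_states:
        states.append(sum(states))    # s_k = s_0 + s_1 + ... + s_{k-1}
    return states[:num_states]

print(state_values(6))
```

From s2 onward each state is double the previous one, which is the exponential growth the example refers to.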

But anyway, your code works. I think the key is to shrink the state value each time it goes through the gates or cells. I thought the secret sauce was the shared batchnorm, but it failed.

So could you please explain this to me?
Thanks!

Out of memory error when training best model on imagenet

I am using V100 gpu which has 16G memory. Here is the error log-

07/10 07:05:24 PM valid 000 2.609589e+00 47.656250 76.562500
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_imagenet.py", line 230, in <module>
    main() 
  File "train_imagenet.py", line 152, in main
    valid_acc_top1, valid_acc_top5, valid_obj = infer(valid_queue, model, criterion)
  File "train_imagenet.py", line 214, in infer
    logits, _ = model(input)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/darts/cnn/model.py", line 207, in forward
    s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/darts/cnn/model.py", line 51, in forward
    h1 = op1(h1)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/darts/cnn/operations.py", line 66, in forward
    return self.op(x)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/.torch-env/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

OOM ERROR WITH PYTORCH 0.3.1

I see open issues relating to this but all with PyTorch 0.4

Does anyone have experience with this in PyTorch 0.3.1?

New PyTorch user so any assistance is much appreciated.

Thanks!!

Experiment dir : eval-EXP-20180717-093523
07/17 09:35:23 AM gpu device = 0
07/17 09:35:23 AM args = Namespace(arch='DARTS', auxiliary=False, auxiliary_weight=0.4, batch_size=96, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=600, gpu=0, grad_clip=5, init_channels=36, layers=20, learning_rate=0.025, model_path='saved_models', momentum=0.9, report_freq=50, save='eval-EXP-20180717-093523', seed=0, weight_decay=0.0003)
108 108 36
108 144 36
144 144 36
144 144 36
144 144 36
144 144 36
144 144 72
144 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 144
288 576 144
576 576 144
576 576 144
576 576 144
576 576 144
576 576 144
07/17 09:35:25 AM param size = 3.169414MB
Files already downloaded and verified
Files already downloaded and verified
07/17 09:35:27 AM epoch 0 lr 2.500000e-02
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518241081361/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 169, in <module>
    main()
  File "train.py", line 102, in main
    train_acc, train_obj = train(train_queue, model, criterion, optimizer)
  File "train.py", line 122, in train
    logits, logits_aux = model(input)
  File "/home/gecko/MARK/anaconda3/envs/darts/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/SSD/MARK/darts/cnn/model.py", line 148, in forward
    s0 = s1 = self.stem(input)
  File "/home/gecko/MARK/anaconda3/envs/darts/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gecko/MARK/anaconda3/envs/darts/lib/python3.5/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/gecko/MARK/anaconda3/envs/darts/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gecko/MARK/anaconda3/envs/darts/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/home/gecko/MARK/anaconda3/envs/darts/lib/python3.5/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518241081361/work/torch/lib/THC/generic/THCStorage.cu:58

The difference between training and testing?

Hi,

It seems that during training, the cell uses a weighted sum over all edges to generate features from one node to another, as in https://github.com/quark0/darts/blob/master/cnn/model_search.py#L21. However, during testing, you select the 2 edges with the maximum values to sum, as in https://github.com/quark0/darts/blob/master/cnn/model_search.py#L142. May I ask why you chose 2 edges rather than another number?

Best Regards,

Train_imagenet setting

Hi, I just wanted to verify the imagenet training arguments you guys used since the defaults seem a bit strange. Do you run it with python train_imagenet.py --auxiliary? 1000 epochs seems a bit large. And with the learning rate decay, the end lr is 5.91199783e-14
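
As a sanity check on the quoted end learning rate: the figure is numerically close to 0.97 raised to the 1000th power, which is what a schedule that multiplies the rate by 0.97 once per epoch would give after 1000 epochs from a base value of 1. Both the factor 0.97 and the base value 1.0 are assumptions inferred from the number itself; neither is stated in this issue. `final_lr` is a hypothetical helper.

```python
def final_lr(base_lr, gamma, epochs):
    """Learning rate after `epochs` multiplicative decays by `gamma`."""
    return base_lr * gamma ** epochs

# Assumed values, chosen only to compare against the figure quoted above.
print(final_lr(base_lr=1.0, gamma=0.97, epochs=1000))
```

With a shorter schedule the same formula gives a far less extreme final rate, which is why the number of epochs matters so much for this kind of decay.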

Draw Genotype Cell

Hello

Is there a function or example you can provide for drawing the cell block (similar to the GIF in the documentation), given a Genotype and alpha values?

Regards.

Multi-label Classification?

This works with single-label classification. Does it also support multi-label image classification? If so, could you help me classify my dataset, which has images in more than 60 categories?

The code to calculate the multiply-add operations?

Section 3.4.1 says that "the number of multiply-add operations in the model is restricted to be less than 600M". Would you mind providing the code to calculate the number of multiply-add operations?
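
For orientation, multiply-adds of convolution layers are commonly counted as (input channels per group) x (output channels) x (kernel area) x (output positions). The sketch below applies that standard formula; it is not the authors' counting code, it ignores biases, batchnorm and pooling, and the helper names are hypothetical.

```python
def conv2d_macs(c_in, c_out, kernel, out_h, out_w, groups=1):
    """Multiply-adds of one 2D convolution (bias ignored)."""
    return (c_in // groups) * c_out * kernel * kernel * out_h * out_w

def sep_conv_macs(c_in, c_out, kernel, out_h, out_w):
    """Depthwise (groups=c_in) convolution followed by a 1x1 pointwise convolution."""
    depthwise = conv2d_macs(c_in, c_in, kernel, out_h, out_w, groups=c_in)
    pointwise = conv2d_macs(c_in, c_out, 1, out_h, out_w)
    return depthwise + pointwise

print(conv2d_macs(c_in=3, c_out=48, kernel=3, out_h=112, out_w=112))
print(sep_conv_macs(c_in=4, c_out=8, kernel=3, out_h=2, out_w=2))
```

Summing such per-layer counts over every operation in the network gives a total that can be compared against a budget such as 600M.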

Best Regards,

Bootstrapping Bug

If data/ doesn't already contain CIFAR-10, the existence of a symlink can suppress the untarring of the otherwise downloaded data.

Why does FactorizedReduce use two convolutional layers?

Hi,
I saw that https://github.com/quark0/darts/blob/master/cnn/operations.py#L90 uses two convolutional layers. May I know why this structure is used instead of one convolutional layer with twice the output dimension?
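
One hedged reading of the linked code is that the two stride-2, 1x1 convolutions look at different pixels: the second is applied to the input shifted by one position, and the two outputs are concatenated along the channel axis. The sketch below only illustrates which positions along one spatial axis each branch would read under that assumption; `sampled_positions` is a hypothetical helper and the length 8 is arbitrary.

```python
def sampled_positions(length, offset, stride=2):
    """Input positions read by a stride-`stride` 1x1 convolution starting at `offset`."""
    return list(range(offset, length, stride))

branch_1 = sampled_positions(8, offset=0)   # unshifted input
branch_2 = sampled_positions(8, offset=1)   # input shifted by one pixel
print(branch_1, branch_2)
```

A single stride-2, 1x1 convolution with twice the output channels would read only the positions in `branch_1`, so under this reading half of the input pixels would never contribute.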

Best Regards,

RNN PTB model search converges to local optimum with a high perplexity number

Hi,

I'm trying to reproduce the RNN architecture search on PTB. I ran train_search.py with the default settings but got a high perplexity, and it seems that the search converges to a local optimum at an early stage.

So is it just bad luck (try different random seeds), does the architecture need to be re-evaluated from scratch, or is there something I overlooked?
log.txt

Out of memory trying to run CIFAR example

I am trying to run the CIFAR example using the pyro docker image, but I get a CUDA out of memory error:

pyromancer@6d7a480a66c9:~/workspace/shared/darts/cnn$ python train_search.py --unrolled
Experiment dir : search-EXP-20180712-214257
07/12 09:42:57 PM gpu device = 0
07/12 09:42:57 PM args = Namespace(arch_learning_rate=0.0003, arch_weight_decay=0.001, batch_size=64, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.3, epochs=50, gpu=0, grad_clip=5, init_channels=16, layers=8, learning_rate=0.025, learning_rate_min=0.001, model_path='saved_models', momentum=0.9, report_freq=50, save='search-EXP-20180712-214257', seed=2, train_portion=0.5, unrolled=True, weight_decay=0.0003)
07/12 09:43:00 PM param size = 1.930618MB
Files already downloaded and verified
Files already downloaded and verified
07/12 09:43:01 PM epoch 0 lr 2.500000e-02
07/12 09:43:01 PM genotype = Genotype(normal=[('avg_pool_3x3', 0), ('dil_conv_5x5', 1), ('dil_conv_3x3', 1), ('dil_conv_5x5', 2), ('max_pool_3x3', 1), ('avg_pool_3x3', 0), ('dil_conv_5x5', 1), ('avg_pool_3x3', 0)], normal_concat=range(2, 6), reduce=[('avg_pool_3x3', 1), ('avg_pool_3x3', 0), ('sep_conv_3x3', 1), ('dil_conv_5x5', 2), ('sep_conv_3x3', 2), ('avg_pool_3x3', 3), ('max_pool_3x3', 4), ('dil_conv_5x5', 0)], reduce_concat=range(2, 6))
tensor([[ 0.1249, 0.1249, 0.1252, 0.1251, 0.1250, 0.1250, 0.1250, 0.1249],
        [ 0.1250, 0.1248, 0.1251, 0.1250, 0.1250, 0.1250, 0.1251, 0.1251],
        [ 0.1250, 0.1250, 0.1250, 0.1250, 0.1250, 0.1250, 0.1251, 0.1249],
        [ 0.1249, 0.1249, 0.1249, 0.1250, 0.1249, 0.1250, 0.1253, 0.1251],
        [ 0.1249, 0.1251, 0.1251, 0.1250, 0.1249, 0.1249, 0.1249, 0.1251],
        [ 0.1250, 0.1250, 0.1252, 0.1251, 0.1249, 0.1249, 0.1250, 0.1249],
        [ 0.1249, 0.1253, 0.1250, 0.1248, 0.1248, 0.1251, 0.1251, 0.1250],
        [ 0.1249, 0.1251, 0.1251, 0.1252, 0.1250, 0.1248, 0.1250, 0.1249],
        [ 0.1251, 0.1250, 0.1250, 0.1250, 0.1249, 0.1251, 0.1249, 0.1250],
        [ 0.1249, 0.1248, 0.1252, 0.1247, 0.1251, 0.1249, 0.1252, 0.1251],
        [ 0.1249, 0.1249, 0.1251, 0.1250, 0.1248, 0.1250, 0.1249, 0.1254],
        [ 0.1251, 0.1250, 0.1250, 0.1250, 0.1252, 0.1250, 0.1249, 0.1250],
        [ 0.1251, 0.1249, 0.1250, 0.1250, 0.1251, 0.1249, 0.1250, 0.1251],
        [ 0.1251, 0.1252, 0.1251, 0.1247, 0.1252, 0.1249, 0.1249, 0.1250]], device='cuda:0')
tensor([[ 0.1251, 0.1251, 0.1251, 0.1250, 0.1247, 0.1250, 0.1251, 0.1250],
        [ 0.1250, 0.1249, 0.1251, 0.1250, 0.1250, 0.1248, 0.1251, 0.1248],
        [ 0.1252, 0.1251, 0.1250, 0.1250, 0.1249, 0.1249, 0.1250, 0.1250],
        [ 0.1248, 0.1249, 0.1250, 0.1249, 0.1252, 0.1250, 0.1251, 0.1251],
        [ 0.1252, 0.1249, 0.1250, 0.1250, 0.1250, 0.1249, 0.1250, 0.1251],
        [ 0.1249, 0.1251, 0.1250, 0.1250, 0.1250, 0.1251, 0.1250, 0.1249],
        [ 0.1249, 0.1249, 0.1251, 0.1251, 0.1246, 0.1251, 0.1251, 0.1251],
        [ 0.1250, 0.1247, 0.1250, 0.1251, 0.1252, 0.1250, 0.1250, 0.1250],
        [ 0.1252, 0.1249, 0.1252, 0.1247, 0.1249, 0.1251, 0.1250, 0.1250],
        [ 0.1248, 0.1251, 0.1251, 0.1249, 0.1249, 0.1249, 0.1251, 0.1252],
        [ 0.1249, 0.1250, 0.1250, 0.1251, 0.1251, 0.1251, 0.1249, 0.1249],
        [ 0.1250, 0.1249, 0.1249, 0.1252, 0.1250, 0.1250, 0.1251, 0.1250],
        [ 0.1249, 0.1251, 0.1249, 0.1251, 0.1252, 0.1250, 0.1248, 0.1250],
        [ 0.1251, 0.1253, 0.1249, 0.1250, 0.1248, 0.1249, 0.1248, 0.1251]], device='cuda:0')
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_search.py", line 200, in <module>
    main()
  File "train_search.py", line 124, in main
    train_acc, train_obj, arch_grad_norm = train(train_queue, search_queue, model, architect, criterion, optimizer, lr)
  File "train_search.py", line 152, in train
    arch_grad_norm = architect.step(input, target, input_search, target_search, lr, optimizer, unrolled=args.unrolled)
  File "/home/pyromancer/workspace/shared/darts/cnn/architect.py", line 37, in step
    input_train, target_train, input_valid, target_valid, eta, network_optimizer)
  File "/home/pyromancer/workspace/shared/darts/cnn/architect.py", line 53, in _backward_step_unrolled
    model_unrolled = self._compute_unrolled_model(input_train, target_train, eta, network_optimizer)
  File "/home/pyromancer/workspace/shared/darts/cnn/architect.py", line 23, in _compute_unrolled_model
    loss = self.model._loss(input, target)
  File "/home/pyromancer/workspace/shared/darts/cnn/model_search.py", line 110, in _loss
    logits = self(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pyromancer/workspace/shared/darts/cnn/model_search.py", line 104, in forward
    s0, s1 = s1, cell(s0, s1, weights)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pyromancer/workspace/shared/darts/cnn/model_search.py", line 54, in forward
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/home/pyromancer/workspace/shared/darts/cnn/model_search.py", line 54, in <genexpr>
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pyromancer/workspace/shared/darts/cnn/model_search.py", line 22, in forward
    return sum(w * op(x) for w, op in zip(weights, self._ops))
  File "/home/pyromancer/workspace/shared/darts/cnn/model_search.py", line 22, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCStorage.cu:58

The card is a GeForce GTX 1080 (8119 MiB) on an Ubuntu Linux box.

How was random baseline done in paper?

Hi --

In the paper, you describe fairly strong baseline performance from random architectures. Are you able to give a little more information about how those random baselines were done? Specifically, is that the average of a number of random runs, or just a single random run?

Thanks
~ Ben

architect.py

Hi, wonderful work!

I have a question regarding architect.py, hessian_vector_product.
From my understanding, R is epsilon, and grad_p and grad_n are the gradients of the loss at w+ and w- (the positive and negative perturbations of w).
Why does the grad_n calculation use p.data.sub_(2*R, v) and not p.data.sub_(R, v)?

Thanks!!
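
One possible reading, offered as an assumption rather than the authors' answer: if the parameters are perturbed in place, they already sit at w + R*v when grad_p is taken, so reaching w - R*v takes a step of 2*R*v. The toy below uses plain Python lists and a hypothetical loss. None of its names come from architect.py, and it differentiates with respect to the same variable it perturbs, which is a simplification.

```python
def toy_grad(w):
    # Gradient of the hypothetical loss f(w) = w0^2 * w1 + w1^3.
    return [2.0 * w[0] * w[1], w[0] ** 2 + 3.0 * w[1] ** 2]


def hvp_in_place(w, v, r=1e-4):
    """Central-difference Hessian-vector product using in-place updates of w."""
    for i in range(len(w)):
        w[i] += r * v[i]          # w -> w + r*v
    grad_p = toy_grad(w)

    for i in range(len(w)):
        w[i] -= 2.0 * r * v[i]    # w + r*v -> w - r*v, hence the factor of 2
    grad_n = toy_grad(w)

    for i in range(len(w)):
        w[i] += r * v[i]          # restore the original w
    return [(p - n) / (2.0 * r) for p, n in zip(grad_p, grad_n)]


w = [1.0, 2.0]
v = [3.0, 4.0]
hv = hvp_in_place(w, v)
# The exact Hessian at w is [[2*w1, 2*w0], [2*w0, 6*w1]] = [[4, 2], [2, 12]],
# so the exact product H v is [20, 54].
```

Subtracting only R*v in the second step would return the parameters to w itself, and the difference quotient would no longer be centered.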

Expected Performance

What result should we expect from simply running the following?

cd cnn && python train.py --auxiliary --cutout

Is it necessary to count the parameters in AuxiliaryHeadCIFAR?

Hi, I notice that when you calculate the model size, the parameters in AuxiliaryHeadCIFAR are also included. Is this necessary, or is it just for a fair comparison with other papers?

ImageNet Training Time and Performance?

Hi,

In the Python code, the default number of epochs for ImageNet is 1000, while the paper says the model is trained for 250 epochs. Does the 26.9% top-1 error come from 1000 epochs or 250 epochs? If it comes from 250 epochs, what is the performance after training for 1000 epochs?

Finite difference approximation implementation

I traced the code and could not find the part that the paper describes as "Fortunately, the complexity can be substantially reduced using the finite difference approximation." Has that been implemented?

Thanks

I get an error when I change the stride

To change the output size, I tried changing the stride, but the line
s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
raises "RuntimeError: The size of tensor a (20) must match the size of tensor b (40) at non-singleton dimension 3".
Are there any suggestions?
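
The mismatch can be reproduced with shape arithmetic alone. The sketch below assumes the standard convolution output-size formula and hypothetical kernel and padding values; it is not taken from model_search.py. The failing line sums the outputs of several operations element-wise, so every one of them has to emit the same spatial size.

```python
def conv_out_size(size, kernel, stride, padding, dilation=1):
    # Standard output-size formula for a convolution along one spatial dimension.
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


width = 40
# Hypothetical 3x3 ops with padding 1: one keeps stride 1, one was edited to stride 2.
unchanged_op = conv_out_size(width, kernel=3, stride=1, padding=1)
edited_op = conv_out_size(width, kernel=3, stride=2, padding=1)

# Summing the two outputs would fail because their widths differ (40 vs 20).
shapes_match = unchanged_op == edited_op
```

Under this reading, a stride change would need to be applied consistently to every operation feeding the same sum, including any pooling or skip path.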

Cutout is not used in cnn/train.py

The paper mentions that cutout is used in the experiments, and the demo script also specifies cutout=True. However, cutout does not appear to be used in cnn/train.py (at least, I have not found it so far). Could the authors please check this? If I am wrong, could you tell me where cutout is applied in the script? Thanks very much!
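
For reference, cutout as usually described zeroes a square patch of the input, centered at a random pixel and clipped at the image border. The sketch below is a plain-Python illustration with a hypothetical function name; it does not show where, or whether, the repository applies it.

```python
import random


def cutout(image, length, center=None):
    """Return a copy of a 2D image (list of rows) with a length x length patch zeroed."""
    height, width = len(image), len(image[0])
    if center is None:
        # Pick the patch center uniformly at random over the image.
        center = (random.randrange(height), random.randrange(width))
    cy, cx = center
    # Clip the patch to the image bounds, so patches near an edge are smaller.
    y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, height)
    x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, width)
    return [
        [0.0 if y1 <= y < y2 and x1 <= x < x2 else value
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


image = [[1.0] * 32 for _ in range(32)]
patched = cutout(image, length=16, center=(0, 0))   # patch clipped at the corner
zeroed = sum(value == 0.0 for row in patched for value in row)
```

The length of 16 mirrors the cutout_length=16 default visible in the argument dump of the first log on this page; the 32x32 image size is an assumption for CIFAR-style inputs.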

tricky usage of view()

darts/rnn/model_search.py

Line 28 in d8418b5

    
           ch = masked_states.view(-1, self.nhid).mm(self._Ws[i]).view(i+1, -1, 2*self.nhid)

It seems that this code could raise a RuntimeError, since the total number of elements might not be divisible by i+1. Am I missing something, or is this just a fancy usage of view()?
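
A shape-only sketch of why the reshape can be safe, assuming masked_states stacks i+1 states of shape (batch, nhid) and self._Ws[i] has shape (nhid, 2*nhid). Both shapes are assumptions and were not read from the repository. The helper is hypothetical and only mimics how view() resolves a single -1.

```python
def resolve_view(numel, shape):
    """Mimic how view() fills in a single -1, raising if the sizes do not divide."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if numel % known != 0:
        raise RuntimeError("shape is invalid for input of size %d" % numel)
    return tuple(numel // known if dim == -1 else dim for dim in shape)


batch, nhid = 64, 300   # hypothetical sizes
shapes = []
for i in range(3):
    numel = (i + 1) * batch * nhid                 # masked_states: (i+1, batch, nhid)
    flat = resolve_view(numel, (-1, nhid))         # ((i+1)*batch, nhid)
    after_mm = flat[0] * 2 * nhid                  # mm with an (nhid, 2*nhid) matrix
    shapes.append(resolve_view(after_mm, (i + 1, -1, 2 * nhid)))
```

Under these assumptions the element count always carries a factor of i+1 from the leading dimension, so the final view() divides evenly for every i.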

Reproducing results from table 1 in the paper

I'm referring to the lines labeled: "DARTS (first order) + cutout" (2.94) and "DARTS (second order) + cutout" (2.83). What commands correspond to running in these two modes?
