
convNet.pytorch's People

Contributors

eladhoffer, tbennun


convNet.pytorch's Issues

Stochastic quantization: difference between code and paper

In the paper, stochastic quantization is done by rounding up with probability p = clip(0.5x, 0, 1) and rounding down with probability 1 - p. In the code, however, it is done by adding uniform random noise before quantization:
noise = output.new(output.shape).uniform_(-0.5, 0.5)
output.add_(noise)
This noise does not depend on the magnitude of x. What is the reasoning behind this discrepancy?
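As a side observation: for the rounding step itself (ignoring clipping), the additive-noise form is equivalent to unbiased stochastic rounding with probability equal to the fractional part of x, since P(round(x + u) = ceil(x)) = frac(x) for u ~ U(-0.5, 0.5); whether that matches the paper's p = clip(0.5x, 0, 1) is exactly the question. A minimal sketch of the two forms:

import torch

def stochastic_round_prob(x):
    # Unbiased stochastic rounding: round up with probability
    # equal to the fractional part of x.
    floor = x.floor()
    return floor + (torch.rand_like(x) < (x - floor)).float()

def stochastic_round_noise(x):
    # Additive-noise form from the snippet above: same distribution,
    # since P(round(x + u) = ceil(x)) = frac(x) for u ~ U(-0.5, 0.5).
    noise = torch.empty_like(x).uniform_(-0.5, 0.5)
    return (x + noise).round()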

RuntimeError: view size is not compatible with input tensor's size and stride

I have the following config file:

{
    "adapt_grad_norm": null,
    "autoaugment": false,
    "batch_size": 256,
    "chunk_batch": 1,
    "config_file": null,
    "cutmix": null,
    "cutout": false,
    "dataset": "imagenet",
    "datasets_dir": "~/data/",
    "device": "cuda",
    "device_ids": [
        0
    ],
    "dist_backend": "nccl",
    "dist_init": "env://",
    "distributed": false,
    "drop_optim_state": false,
    "dtype": "float",
    "duplicates": 1,
    "epochs": 90,
    "eval_batch_size": -1,
    "evaluate": null,
    "grad_clip": -1,
    "input_size": null,
    "label_smoothing": 0,
    "local_rank": -1,
    "loss_scale": 1,
    "lr": 0.1,
    "mixup": null,
    "model": "alexnet",
    "model_config": "",
    "momentum": 0.9,
    "optimizer": "SGD",
    "print_freq": 10,
    "results_dir": "./results",
    "resume": "",
    "save": "alexnet_unquant",
    "save_all": false,
    "seed": 123,
    "start_epoch": -1,
    "sync_bn": false,
    "tensorwatch": false,
    "tensorwatch_port": 0,
    "weight_decay": 0,
    "workers": 8,
    "world_size": -1
}

I get the following error:

Starting Epoch: 1

Traceback (most recent call last):
  File "main.py", line 364, in <module>
    main()
  File "main.py", line 130, in main
    main_worker(args)
  File "main.py", line 306, in main_worker
    train_results = trainer.train(train_data.get_loader(),
  File "/MyPath/convNet.pytorch/trainer.py", line 269, in train
    return self.forward(data_loader, training=True, average_output=average_output, chunk_batch=chunk_batch)
  File "/MyPath/convNet.pytorch/trainer.py", line 224, in forward
    prec1, prec5 = accuracy(output, target, topk=(1, 5))
  File "/MyPath/convNet.pytorch/utils/meters.py", line 70, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I get the same error when I try the following command from README.md:

python main.py --model resnet --model-config "{'depth': 18, 'quantize':True}" --save resnet18_8bit -b 64

How can I rectify this?

Thanks.
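A likely fix, following the hint in the error message itself, is to replace the non-contiguous .view(-1) with .reshape(-1) in utils/meters.py. A sketch, reconstructed from the standard topk-accuracy helper (only the marked line comes from the traceback; the repo's version may differ in details):

import torch

def accuracy(output, target, topk=(1,)):
    # .reshape() copies when the transposed slice is non-contiguous,
    # which .view() refuses to do.
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    res = []
    for k in topk:
        correct_k = correct[:k].reshape(-1).float().sum(0)  # was .view(-1)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

Equivalently, correct[:k].contiguous().view(-1) would also work, at the cost of an explicit copy.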

ResNext CIFAR10 crashing during layer construction

When running the command:
python3.6 main.py -b 32 --gpus 1 --model resnext --dataset cifar10

the following error pops up:

Traceback (most recent call last):
  File "main.py", line 306, in <module>
    main()
  File "main.py", line 194, in main
    train_loader, model, criterion, epoch, optimizer)
  File "main.py", line 295, in train
    training=True, optimizer=optimizer)
  File "main.py", line 254, in forward
    output=model(input_var)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 158, in forward
    x = self.layer1(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 72, in forward
    input = module(input)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 56, in forward
    residual = self.downsample(x)
  File "/usr/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vista_fpga/pytorch_repos/elad_hofer_pytorch/convNet.pytorch/models/resnext_original.py", line 119, in forward
    return torch.cat([ds, self.zero.expand(*zeros_size)], 1)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)
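For reference, this exact error is raised whenever torch.cat is handed 1-D tensors with dim=1, which suggests ds or the expanded zeros tensor ends up one-dimensional for the CIFAR10 input sizes (a minimal repro, not the repo's code):

import torch

a = torch.randn(4)    # 1-D tensor
b = torch.randn(4)
torch.cat([a, b], 1)  # RuntimeError: dimension out of range
                      # (expected to be in range of [-1, 0], but got 1)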

quantization

When using torch==0.2.0, I get the following error:
Traceback (most recent call last):
  File "example/mpii.py", line 352, in <module>
    main(parser.parse_args())
  File "example/mpii.py", line 107, in main
    train_loss, train_acc = train(train_loader, model, criterion, optimizer, args.debug, args.flip)
  File "example/mpii.py", line 153, in train
    output = model(input_var)
  File "/home/wangmeng/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangmeng/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 71, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/wangmeng/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangmeng/pytorch-pose-quantized/pose/models/hourglass_quantized.py", line 172, in forward
    x = self.conv1(x)
  File "/home/wangmeng/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangmeng/pytorch-pose-quantized/pose/models/modules/quantize.py", line 188, in forward
    qinput = self.quantize_input(input)
  File "/home/wangmeng/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangmeng/pytorch-pose-quantized/pose/models/modules/quantize.py", line 165, in forward
    min_value * (1 - self.momentum))
TypeError: add_ received an invalid combination of arguments - got (Variable), but expected one of:

  • (float value)
    didn't match because some of the arguments have invalid types: (Variable)
  • (torch.cuda.FloatTensor other)
    didn't match because some of the arguments have invalid types: (Variable)
  • (torch.cuda.sparse.FloatTensor other)
    didn't match because some of the arguments have invalid types: (Variable)
  • (float value, torch.cuda.FloatTensor other)
  • (float value, torch.cuda.sparse.FloatTensor other)

Do I need to modify models/modules/quantize.py, or not?
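The likely cause: in PyTorch 0.2, Tensor and Variable are still distinct types, so an in-place add_ on a plain-tensor buffer rejects a Variable argument; the two types only merged in 0.4. A hedged sketch of the fix for quantize.py line 165 (the buffer and attribute names below are assumptions):

import torch
from torch.autograd import Variable

# Stand-ins for the module's running buffer and incoming statistic.
running_min = torch.zeros(1)
min_value = Variable(torch.zeros(1))
momentum = 0.9

# Unwrap the Variable with .data before the in-place update:
running_min.mul_(momentum).add_(min_value.data * (1 - momentum))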

Nan loss for quantization

I'm running the code for 8-bit quantization, but the training loss always becomes NaN, even though I did not make any modification to the original code. I wonder why this could happen and hope for your clarification.

Is there any explanation of the scale_fix in RangeBN?

A fix is added in the RangeBN for the scale:
scale_fix = (0.5 * 0.35) * (1 + (math.pi * math.log(4)) ** 0.5) / ((2 * math.log(y.size(-1))) ** 0.5)

The (2 * ln(n)) ** 0.5 factor is explained in the paper. However, where do 0.5, 0.35, and pi * ln(4) come from?
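For reference, the code line written out as a formula (with n = y.size(-1)):

scale_fix = (0.5 * 0.35) * (1 + sqrt(pi * ln(4))) / sqrt(2 * ln(n))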

TypeError: __init__() got an unexpected keyword argument 'reduction'

Hi,
When I try to run the script, I face this error. What's wrong?
TypeError: __init__() got an unexpected keyword argument 'reduction'
I'm calling the script like this :

IMAGENET_DIR=ImageNet_DataSet
MODEL_NAME=mobilenet
python main.py --dataset $IMAGENET_DIR --model $MODEL_NAME 

I'm using Python 3.6 and PyTorch 0.4.
Thanks a lot in advance.
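Most likely the installed PyTorch predates the reduction keyword, which loss constructors only gained in 0.4.1 (spelled 'elementwise_mean' until 1.0); on 0.4.0 the unknown keyword raises exactly this TypeError. Upgrading PyTorch is the simplest fix; a shim is also possible (a hedged sketch, assuming the failing call is a loss constructor):

import torch.nn as nn

try:
    criterion = nn.CrossEntropyLoss(reduction='mean')   # PyTorch >= 1.0 spelling
except TypeError:
    criterion = nn.CrossEntropyLoss(size_average=True)  # PyTorch 0.4.0 equivalent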

why not use apex?

Why can't we use apex for FP16 training? Is there a bug in L1BN2D?

May I know the software versions?

Hi, @eladhoffer

Thanks for the excellent project. I'm trying the code on my local machine, but some Python package dependencies seem to be unmet. Could you share the versions of the key software, such as PyTorch, Python, and the OS?

How to run the code in Colaboratory?

I'm trying to run main.py in Colab, and I executed the following command:

!python /content/convNet.pytorch/main.py --dataset cifar10 --model resnet --model-config "{'depth': 44}" --duplicates 40 --cutout -b 64 --epochs 100 --save resnet44_cutout_m-40

and I'm getting the following error

Traceback (most recent call last):
  File "/content/convNet.pytorch/main.py", line 14, in <module>
    from data import DataRegime, SampledDataRegime
  File "/content/convNet.pytorch/data.py", line 9, in <module>
    from utils.dataset import IndexedFileDataset
  File "/content/convNet.pytorch/utils/dataset.py", line 6, in <module>
    from torch.utils.data.sampler import Sampler, RandomSampler, BatchSampler, _int_classes
ImportError: cannot import name '_int_classes' from 'torch.utils.data.sampler' (/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py)

How can I fix this issue?
What steps do I need to follow to be able to run the code in colab?
What changes do I need to make?
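One likely workaround: recent PyTorch releases removed the private _int_classes alias, which on Python 3 was simply int, so it can be replaced directly (a hypothetical patch to utils/dataset.py, line 6):

from torch.utils.data.sampler import Sampler, RandomSampler, BatchSampler

_int_classes = int  # replacement for the removed private alias

Alternatively, installing an older PyTorch release that still ships the alias should also work.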

Quantizing Mobilenet

I am trying to quantize the mobilenet model in the same way you have implemented resnet (https://github.com/eladhoffer/convNet.pytorch). To accomplish this, I added the following lines in models/mobilenet.py:

from .modules.quantize import QConv2d, QLinear, RangeBN
torch.nn.Linear = QLinear
torch.nn.Conv2d = QConv2d
torch.nn.BatchNorm2d = RangeBN

But on training, the loss goes to NaN. It would be of great help if you could provide some input on this.

Thanks in advance

regards
Shreyas
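One thing worth checking, independent of the quantization itself: this kind of monkey-patching only affects layers constructed after the assignments, and only code that looks the classes up through torch.nn at construction time. A self-contained illustration of the pitfall (not the repo's code):

import torch.nn

class QConv2d(torch.nn.Conv2d):
    pass  # stand-in for the quantized layer

from torch.nn import Conv2d   # alias bound BEFORE the patch
torch.nn.Conv2d = QConv2d     # the monkey-patch from the snippet above

print(isinstance(torch.nn.Conv2d(3, 8, 3), QConv2d))  # True: patched lookup
print(isinstance(Conv2d(3, 8, 3), QConv2d))           # False: pre-patch alias unaffected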

trainer.validate runs the full validation set on every GPU.

The README's example for efficient multi-GPU training of resnet50 (4 GPUs, label smoothing, fast regime by fast.ai):

python -m torch.distributed.launch --nproc_per_node=4  main.py --model resnet --model-config "{'depth': 50, 'regime': 'fast'}" --eval-batch-size 512 --save resnet50_fast --label-smoothing 0.1

I made some changes:

python -m torch.distributed.launch --nproc_per_node=8 main.py --model resnet --model-config "{'depth': 34, 'regime': 'fast'}" --batch-size 256 --eval-batch-size 512 --label-smoothing 0.1

The log shows:

TRAINING - Epoch: [15][10/625]	Time 0.810 (1.640)
EVALUATING - Epoch: [15][10/98]	Time 1.353 (3.035)

According to the following arithmetic:

1281167 / 256 = 5004.5, and 5004.5 / 8 = 625.5 training steps per GPU
50000 / 512 = 97.6, and 97.6 / 8 = 12.2 validation steps per GPU

So validation should take 12 or 13 steps per GPU, not 98.
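A possible fix, sketched under the assumption that the validation loader is built from a plain val_dataset: shard it with DistributedSampler, then average the metrics across processes afterwards.

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Hypothetical sketch: requires an initialized process group, and
# shuffle=False needs a reasonably recent PyTorch.
val_sampler = DistributedSampler(val_dataset, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=512,
                        sampler=val_sampler, num_workers=8)
# Each of the 8 processes now sees 50000 / 8 = 6250 samples,
# i.e. 6250 / 512 ~ 12.2 -> 13 validation steps per GPU.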

bug in vgg?

Hi,
cool framework!
note that you add an nn.AvgPool2d layer with kernel_size=1 in the VGG class.
This has no effect at all. Perhaps you meant nn.AdaptiveAvgPool2d?
In addition, the input to the classification layer is usually 7 x 7 x 512, given a 224x224 input.
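For reference, a minimal sketch of the suggested replacement (mirroring what torchvision's VGG does; not this repo's code):

import torch.nn as nn

# nn.AvgPool2d(kernel_size=1, stride=1) is an identity operation; an
# adaptive pool makes the classifier input independent of spatial size.
pool = nn.AdaptiveAvgPool2d((7, 7))
# For a 224x224 input, the classifier then sees 512 * 7 * 7 = 25088 features.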
