

pytorch-video-recognition

Introduction

This repo contains several models for video action recognition, including C3D, R2Plus1D, and R3D, implemented in PyTorch (0.4.0). Currently, we train these models on the UCF101 and HMDB51 datasets. More models and datasets will be available soon!

Note: An interesting online web game based on the C3D model is available here.

Installation

The code was tested with Anaconda and Python 3.5. After installing the Anaconda environment:

  1. Clone the repo:

    git clone https://github.com/jfzhang95/pytorch-video-recognition.git
    cd pytorch-video-recognition
  2. Install dependencies:

    For PyTorch dependency, see pytorch.org for more details.

    For custom dependencies:

    conda install opencv
    pip install tqdm scikit-learn tensorboardX
  3. Download the pretrained model from BaiduYun or GoogleDrive. Currently, only a pretrained C3D model is provided.

  4. Configure your dataset and pretrained model path in mypath.py.

  5. You can choose different models and datasets in train.py.

    To train the model, please do:

    python train.py

Datasets:

I used two different datasets: UCF101 and HMDB51.

The dataset directory tree is shown below.

  • UCF101: make sure to put the files in the following structure:

    UCF-101
    ├── ApplyEyeMakeup
    │   ├── v_ApplyEyeMakeup_g01_c01.avi
    │   └── ...
    ├── ApplyLipstick
    │   ├── v_ApplyLipstick_g01_c01.avi
    │   └── ...
    └── Archery
        ├── v_Archery_g01_c01.avi
        └── ...

After pre-processing, the output dir's structure is as follows:

ucf101
├── ApplyEyeMakeup
│   ├── v_ApplyEyeMakeup_g01_c01
│   │   ├── 00001.jpg
│   │   └── ...
│   └── ...
├── ApplyLipstick
│   ├── v_ApplyLipstick_g01_c01
│   │   ├── 00001.jpg
│   │   └── ...
│   └── ...
└── Archery
    ├── v_Archery_g01_c01
    │   ├── 00001.jpg
    │   └── ...
    └── ...

Note: The HMDB51 dataset's directory tree is similar to UCF101's.
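
For orientation, the video-to-frames step can be sketched with OpenCV as below. The repo's actual pre-processing lives in dataset.py (which also resizes frames and splits the data), so this is only an illustration, with function and path names of my own choosing:

    import os
    import cv2

    def extract_frames(video_path, out_dir):
        """Dump every frame of one video as zero-padded JPEGs (00001.jpg, ...)."""
        os.makedirs(out_dir, exist_ok=True)
        capture = cv2.VideoCapture(video_path)
        index = 0
        while True:
            retained, frame = capture.read()
            if not retained:
                break
            index += 1
            # zero padding keeps lexicographic order equal to temporal order
            cv2.imwrite(os.path.join(out_dir, '{:05d}.jpg'.format(index)), frame)
        capture.release()

    # e.g. extract_frames('UCF-101/Archery/v_Archery_g01_c01.avi',
    #                     'ucf101/Archery/v_Archery_g01_c01')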

Experiments

These models were trained on a machine with an NVIDIA TITAN X 12GB GPU. Note that I split the train/val/test data for each dataset using sklearn. If you want to train models using the official train/val/test splits, look in dataset.py and modify it to your needs.

Currently, I have only trained the C3D model on the UCF101 and HMDB51 datasets. The train/val/test accuracy and loss curves for each experiment are shown below:

  • UCF101 (accuracy/loss curves figure)

  • HMDB51 (accuracy/loss curves figure)

Experiments for other models will be updated soon ...


pytorch-video-recognition's Issues

load checkpoint

    optimizer.step()
      File "/home/z/anaconda3/envs/py3/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
        buf.mul_(momentum).add_(1 - dampening, d_p)
    RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'other'
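
This error usually means the optimizer state (e.g. SGD momentum buffers) was restored on the CPU while the model parameters live on the GPU. A hedged workaround, applied after loading the checkpoint (it assumes the optimizer state was already restored via load_state_dict):

    import torch

    # move any restored optimizer buffers onto the GPU so they match the
    # device of the parameters they update; `optimizer` is the existing
    # optimizer object from the training script
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.cuda()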

Inference.py question about frame processing

Hello @jfzhang95, thanks for sharing your C3D implementation.

When I run inference with the trained C3D model, I notice that you apply some processing to the centrally cropped frame,

listed here:

    tmp = tmp_ - np.array([[[90.0, 98.0, 102.0]]])

Could you kindly explain the purpose of this operation?

Thanks again.
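
A likely explanation (my inference, not the author's answer): this is per-channel mean subtraction, zero-centering the BGR input the same way the C3D pretraining data was centered, so inference must apply the same shift. A runnable annotation:

    import numpy as np

    # stand-in for the centrally cropped frame: (H, W, 3) in OpenCV's BGR order
    tmp_ = np.zeros((112, 112, 3), dtype=np.float32)

    # the (1, 1, 3) array broadcasts across H and W, subtracting one mean per
    # channel; inputs end up roughly zero-centered, matching pretraining
    tmp = tmp_ - np.array([[[90.0, 98.0, 102.0]]])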

KeyError: 'state_dict'

I ran inference.py with the pretrained model:

    Traceback (most recent call last):
      File "inference.py", line 78, in <module>
        main()
      File "inference.py", line 32, in main
        model.load_state_dict(checkpoint['state_dict'])
    KeyError: 'state_dict'
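
The pretrained c3d-pretrained.pth appears to be a raw weights file, while checkpoints saved by train.py wrap the weights under a 'state_dict' key. A hedged sketch that tolerates both layouts (it does not address the separate key-name remapping that C3D_model's __load_pretrained_weights performs for the pretrained file):

    import torch

    def load_weights(model, path):
        """Load either a raw state_dict or a checkpoint dict wrapping one."""
        checkpoint = torch.load(path, map_location=lambda storage, loc: storage)
        # fall back to the loaded object itself if there is no 'state_dict' key
        state_dict = checkpoint.get('state_dict', checkpoint)
        model.load_state_dict(state_dict)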

Decreasing Test Accuracy in README .png

In your README, the tensorboardX plot for Test Acc is steadily decreasing over all 100 epochs. Is that just a 'typo', or were those your actual results? (figure omitted)

PS Thanks a ton for the code. It has been very helpful!

C3D training from scratch

Hi @jfzhang95, thanks for your code. I'm trying to train the C3D model from scratch using your code. I haven't changed any settings. After several epochs, the training loss becomes NaN and stays there. What should I do to train the C3D model from scratch? I'm using the UCF101 dataset.
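
Not an authoritative fix, but a common mitigation worth noting here: NaN losses when training from scratch often come from exploding gradients, which can be capped by clipping between backward() and step() in train.py (the names and threshold below are assumptions):

    # sketch: inside the existing training loop in train.py; `loss`, `model`
    # and `optimizer` are the loop's objects, and max_norm is a guess to tune
    loss.backward()
    # cap the global gradient norm so one bad batch cannot blow up the weights
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()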

RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

When I run train.py, this problem occurs. Can anyone solve it?

    Traceback (most recent call last):
      File "/home/common1/huangjing/MyCode/PythonCode/pytorch-video-recognition/train.py", line 201, in <module>
        train_model()
      File "/home/common1/huangjing/MyCode/PythonCode/pytorch-video-recognition/train.py", line 138, in train_model
        loss.backward()
      File "/home/huangjing/miniconda3/envs/PyCharm/lib/python3.7/site-packages/torch/tensor.py", line 150, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/huangjing/miniconda3/envs/PyCharm/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

The order of frame sequences in VideoDataset is wrong

    frames = sorted([os.path.join(file_dir, img) for img in os.listdir(file_dir)])

This operation will produce something like:

    0001
    00010
    00011
    00012
    ...
    00019
    0002
    00020
    00021
    ...

The correct code should be:

    frames = sorted([os.path.join(file_dir, img) for img in os.listdir(file_dir)], key=lambda x: int(x.split('/')[-1][:-4]))

which yields:

    0001
    0002
    0003
    0004
    ...
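
To see the failure mode and the fix concretely, here is a standalone demo (using os.path helpers instead of string slicing, which also survives Windows path separators):

    import os

    names = ['0001.jpg', '0002.jpg', '00010.jpg', '00011.jpg']
    print(sorted(names))
    # -> ['0001.jpg', '00010.jpg', '00011.jpg', '0002.jpg']  (interleaved)

    # sort on the integer value of the file stem to restore temporal order
    print(sorted(names, key=lambda n: int(os.path.splitext(os.path.basename(n))[0])))
    # -> ['0001.jpg', '0002.jpg', '00010.jpg', '00011.jpg']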

No such file or directory: '/path/to/Models/c3d-pretrained.pth'

When running train.py, I encountered the following problem:

    Traceback (most recent call last):
      File "/tmp/pycharm_project_466/train.py", line 202, in <module>
        train_model()
      File "/tmp/pycharm_project_466/train.py", line 61, in train_model
        model = C3D_model.C3D(num_classes=num_classes, pretrained=True)
      File "/tmp/pycharm_project_466/network/C3D_model.py", line 42, in __init__
        self.__load_pretrained_weights()
      File "/tmp/pycharm_project_466/network/C3D_model.py", line 109, in __load_pretrained_weights
        p_dict = torch.load(Path.model_dir())
      File "/home/zhanghao/anaconda3/envs/python35/lib/python3.5/site-packages/torch/serialization.py", line 356, in load
        f = open(f, 'rb')
    FileNotFoundError: [Errno 2] No such file or directory: '/path/to/Models/c3d-pretrained.pth'

How can I solve it?

KeyError: 'state_dict' when running inference.py file

    ---> 32 model.load_state_dict(checkpoint['state_dict'])
         33 # model.load_state_dict(torch.load("C:\Users####\pytorch-video-recognition\c3d-pretrained.pth.tar", map_location=lambda storage, loc: storage))
         34

    KeyError: 'state_dict'

Unbelievable results: train acc is about 100%!

Your work is rather impressive! I got a train acc of 0.9987220447284345, val acc of 0.9923857868020304, and test acc of 0.9851063829787234. It seems too good to be true; could you check whether these results are correct?

Your pretrained model works very badly in inference.py on any video

Your pretrained model works very badly in inference.py on any video. Why?
I initialize the model as follows:

    model = C3D_model.C3D(num_classes=101, pretrained=True)
    # checkpoint = torch.load('./models/C3D_ucf101_epoch-39.pth', map_location=lambda storage, loc: storage)
    # model.load_state_dict(checkpoint['state_dict'])
    model.to(device)
    model.eval()

Evaluation on official split 01 of UCF101

Hello,

As mentioned in the description, sklearn is used to split the train/val/test data for each dataset. Has anybody tried to train and evaluate the C3D model on the official split 01 of UCF101? I gave it a try, with the validation set identical to the test set, and got the following results (figure omitted):

Is the official split so different from the sklearn split as to justify the much lower accuracy?

The accuracy of C3D trained from scratch is low

The accuracy of C3D trained from scratch is 30% with lr=1e-5, and below 1% with lr=1e-3, both lower than the paper claims. I also noticed that someone added BN to C3D and got about 45% accuracy. Does anybody know why?

train_test_split on ucf 101 dataset

Hi. Thank you for uploading your code.

I have a question about the dataset.py code.

As far as I know, the UCF101 dataset has official train/test list files, but this code divides the data randomly with train_test_split.

So it may cause an overlap problem between the train and test sets.

What do you think about this problem?
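
For reference, one way to avoid that overlap (a sketch, not the repo's code) is to split by UCF101 group id with sklearn's GroupShuffleSplit, so clips cut from the same source video never straddle train and test:

    from sklearn.model_selection import GroupShuffleSplit

    # hypothetical clip names; the gXX token identifies the source video group
    fnames = ['v_Archery_g01_c01', 'v_Archery_g01_c02',
              'v_Archery_g02_c01', 'v_Archery_g02_c02']
    labels = [0, 0, 0, 0]
    groups = [name.split('_')[2] for name in fnames]  # 'g01', 'g01', 'g02', 'g02'

    splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
    train_idx, test_idx = next(splitter.split(fnames, labels, groups=groups))
    # clips from the same group land entirely in train or entirely in test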

VideoDataset has a bug about data normalization

There is a bug in the normalize(self, buffer) function in dataset.py: it does not normalize the data to [0, 1], which we usually do in a deep learning training process with PyTorch.
I also tested it: if we don't normalize, training fails completely when using the official train/test split of UCF101; after 54 epochs the test accuracy was only around 5%.
If we do normalize, training is fine; after 5 epochs it already reached 8.2% test accuracy.

    def normalize(self, buffer):
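
A minimal sketch of what the fixed method might look like (my assumption about the intended behavior, not the author's patch): subtract the per-channel means, then scale so values land near [0, 1]:

    import numpy as np

    def normalize(self, buffer):
        # buffer: float array of shape [T, H, W, C], BGR channel order
        # zero-center with the per-channel means, then scale down from the
        # [0, 255] pixel range so values land near [0, 1]
        buffer = (buffer - np.array([[[90.0, 98.0, 102.0]]])) / 255.0
        return buffer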

Resuming training from a saved epoch

    optimizer.step()
      File "/home/z/anaconda3/envs/py3/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
        buf.mul_(momentum).add_(1 - dampening, d_p)
    RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'other'

something wrong?

out of memory?

Hi! I tried the code with a TITAN Xp 12GB, but before training starts it reports "RuntimeError: CUDA out of memory".
Did you guys meet the same issue?
Could anyone give me some ideas? Thanks!

Memory Error for Batch Size > 50

The only extra piece of code I added is DataParallel, to use both of my GPUs. So I am training with a batch size of 50 on my 2 GPUs (GeForce GTX TITAN X, 12GB each). It runs the first 10-15 epochs but then spits out a MemoryError. When I tried to train with a batch size of 200, it gave an error from the beginning. I know that reducing the batch size will avoid the error, but can someone suggest an alternative solution?

Downsample step

I was checking your Pytorch implementation of the R2Plus1D model against the implementation in Caffe2 in the repository of the original paper (https://github.com/facebookresearch/VMZ), and I was wondering why you chose to implement the downsample step as a SpatioTemporalConv layer, while in the original implementation they seem to use only one Conv3D layer. They have coded it as follows:

    if (num_filters != input_filters) or down_sampling:
        shortcut_blob = self.model.ConvNd(
            shortcut_blob,
            'shortcut_projection_%d' % self.comp_count,
            input_filters,
            num_filters,
            [1, 1, 1],
            weight_init=("MSRAFill", {}),
            strides=use_striding,
            no_bias=self.no_bias,
        )
        if spatial_batch_norm:
            shortcut_blob = self.model.SpatialBN(
                shortcut_blob,
                'shortcut_projection_%d_spatbn' % self.comp_count,
                num_filters,
                epsilon=1e-3,
                is_test=self.is_test,
            )

Was this design choice on purpose, and if so, could you perhaps tell me why?

Thanks!
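
For comparison, a direct PyTorch transcription of that Caffe2 shortcut would be a single 1x1x1 Conv3d followed by BatchNorm3d; a minimal sketch (function and parameter names are mine, not from either repo):

    import torch.nn as nn

    def projection_shortcut(in_channels, out_channels, stride):
        """1x1x1 3D conv projection, strided when downsampling, plus batch norm."""
        return nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=1,
                      stride=stride, bias=False),
            nn.BatchNorm3d(out_channels, eps=1e-3),  # epsilon matches the Caffe2 code
        )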

The problem in load_frames

    def load_frames(self, file_dir):
        frames = sorted([os.path.join(file_dir, img) for img in os.listdir(file_dir)])
        frame_count = len(frames)
        buffer = np.empty((frame_count, self.resize_height, self.resize_width, 3), np.dtype('float32'))
        for i, frame_name in enumerate(frames):
            frame = np.array(cv2.imread(frame_name)).astype(np.float64)
            frame -= np.array([[[90.0, 98.0, 102.0]]])
            # frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            buffer[i] = frame

        # convert from [T, H, W, C] format to [C, T, H, W] (what PyTorch uses)
        # T = Time, H = Height, W = Width, C = Channels
        buffer = buffer.transpose((3, 0, 1, 2))

        return buffer

In load_frames, why do you subtract [90, 98, 102]?

Pretraining dataset

Hi!
Thanks for the repo! I've recently implemented your model, and I was wondering if you could tell me what dataset was used to obtain the pretrained weights?
Thanks

Train from scratch

Has anyone tried to train C3D from scratch on UCF101? The accuracy stays at 1%. I used other models implemented by myself and the accuracy is also 1%. The learning rate is 1e-5. Does anyone have any ideas?

Why is the training loss always NaN?

I got losses like this:


    100%|██████████| 424/424 [04:10<00:00, 2.24it/s]
    [train] Epoch: 22/100 Loss: nan Acc: 0.010870849580527
    Execution time: 250.25667172999238

    100%|██████████| 108/108 [00:26<00:00, 5.16it/s]
    [val] Epoch: 22/100 Loss: nan Acc: 0.011121408711770158
    Execution time: 26.448329468010343

    100%|██████████| 424/424 [04:09<00:00, 2.23it/s]
    [train] Epoch: 23/100 Loss: nan Acc: 0.010870849580527
    Execution time: 249.90277546200377

    100%|██████████| 108/108 [00:26<00:00, 5.09it/s]
    [val] Epoch: 23/100 Loss: nan Acc: 0.011121408711770158
    Execution time: 26.87914375399123

    100%|██████████| 424/424 [04:09<00:00, 2.24it/s]
    [train] Epoch: 24/100 Loss: nan Acc: 0.010870849580527
    Execution time: 249.9237438449927

    100%|██████████| 108/108 [00:26<00:00, 5.16it/s]
    [val] Epoch: 24/100 Loss: nan Acc: 0.011121408711770158
    Execution time: 26.460865497996565

It's all NaN. What could be the reason?

about mypath.py

I don't know what the directory tree for root_dir and output_dir should look like. When I run train.py, it just reports that files were not found. Please show the directory tree. Thanks.

Dataset not loading

I encountered a problem running train.py, and I think it's related to the dataset. After downloading the UCF101 dataset, I put it in the UCF-101 folder and ran train.py, but I'm getting errors. Does anyone have a solution? Thanks.

C3D training from scratch met RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Hello @jfzhang95, first of all, thanks for your code.

I'm trying to train C3D from scratch on my own UCF101-style dataset.

I changed the class count from 101 to 2 and set num_workers=1 in train.py, and updated the dataset path in mypath.py; apart from that, I didn't change any settings.

When I run 'python train.py', I get the runtime error below and don't know what happened.

    Traceback (most recent call last):
      File "C:/Users/google/Desktop/pytorch-video-recognition-master/train.py", line 203, in <module>
        train_model()
      File "C:/Users/google/Desktop/pytorch-video-recognition-master/train.py", line 131, in train_model
        outputs = model(inputs)
      File "D:\Anaconda3\envs\video\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\google\Desktop\pytorch-video-recognition-master\network\C3D_model.py", line 46, in forward
        x = self.relu(self.conv1(x))
      File "D:\Anaconda3\envs\video\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Anaconda3\envs\video\lib\site-packages\torch\nn\modules\conv.py", line 421, in forward
        self.padding, self.dilation, self.groups)
    RuntimeError: CUDNN_STATUS_EXECUTION_FAILED


The environment is Windows 10, CUDA 9, torch 0.4.0. I'm not sure whether I should run this under Linux instead.

Thanks if anyone can help.

About the Features

Sorry to bother you! I used your pretrained model to extract video features on the HMDB51 dataset. However, I find that every video has very similar features, with each dimension around the value 0.7.
