

CNN LSTM

Implementation of a CNN-LSTM with a ResNet backbone for video classification.

Getting Started

Prerequisites

  • PyTorch (version 0.4 or later)
  • FFmpeg, FFprobe
  • Python 3

Try on your own dataset

mkdir data
mkdir data/video_data

Put your video dataset inside data/video_data. It should be organised in this form:

+ data 
    + video_data    
            - bowling
            - walking
            + running 
                    - running0.avi
                    - running.avi
                    - running1.avi
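The layout above can be sketched programmatically. This is a minimal illustration (not part of the repository) that builds the expected tree under data/video_data, using the example class and file names from the listing:

```python
import os

# Build the expected layout: one sub-folder per class under data/video_data,
# with the video files of that class inside. Names are the examples above;
# the empty placeholder files only demonstrate the directory structure.
root = "data/video_data"
layout = {
    "bowling": [],
    "walking": [],
    "running": ["running0.avi", "running.avi", "running1.avi"],
}
for cls, videos in layout.items():
    os.makedirs(os.path.join(root, cls), exist_ok=True)
    for v in videos:
        open(os.path.join(root, cls, v), "a").close()  # placeholder file
```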

Generate Images from the Video dataset

./utils/generate_data.sh
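As a rough sketch of what a frame-extraction step like this does (this is not the actual contents of utils/generate_data.sh): each video is run through ffmpeg, which dumps one image per frame. The output layout data/image_data/&lt;class&gt;/&lt;video-name&gt;/ is an assumption based on the --video_path ./data/image_data/ flag in the training command.

```python
import os

# Hypothetical helper: build the ffmpeg command that would extract frames
# from one video into an image folder named after the video's class and stem.
def frame_extraction_cmd(video_path, image_root="data/image_data"):
    cls = os.path.basename(os.path.dirname(video_path))       # e.g. "running"
    name = os.path.splitext(os.path.basename(video_path))[0]  # e.g. "running0"
    out_dir = os.path.join(image_root, cls, name)
    return ["ffmpeg", "-i", video_path,
            os.path.join(out_dir, "image_%05d.jpg")]

cmd = frame_extraction_cmd("data/video_data/running/running0.avi")
```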

Train

Once you have created the dataset, start training:

python main.py --use_cuda --gpu 0 --batch_size 8 --n_epochs 100 --num_workers 0  --annotation_path ./data/annotation/ucf101_01.json --video_path ./data/image_data/  --dataset ucf101 --sample_size 150 --lr_rate 1e-4 --n_classes <num_classes>

Note

  • All weights are saved to the snapshots folder
  • To resume training from any checkpoint, use
--resume_path <path-to-model>

Tensorboard Visualisation (training for 4 labels from the UCF-101 dataset)


Inference

python inference.py  --annotation_path ./data/annotation/ucf101_01.json  --dataset ucf101 --model cnnlstm --n_classes <num_classes> --resume_path <path-to-model.pth> 


License

This project is licensed under the MIT License.

cnn-lstm's People

Contributors

pranoyr


cnn-lstm's Issues

results have some problems

Hello, I followed the README and the code runs fine, but the result is always 50%. (At first the videos in the dataset were identical; I replaced one video so they are now different, but the result is still 50%.) Please help me, thank you.

cnnlstm: loss does not drop?

Hi,
The following code is fine-tuned from the code you wrote. When I train with ResNet alone, the loss drops normally, but when I use ResNet+LSTM, the loss stays around 0.6 and does not drop. Could you please advise me?

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet34

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=2):
        super(CNNLSTM, self).__init__()
        self.resnet = resnet34(pretrained=True)
        # replace the classification head with a 300-d feature projection
        self.resnet.fc = nn.Sequential(nn.Linear(self.resnet.fc.in_features, 300))
        self.lstm = nn.LSTM(input_size=300, hidden_size=256, num_layers=3)
        self.fc1 = nn.Linear(256, 128)  # fully connected layer
        self.fc2 = nn.Linear(128, num_classes)  # fully connected layer

    def forward(self, x_3d):
        hidden = None
        x_3d = x_3d.unsqueeze(0)  # added: batch dimension
        outputs = []
        for t in range(x_3d.size(1)):
            # with torch.no_grad():
            x = self.resnet(x_3d[:, t, :, :, :])  # x_3d[:, t].shape: [1, 3, 224, 224]
            out, hidden = self.lstm(x.unsqueeze(0), hidden)

            x = self.fc1(out[:, -1, :])
            x = F.relu(x)
            x = self.fc2(x)
            outputs.append(x)
        return torch.cat(outputs, dim=0)

Accuracy drop

Thanks for your great work!
I followed the steps and ran the code successfully. Unfortunately, the learning curve shown below indicates that the validation accuracy drops to 60%, a 30% gap from your learning curve. I also don't understand why there are only four labels in your Tensorboard visualisation. All in all, I'm wondering what performance you get when training on all 101 labels of UCF-101, so I can make sure I haven't done anything wrong. Thanks.

(learning-curve image: cnn_lstm_epoch100)

About "with torch.no_grad()"

Hi,
I saw the line with torch.no_grad() in the cnnlstm.py file; this means the ResNet weights will not be affected by the backward pass. If I delete with torch.no_grad(), will backward update the weights of the CNN and the LSTM at the same time?
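The question above can be checked directly: wrapping a module's forward pass in torch.no_grad() stops gradients from flowing into it, so only the layers outside the block are updated. The tiny modules below are stand-ins for illustration, not the repository's actual ResNet/LSTM:

```python
import torch
import torch.nn as nn

backbone = nn.Linear(4, 3)  # stands in for the ResNet feature extractor
head = nn.Linear(3, 2)      # stands in for the LSTM/classifier head
x = torch.randn(1, 4)

# Case 1: backbone inside no_grad -> its weights receive no gradient.
with torch.no_grad():
    feats = backbone(x)
head(feats).sum().backward()
assert backbone.weight.grad is None      # backbone is frozen
assert head.weight.grad is not None      # head still trains

# Case 2: no no_grad -> gradients now reach the backbone as well.
backbone.zero_grad(); head.zero_grad()
feats = backbone(x)
head(feats).sum().backward()
assert backbone.weight.grad is not None  # backbone would be updated too
```

So yes: removing the no_grad block makes the optimiser update both the CNN and the LSTM, at the cost of more memory and compute per step.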
