tcapelle / action_recognition

Solving UCF-101 with fastai2

Home Page: https://tcapelle.github.io/action_recognition/

License: Apache License 2.0

Jupyter Notebook 99.51% Python 0.47% Makefile 0.01% Shell 0.01%

action_recognition's Introduction

Hello, I am Thomas Capelle

I am currently working at Weights and Biases making machine learning better for everyone!

You can check my blog, where I mostly talk about my Deep Learning journey
You can also follow me on Twitter
Or chat with me on the fastai Discord channel

🗺 Location

I live in the Alps region in France, in a city called Chambery. I like the mountains and enjoy the lake in summer.

🔥 Latest

Article about the Deep Learning performance of the new Apple M1 Pro processor

action_recognition's Issues

FileNotFoundError: [Errno 2] File /content/drive/My Drive/data/ucfTrainTestlist/testlist01.txt does not exist:

Even though I am providing the correct paths, 04_train_baseline.ipynb raises this error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-111-e45d3985d19c> in <module>()
      1 #slow
----> 2 val_idxs = get_split_idxs()

6 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File /content/drive/My Drive/data/ucfTrainTestlist/testlist01.txt does not exist: '/content/drive/My Drive/data/ucfTrainTestlist/testlist01.txt'
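One way to narrow down an error like this is to find out which part of the path is actually missing. The helper names below are hypothetical and not part of the repository; this is only a minimal sketch using the standard library. On Colab, a path under /content/drive usually exists only after Google Drive has been mounted in the current session, so the deepest existing parent can hint at whether the drive is mounted or a folder name is misspelled.

```python
from pathlib import Path

def deepest_existing_parent(path):
    "Return the longest leading part of `path` that exists on disk."
    p = Path(path)
    for candidate in [p, *p.parents]:
        if candidate.exists():
            return candidate
    return None

def explain_missing(path):
    "Describe where `path` stops existing, to spot typos or an unmounted drive."
    p = Path(path)
    if p.exists():
        return f"{p} exists"
    found = deepest_existing_parent(p)
    return f"{p} is missing; deepest existing parent: {found}"
```

Calling explain_missing on the path from the traceback shows how far down the directory tree the path is valid.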

Pretrained model

Hi, are there any pretrained weights, such as ImageNet?

How to run inference on a new video or image using a trained model?

Hi @tcapelle ,

I really appreciate your efforts in making this open-source project.

I trained my model following your steps and got 96% accuracy. Now I want to use the trained model for inference on:

1) New Video
2) Live Camera Feed
3) Image (jpg/png)

How can I achieve that? Could you share the code?

Thanks :)

Possible to train in MultiGPU (using DataParallel)?

I was attempting to train this on multiple GPUs by changing the model setup to:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleModel(num_classes=dls.c, seq_len=seq_len)
model = nn.DataParallel(model)
model.to(device)

This raises

'DataParallel' object has no attribute 'encoder'

when setting up the Learner.

Does anyone here have a sample for training on multiple GPUs?

Error when training

Sorry to bother you. I got the following error. How can I fix it?

'''
File "/home/michael/action_recognition/action_recognition/models.py", line 256, in forward
video = torch.stack(video, dim=1) #to deal with the ImageTuple

TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not TensorCategory
'''

How to check test accuracy?

You mentioned 91% accuracy on the test set.

Are the accuracies output during training the training accuracies? The best accuracy in your 04_train_baseline.ipynb is 49%, so where does the test set accuracy of 91% come from?

How do I get the test accuracy after learn.show_results()?

The model predicts a result tensor like [-0.4135, -0.3328, -0.4700, -0.4385, 0.1240]

I find that the model does not end with a softmax layer.

The model predicts something like [-0.4135, -0.3328, -0.4700, -0.4385, 0.1240].

If I pass a tensor like [-0.4135, -0.3328, -0.4700, -0.4385, 0.1240] through a softmax layer, would I get the actual probability of each category?
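As a rough illustration of the question above, softmax turns raw scores (logits) into non-negative values that sum to 1, without changing which class scores highest. The sketch below uses only the standard library on the tensor quoted in the issue; with PyTorch tensors, torch.softmax(x, dim=-1) computes the same thing. The results are the model's own estimates and are not guaranteed to be well calibrated.

```python
import math

def softmax(logits):
    "Convert raw scores into probabilities that sum to 1."
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-0.4135, -0.3328, -0.4700, -0.4385, 0.1240]
probs = softmax(logits)

# Index of the most likely class; the same index holds the largest logit.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Here predicted is the index of the last entry, the only positive logit in the example.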

This appears to be a CNN-LSTM and not a ConvLSTM

https://www.quora.com/What-is-the-difference-between-ConvLSTM-and-CNN-LSTM#:~:text=ConvLSTM%20is%20when%20you%20have,as%20a%20spatial%20feature%20extractor.

Instructions to run baseline model

Can we have a single set of instructions for extracting frames, running the model, and getting the outputs?

The notebooks are a bit confusing.

I have a dataset of 3 classes, each consisting of 25 videos in the UCF101 format, as well as a train/test split file.

So I need one set of instructions to run (similar to the original repo).

Trying to add aug_transforms to get_action_dataloaders

This is really amazing work! I have learned so much about fastai2 from this code repo. However, I am struggling to make a small addition. I'm trying to add the aug_transforms list into get_action_dataloaders. In another project I did something like this.

tfms = aug_transforms(max_zoom=1.2, max_lighting=0.3)
mean = [0.16724987, 0.1670983 , 0.18462411]
std = [0.19008317, 0.1897096 , 0.20567844]
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   getters=getters,
                   splitter=IndexSplitter(idxs),
                   item_tfms=[Resize(image_size)],
                   batch_tfms=[*tfms, Normalize.from_stats(mean, std)])

Adding the batch_tfms I have above into the method below doesn't seem to generate the transforms in the images. Where would I add aug_transforms here?

def get_action_dataloaders(files, bs=8, image_size=64, seq_len=20, val_idxs=None, random_sample=False):
    "Create a dataloader with val_idxs splits"
    splits = RandomSplitter()(files) if val_idxs is None else IndexSplitter(val_idxs)(files)
    itfm = ImageTupleTfm(random_sample=random_sample, seq_len=seq_len)
    ds = Datasets(files, tfms=[[itfm], [parent_label, Categorize]], splits=splits)
    dls = ds.dataloaders(bs=bs, after_item=[Resize(image_size), ToTensor],
                         batch_tfms=[*tfms, Normalize.from_stats(mean, std)],
                         after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)], drop_last=True)
    return dls

Can't get transforms to work in this scenario

I'm passing a dataframe to get_action_dataloaders, but I can't get the transforms in the DataBlock to work. Any suggestions on what I might be doing wrong?

def get_block(image_size=64, seq_len=4, val_idxs=None):
    "A block for sequence of images from file path list"
    tfms = [*aug_transforms(size=500, flip_vert=True, max_rotate=45, max_warp=0)]
    block = DataBlock(blocks     = (ImageTupleBlock, CategoryBlock),
                      get_y      = get_y,
                      item_tfms  = [RandomResizedCrop(350)],
                      batch_tfms = tfms + [Normalize()],
                      get_x      = get_x,
                      splitter   = ColSplitter())
    return block

def get_action_dataloaders(df, bs=8, image_size=64, seq_len=20, val_idxs=None, random_sample=True, **kwargs):
    "Create a dataloader with `val_idxs` splits"
    dblock = get_block()
    dls = dblock.dataloaders(df, bs=bs, drop_last=True, **kwargs)
    return dls

AssertionError

I have 60 videos for training and 15 for testing.
I faced this error in 04_train_convlstm.ipynb

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-15-0a0f4116d3e6> in <module>()
      1 seq_len = 5
      2 image_size = 128
----> 3 assert len(files)/8 == len(files)//8

AssertionError:
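The check in the traceback passes only when the number of items is an exact multiple of 8, and neither 60 nor 75 is. Assuming the assertion is there to keep whole groups of 8 (the notebook's reason is not stated in this report), one possible workaround is to trim the list to the largest multiple of 8 before the check. The function name and the stand-in file list are hypothetical.

```python
def trim_to_multiple(items, k=8):
    "Drop trailing items so that len(result) is a multiple of `k`."
    usable = (len(items) // k) * k
    return items[:usable]

# Hypothetical stand-in for the 60 training videos mentioned in the issue.
files = [f"video_{i:02d}" for i in range(60)]
trimmed = trim_to_multiple(files, 8)

# The notebook's check now holds for the trimmed list.
assert len(trimmed) / 8 == len(trimmed) // 8
```

This discards the last few items, so adding videos until the count is a multiple of 8 may be preferable for a small dataset.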

Videos with different number of frames

Hi,

Thanks for sharing your solution. It is of great help for beginners to get familiar with using fastai2 for action detection.

I have a question about your custom data loader. I have a set of videos with different numbers of frames, and I want to ensure that during training the model sees the entire set of frames of each video. As far as I understand, your code selects a random frame location in the video and takes a fixed number of frames from that point on.

Can the dataloader show the model the entire set of frames for each video?
Does the model (baseline or ConvLSTM) require the sequence length to be equal for all videos in the input data?
In particular, if I get a new video tomorrow that is longer than all the videos in the training set, would this model still work for prediction?

Thanks,

Sam
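One common way to feed clips of different lengths to a model that expects a fixed sequence length is to sample the same number of frame indices from every clip, spread over its full duration. The sketch below is not the repository's ImageTupleTfm; the function name and the padding rule (repeating the last frame for short clips) are assumptions made for illustration.

```python
def sample_frame_indices(n_frames, seq_len):
    "Pick `seq_len` indices spread evenly over a clip of `n_frames` frames."
    if n_frames <= 0 or seq_len <= 0:
        raise ValueError("n_frames and seq_len must be positive")
    if n_frames >= seq_len:
        # Evenly spaced indices covering the whole clip.
        return [i * n_frames // seq_len for i in range(seq_len)]
    # Clip shorter than seq_len: use every frame, then repeat the last one.
    return list(range(n_frames)) + [n_frames - 1] * (seq_len - n_frames)
```

With this scheme a longer video at prediction time still yields seq_len frames, at the cost of skipping more frames between samples.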

Use dataframe instead of folder structure

The current implementation depends on the video frames being in folders. I have a csv file with rows containing the folders and the corresponding label. I seem to be going around in circles trying to figure out where the best place to make the code change would be. I'm sure this should be a fairly simple change, however I'm just not seeing where the best place to make the change would be. I think part of my problem is that the current implementation uses the folder path to obtain the label, whereas my label is associated with the folder name in the dataframe.
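The get_action_dataloaders function quoted elsewhere on this page derives labels with parent_label, so one possible place for the change is the labelling function. The sketch below builds a folder-to-label lookup from a CSV and wraps it in a function that could, in principle, stand in for parent_label. The column names, the sample rows, and the function names are hypothetical, and this has not been tested against the repository.

```python
import csv
import io

# Hypothetical CSV layout: one row per frame folder, with its label.
csv_text = """folder,label
clips/v_0001,Archery
clips/v_0002,Bowling
"""

def load_labels(f):
    "Map each folder path to its label using the `folder` and `label` columns."
    return {row["folder"]: row["label"] for row in csv.DictReader(f)}

labels = load_labels(io.StringIO(csv_text))

def label_from_table(folder):
    "Look the folder up in the table instead of reading its parent directory name."
    return labels[str(folder)]
```

In a real project the io.StringIO object would be replaced by an open file handle for the CSV.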

How to achieve 94.8% accuracy

Dear tcapelle,

I cloned this repo and found that I could only achieve 60% accuracy with the ConvLSTM model. It seems clear that I am overfitting the model.

Could you share the training procedure?

Have a good day.

[Question]: Is it possible to use the same code for the UCF-CRCV dataset?

Hi,
I'm trying to build a video classification model using a small subset of the UCF-CRCV (https://www.kaggle.com/mission-ai/crimeucfdataset) dataset.

Is it possible to use the same logic on top of UCF-CRCV?

Thanks,

Explaining ImageTupleTfm

Hi,

I don't fully understand how to create the dataset. Could you please clear up some confusing points? In particular:

What does ImageTuple return? Does it return a figure from the show() method or the actual images?
In the dataset, what is the function parent_label? Is it a fastai function?
In dataset creation, the input "files" is a list of paths, and each path is a folder containing the images for that set. To create a single (x, y) input to the model, ImageTupleTfm is applied to the images in this path, sampling them up to seq_len. The sampled set and the parent_label then make up the (x, y) result. Is this correct?

RuntimeError: stack expects a non-empty TensorList

Hi @tcapelle
Thank you for your great work.
Could I ask you a question, please?

I tried training with 2 classes and got the error below. How can we fix it?

in forward(self, x)
35 def forward(self, x):
36 if self.debug: print(f' input len: {len(x), x[0].shape}')
---> 37 x = torch.stack(x, dim=1)
38 if self.debug: print(f' after stack: {x.shape}')
39 batch_size, seq_length, c, h, w = x.shape

RuntimeError: stack expects a non-empty TensorList

Thank you very much!

get_action_dataloaders() got an unexpected keyword argument 'device'

I got this error while running 00_core.ipynb in nbs:

TypeError                                 Traceback (most recent call last)
<ipython-input-59-6d022d90a5b8> in <module>()
----> 1 dls = get_action_dataloaders(files, bs=64, val_idxs=split0_idxs, device='cpu')

TypeError: get_action_dataloaders() got an unexpected keyword argument 'device'

get_action_dataloaders() itself has no argument called device:

def get_action_dataloaders(files, bs=8, image_size=64, seq_len=20, val_idxs=None, random_sample=False):
    "Create a dataloader with `val_idxs` splits"
    splits = RandomSplitter()(files) if val_idxs is None else IndexSplitter(val_idxs)(files)
    itfm = ImageTupleTfm(random_sample=random_sample, seq_len=seq_len)
    ds = Datasets(files, tfms=[[itfm], [parent_label, Categorize]], splits=splits)
    dls = ds.dataloaders(bs=bs, after_item=[Resize(image_size), ToTensor], 
                         after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)], drop_last=True)
    return dls

Why is this argument passed in the notebook?

TypeError: 'PosixPath' object is not subscriptable

Hi Thomas!

There's currently an issue which I think is due to the new fastcore versions. If you try running 04_train_convlstm, the call to get_dls throws the following traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-cb139abcf192> in <module>
      1 seq_len = 20
      2 bs = 8
----> 3 dls = get_dls(128, seq_len, bs=bs)

<ipython-input-5-f58cd2a11008> in get_dls(image_size, seq_len, bs)
      2     "get ImageTuple dataloader"
      3     block = get_block(image_size, seq_len)
----> 4     return block.dataloaders(files, bs=bs, drop_last=True)

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastai2/data/block.py in dataloaders(self, source, path, verbose, **kwargs)
    105 
    106     def dataloaders(self, source, path='.', verbose=False, **kwargs):
--> 107         dsets = self.datasets(source)
    108         kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
    109         return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastai2/data/block.py in datasets(self, source, verbose)
     99     def datasets(self, source, verbose=False):
    100         self.source = source                     ; pv(f"Collecting items from {source}", verbose)
--> 101         items = (self.get_items or noop)(source) ; pv(f"Found {len(items)} items", verbose)
    102         splits = (self.splitter or RandomSplitter())(items)
    103         pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)

/media/mldata/fastai2/scott/action_recog/capelle_act_recog/nbs/action_recognition/core.py in make_sequences(tuples_files, seq_len)
     51 def make_sequences(tuples_files, seq_len=40):
     52     "slice sequences to `seq_len`"
---> 53     return L(tups[0:seq_len] for tups in tuples_files)
     54 
     55 # Cell

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     45             return x
     46 
---> 47         res = super().__call__(*((x,) + args), **kwargs)
     48         res._newchk = 0
     49         return res

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastcore/foundation.py in __init__(self, items, use_list, match, *rest)
    316         if items is None: items = []
    317         if (use_list is not None) or not _is_array(items):
--> 318             items = list(items) if use_list else _listify(items)
    319         if match is not None:
    320             if is_coll(match): match = len(match)

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastcore/foundation.py in _listify(o)
    252     if isinstance(o, list): return o
    253     if isinstance(o, str) or _is_array(o): return [o]
--> 254     if is_iter(o): return list(o)
    255     return [o]
    256 

/media/mldata/fastai2/scott/action_recog/capelle_act_recog/nbs/action_recognition/core.py in <genexpr>(.0)
     51 def make_sequences(tuples_files, seq_len=40):
     52     "slice sequences to `seq_len`"
---> 53     return L(tups[0:seq_len] for tups in tuples_files)
     54 
     55 # Cell

TypeError: 'PosixPath' object is not subscriptable

Any chance you could take a look? Thanks!

fastcore and fastai2 are on 0.1.17 and 0.0.17, respectively.