craston / MARS
MARS: Motion-Augmented RGB Stream for Action Recognition
License: MIT License
I sincerely appreciate the provided code. I want to know how to get the label files for HMDB51; I only downloaded the video files.
Thank you very much.
Sorry for disturbing you again, but I ran into another problem.
The HMDB51 dataset worked fine; to train on the UCF101 dataset I just changed the train part:
print("Preprocessing train data ...")
train_data = globals()['{}_test'.format(opt.dataset)](split = opt.split, train = 0, opt = opt)  # changed train from 1 to 0
Everything seems fine, however I get the error below.
Is it related to the code, or did I make a mistake?
Preprocessing train data ...
Length of train data = 3678
Preprocessing validation data ...
Length of validation data = 3678
Preparing datatloaders ...
Length of train datatloader = 114
Length of validation datatloader = 114
Loading model... resnext 101
loading pretrained model trained_models/kinetics/RGB_Kinetics_16f.pth
Layers to finetune : ['layer4', 'fc']
Initializing the optimizer ...
lr = 0.001 momentum = 0.9 dampening = 0.9 weight_decay = 1e-05, nesterov = False
LR patience = 10
run
Traceback (most recent call last):
File "train.py", line 119, in
for i, (inputs, targets) in enumerate(train_dataloader):
File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 582, in next
return self._process_next_batch(batch)
File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/fazlik/Desktop/MARS/dataset/dataset.py", line 288, in getitem
clip = get_train_video(self.opt, frame_path, Total_frames)
File "/home/fazlik/Desktop/MARS/dataset/dataset.py", line 96, in get_train_video
start_frame = np.random.randint(0, Total_frames)
File "mtrand.pyx", line 992, in mtrand.RandomState.randint
ValueError: Range cannot be empty (low >= high) unless no samples are taken
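For anyone hitting the same error: np.random.randint(0, Total_frames) raises this ValueError whenever Total_frames is 0, which usually means frame extraction produced an empty directory for some video. A minimal sketch of a guard; get_start_frame is a hypothetical helper, not the repo's code:

import numpy as np

def get_start_frame(total_frames, sample_duration=16):
    # Fail loudly on videos with no extracted frames (the cause of the
    # "Range cannot be empty" error) instead of inside the worker loop.
    if total_frames <= 0:
        raise RuntimeError('No frames found for this video; re-check frame extraction.')
    # np.random.randint requires low < high, so keep the range non-empty
    # even when the clip is shorter than the sampling window.
    high = max(total_frames - sample_duration + 1, 1)
    return np.random.randint(0, high)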
Hi,
thank you for your excellent work. In your paper the weight decay is set to 5e-4, but it is 1e-5 in your code, which is quite different. Can you tell me which setting is correct, or which works better? Thanks.
Figure 2: Training to mimic the Flow stream. We first train the Flow stream to classify actions using optical flow clips with cross entropy loss and freeze its weights. To mimic flow features using RGB frames, in step 1, we backpropagate the MSE loss through all the layers of MERS except the last layer. In step 2, we separately train the last layer of MERS with a cross entropy loss.
Dear @craston, I am a little confused about your paper after looking at your code.
According to your paper, I understood that the model, except for the last layer, is first trained with the MSE loss, and that this model is then used to train the last layer with the cross entropy loss: two separate processes.
However, according to your code, you perform both steps within each epoch.
So is the correct procedure to perform both steps in every epoch?
Line 224 in 578e40f
Shouldn't the "else" be matched to the "if"?
Hi,
I have reproduced the test results on HMDB51 and UCF101 successfully.
However, I am having trouble running the SomeThingSomeThingV1 test.
I downloaded the frame_folder from the SomeThingSomeThingV1 main page (https://20bn.com/datasets/something-something/v1), then added code for reading SomeThingSomeThingV1 in dataset.py. Finally, I ran the single-stream RGB-only 64f test on SomeThingSomeThingV1:
python3 test_single_stream.py --batch_size 1 --n_classes 174 --model resnext --model_depth 101 --log 1 --dataset SmtSmt --modality RGB --sample_duration 64 --split 1 --only_RGB --resume_path1 "/host/mars/models/SMTSMT/RGB_Something_Something_64f.pth" --frame_dir "/host/SomethingSomethingV1_validation_frames/" --annotation_path "/host/mars/dataset/SmtSmt_labels/" --result_path "/host/mars/results_test_smtsmt/"
It executed successfully, but the result is completely wrong (the accuracy is only 0.3%). Can you identify what I did wrong? Do I need to modify the frame_folder after downloading? Or could you walk me through the steps for running the single-stream RGB-only 64f test on SomeThingSomeThingV1?
(I have attached my dataset.py with the SomeThingSomeThingV1 code added, in case you need to check it.)
3D CNNs work with video, MRI, and scan datasets. Can anyone tell me how to feed a video input to the proposed 3D CNN and train its weights, given that a 3D CNN expects 5-dimensional inputs:
[batch size, channels, depth, height, width]
How can I extract the depth dimension from the videos?
Say I have 10 videos of 10 different classes, each 6 seconds long. I extract 2 frames per second, which gives 12 frames per video.
The RGB frames are 112x112: Height = 112, Width = 112, Channels = 3.
If I keep the batch size equal to 2:
1 video --> 6 seconds --> 12 frames (1 sec == 2 frames) [each frame (3, 112, 112)]
10 videos (10 classes) --> 60 seconds --> 120 frames
So the 5 dimensions will be something like [2, 3, 12, 112, 112]:
2 --> two videos processed per batch
3 --> RGB channels
12 --> frames per video (the depth)
112 --> frame height
112 --> frame width
First, I label all 10 videos as [3, 12, 112, 112] --> [channels, frames (depth), height, width] tensors, then feed them to a PyTorch DataLoader to batch them into [2, 3, 12, 112, 112].
With the DataLoader batch size set to 2 (processing 2 videos at a time during training), my 10 videos are covered in 5 batches per epoch.
Am I right? Or can you suggest another method? (See the sketch below.)
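For what it's worth, a minimal sketch of that pipeline in PyTorch; VideoDataset and the dummy tensors are hypothetical, not from this repo:

import torch
from torch.utils.data import Dataset, DataLoader

class VideoDataset(Dataset):
    """Hypothetical dataset: each item is one video as a (C, T, H, W) tensor."""
    def __init__(self, videos, labels):
        self.videos = videos   # list of (3, 12, 112, 112) tensors
        self.labels = labels   # list of class indices

    def __len__(self):
        return len(self.videos)

    def __getitem__(self, idx):
        return self.videos[idx], self.labels[idx]

# 10 dummy videos: 3 channels, 12 frames (the "depth" axis), 112x112 pixels
videos = [torch.randn(3, 12, 112, 112) for _ in range(10)]
labels = list(range(10))

loader = DataLoader(VideoDataset(videos, labels), batch_size=2, shuffle=True)
for clips, targets in loader:
    print(clips.shape)   # torch.Size([2, 3, 12, 112, 112]) -> [B, C, T, H, W]

The DataLoader stacks the per-video (C, T, H, W) tensors along a new batch axis, which is where the fifth dimension comes from.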
The code below receives a variable called tensor and outputs the tensor with nothing applied to it.
class Normalize(object):
    """Normalize a tensor image with mean and standard deviation.
    Given mean: (R, G, B) and std: (R, G, B),
    this will normalize each channel of the torch.*Tensor, i.e.
    channel = (channel - mean) / std

    Args:
        mean (sequence): Sequence of means for R, G, B channels respectively.
        std (sequence): Sequence of standard deviations for R, G, B channels
            respectively.
    """

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, tensor):
        """
        Args:
            tensor (Tensor): Tensor image of size (C, H, W) to be normalized.
        Returns:
            Tensor: Normalized image.
        """
        # TODO: make efficient
        for t, m, s in zip(tensor, self.mean, self.std):
            t.sub_(m).div_(s)  # in-place per-channel normalization
        return tensor

    def randomize_parameters(self):
        pass
Can you tell me why I am getting this?
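One likely explanation (an assumption, since the calling code isn't shown): if the transform is constructed with mean (0, 0, 0) and std (1, 1, 1), then (channel - 0) / 1 is the identity, so the output equals the input even though the in-place subtraction and division do run:

import torch

norm = Normalize(mean=(0, 0, 0), std=(1, 1, 1))
x = torch.rand(3, 112, 112)
y = norm(x.clone())
print(torch.equal(x, y))   # True: subtracting 0 and dividing by 1 changes nothing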
Hello, I'm trying to train on HMDB51 splits 2 and 3, using the HMDB51 split-1 model weights for pre-training. The accuracy obtained was less than 76% on the test set. I would like to know whether you still have trained models for these splits.
Hello, I have a question: are the parameters for training the 64f MARS model on the HMDB dataset the same as for 16f?
Is the MiniKinetics trained model available? Could you share it?
When I run extract_frames_flows.py to extract flows and frames, it shows me:
sh: /home/ncrasto/code/workspace/action-recog-release/utils1/tvl1_videoframes: No such file or directory
I have run g++ -std=c++11 tvl1_videoframes.cpp -o tvl1_videoframes -I${OPENCV}include/opencv4/ -L${OPENCV}lib64 -lopencv_objdetect -lopencv_features2d -lopencv_imgproc -lopencv_highgui -lopencv_core -lopencv_imgcodecs -lopencv_cudaoptflow -lopencv_cudaarithm
Could you tell me what I should do? Thank you very much.
Is it [batch_size, clips, x-y channels, width, height] --> [64, 16, 20, 224, 224]?
Are 10 channels for the x-axis and 10 for the y-axis?
I have another question.
In your paper, the RGB result for the UCF-101 dataset is 95.2% for 64f clips.
In the 'Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?' CVPR 2018 paper, the result is 94.5%.
However, the approach is the same in both cases (ResNeXt-101).
Could you please explain why your RGB result is higher than 94.5%?
I could not find any difference between your source code and that paper's source code, which is why I am wondering how two different results were obtained. Maybe you added something in your code that I have not noticed.
Thanks
In dataset.py, it seems the Something-Something loading code is missing?
Has anyone managed to train on the UCF-101 dataset?
I cannot train it.
If you have trained it, please leave a comment with the details.
Thank you
When I tested MARS+Flow+RGB, my accuracy was only 94.8%. May I know how you tested it? I would appreciate it if you could tell me.
Why can the ResNeXt model handle 2-channel optical flow inputs?
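For context: a 3D CNN doesn't care how many input channels a clip has, as long as its first convolution is built to match. A minimal sketch of the idea; the layer hyperparameters are illustrative, not copied from this repo:

import torch
import torch.nn as nn

# First layer of an RGB 3D CNN: expects 3 input channels
conv1_rgb = nn.Conv3d(3, 64, kernel_size=7, stride=(1, 2, 2), padding=3, bias=False)

# The same layer rebuilt for optical flow: 2 input channels (x and y displacement)
conv1_flow = nn.Conv3d(2, 64, kernel_size=7, stride=(1, 2, 2), padding=3, bias=False)

flow_clip = torch.randn(1, 2, 16, 112, 112)   # [B, C=2, T, H, W]
print(conv1_flow(flow_clip).shape)            # [1, 64, 16, 56, 56]: same feature maps as RGB

Everything after the first convolution is identical; only the input channel count of conv1 changes.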
Thank you for your good effort.
I want to ask about the validation accuracy while training the RGB stream, which is basically unchanged. I tried to fine-tune the RGB stream on the UCF101 dataset, and the validation accuracy has been stable at around 0.86 since epoch 40.
Is my training process right? I did not modify the hyperparameters.
Thank you for publishing this good work!
In your paper, you mention results for the two-stream RGB+Flow model. However, I didn't find a way to train the two-stream model in this repo.
Can you share the training script?
Thanks!
How should I read the accuracy? The accuracy displayed is the accuracy over a single batch. How can I display the accuracy over a whole epoch?
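In case it helps, the usual approach is to accumulate correct predictions over the whole epoch instead of printing per-batch accuracy; a minimal sketch with hypothetical variable names:

correct, total = 0, 0
for inputs, targets in train_dataloader:
    outputs = model(inputs)
    preds = outputs.argmax(dim=1)                # predicted class per sample
    correct += (preds == targets).sum().item()   # running count of correct samples
    total += targets.size(0)
epoch_accuracy = correct / total                 # accuracy over the whole epoch
print('epoch accuracy = {:.4f}'.format(epoch_accuracy))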
Hi,
Firstly, thanks for the great work.
However, I have an issue.
To extract optical flow I need to install OpenCV with GPU support,
and I could not install OpenCV as you described.
Could you please help me with this issue?
Thank you
My data structure is as below.
Here is my pre-trained model.
/workspace/MARS/MARS/trained_models/RGB_HMDB51_64f.pth
My test script is below.
python3 test_single_stream.py --batch_size 1 --n_classes 51 --model resnext --model_depth 101 --log 0 --dataset HMDB51 --modality RGB --sample_duration 64 --split 1 --only_RGB --resume_path1 "trained_models/RGB_HMDB51_64f.pth" --frame_dir "dataset/HMDB51/" --annotation_path "dataset/HMDB51_labels/testTrainMulti_7030_splits/"
It proceeds as in the screenshot below and then stops making progress.
MARS/dataset/preprocess_data.py
Line 308 in ae2749d
Thank you for publishing this amazing work!
In your paper, you mention results for the three-stream RGB+MARS+Flow model. However, I didn't find a way to test the three-stream result in this code.
Could you, or anyone else following this work, publish the code for testing three-stream results?
Could you please give me some tips on how to train your models on the UCF101 dataset?
Thanks
May I ask one more question? When training the 64f Flow model on HMDB51 from the pretrained Kinetics model, the accuracy of the model at epoch 400 cannot reach your reported result. Did you choose the last-epoch model or a different one? Can you give me some advice on this?
Hello,
I want to fine-tune MARS and MERS on my own dataset using the pre-trained Kinetics-400 weights you provide. For that, I need to extract optical flow from my dataset, right?
Thank you, for sharing this wonderful repo online! Hope to hear from you soon!
Regards,
Ishan
Hi there, can you share the class activation map script used to visualize how the model shifts its attention over time throughout a video?
Thanks in anticipation.
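While waiting for the authors' script, a rough Grad-CAM-style sketch for a 3D CNN: hook the last convolutional block, weight its activations by the gradients of the predicted class pooled over space and time, and average over channels to get one heatmap per time step. 'model.layer4' and 'clip' are assumptions; hook whatever the last conv block is called in your model:

import torch
import torch.nn.functional as F

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations['feat'] = out.detach()           # [1, C', T', H', W']

def bwd_hook(module, grad_in, grad_out):
    gradients['feat'] = grad_out[0].detach()     # gradient w.r.t. the activations

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

logits = model(clip)                             # clip: [1, C, T, H, W]
logits[0, logits.argmax()].backward()            # backprop the top predicted class

w = gradients['feat'].mean(dim=(2, 3, 4), keepdim=True)   # pooled gradient weights
cam = F.relu((w * activations['feat']).sum(dim=1))        # [1, T', H', W']
cam = cam / (cam.max() + 1e-8)                   # one normalized heatmap per time step

Upsampling each of the T' maps back to the input resolution and overlaying them on the corresponding frames shows how the attention moves through the video.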
Line 278 in 578e40f
should be changed as below:
frame_path = video[0]
Hi,
Thank you for the models and code. I am doing research on 3D activity recognition models like MARS/MERS. On your g-drive you have provided Kinetics MARS_64f.pth; I was wondering if you could also provide Kinetics MERS_64f.pth so that I could compare them, please.
Thank you
When I execute the following commands:
For RGB stream:
python test_single_stream.py --batch_size 1 --n_classes 51 --model resnext --model_depth 101
--log 0 --dataset HMDB51 --modality RGB --sample_duration 16 --split 1 --only_RGB
--resume_path1 "trained_models/HMDB51/RGB_HMDB51_16f.pth"
--frame_dir "dataset/HMDB51"
--annotation_path "dataset/HMDB51_labels"
--result_path "results/"
For single stream MARS:
python test_single_stream.py --batch_size 1 --n_classes 51 --model resnext --model_depth 101
--log 0 --dataset HMDB51 --modality RGB --sample_duration 16 --split 1 --only_RGB
--resume_path1 "trained_models/HMDB51/MARS_HMDB51_16f.pth"
--frame_dir "dataset/HMDB51"
--annotation_path "dataset/HMDB51_labels"
--result_path "results/"
For two streams RGB+MARS:
python test_two_stream.py --batch_size 1 --n_classes 51 --model resnext --model_depth 101
--log 0 --dataset HMDB51 --modality RGB --sample_duration 16 --split 1 --only_RGB
--resume_path1 "trained_models/HMDB51/RGB_HMDB51_16f.pth"
--resume_path2 "trained_models/HMDB51/MARS_HMDB51_16f.pth"
--frame_dir "dataset/HMDB51"
--annotation_path "dataset/HMDB51_labels"
--result_path "results/"
the top-1 accuracy is lower than 70%.
Hi,
Line 96 in b706443
Am I missing anything?
Thank you very much
Thank you for publishing the code !
I have a question about the flow. In extract_frames_flows.py, lines 95 to 96:
cv2.imwrite(os.path.join(outdir, 'TVL1jpg_x_%05d.jpg' % (i)), iflow[:, :, 0])
cv2.imwrite(os.path.join(outdir, 'TVL1jpg_y_%05d.jpg' % (i)), iflow[:, :, 1])
so the shapes of TVL1jpg_x.jpg and TVL1jpg_y.jpg are both (256, 256), while in test_single_stream.py, lines 35 to 36:
if opt.modality=='RGB': opt.input_channels = 3
elif opt.modality=='Flow': opt.input_channels = 2
When I run test_single_stream.py, I get an error: Length of validation data = 0.
I want to know how the two grayscale pictures are processed to satisfy input_channels = 2.
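As far as I can tell, the two grayscale JPEGs are simply read back and stacked along the channel axis at load time; a hedged sketch of that step (the file names follow the writer above; load_flow_frame itself is hypothetical):

import os
import cv2
import numpy as np

def load_flow_frame(outdir, i):
    # Read the x and y flow JPEGs written by extract_frames_flows.py
    # and stack them into a single (H, W, 2) array, matching input_channels = 2.
    fx = cv2.imread(os.path.join(outdir, 'TVL1jpg_x_%05d.jpg' % i), cv2.IMREAD_GRAYSCALE)
    fy = cv2.imread(os.path.join(outdir, 'TVL1jpg_y_%05d.jpg' % i), cv2.IMREAD_GRAYSCALE)
    return np.stack([fx, fy], axis=-1)    # e.g. (256, 256, 2)

The "Length of validation data = 0" error is more likely an annotation_path or frame_dir issue than a channel issue.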
When I train the flow stream, I can't get the accuracy you report in the paper; I only get 66.45%. Why?
Hi, I would like to use the pre-trained HMDB weights to run the model on a single arbitrary video and get the actions from that video. Can you give me a little help with the required steps? What pre-processing does the video require?
Petru
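Until the authors reply, a rough outline of the preprocessing their test setup implies: decode frames, center-crop to 112x112, and stack a 16-frame clip into a [1, 3, T, H, W] tensor (scaling and mean subtraction should follow preprocess_data.py; the helper below is a sketch, not the repo's code):

import cv2
import numpy as np
import torch

def video_to_clip(path, sample_duration=16, size=112):
    # Decode a video, take the first sample_duration frames,
    # center-crop each to size x size, and return a [1, 3, T, H, W] tensor.
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < sample_duration:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        s = min(h, w)
        y, x = (h - s) // 2, (w - s) // 2
        frame = cv2.resize(frame[y:y + s, x:x + s], (size, size))
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    clip = np.stack(frames).astype(np.float32)          # [T, H, W, 3]
    clip = torch.from_numpy(clip).permute(3, 0, 1, 2)   # [3, T, H, W]
    return clip.unsqueeze(0)                            # [1, 3, T, H, W]

Passing the resulting tensor through the RGB model and taking the argmax over the class logits gives the predicted action.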
Hi, thank you very much for publishing the code!
I think the dataset code might be outdated, though. Here, it uses a variable which is not defined:
Line 255 in ae2749d
There's an obvious fix for this, but now I wonder whether you might not have published the latest and greatest version of the code.
Furthermore, the code that parses annotation files seems to be wrong as well.
First, do you use the action detection annotations from here?
If this is the case, each line has the format:
or
In other words, there's no class ID for test entries, but there's a class ID for training.
Thus, the following two lines fail silently for training entries.
Line 264 in ae2749d
Could you share all of the fine-tuning hyperparameters for HMDB51? For example, momentum, epochs, learning rate...
Thank you very much for the good research.
Thank you very much for your work @craston. We really appreciate it.
I tested the 'RGB_HMDB_51_16f' model on HMDB split 1, but I only got an accuracy of 55.7%, versus 66.7% in your paper. The command is as follows:
python test_single_stream.py --batch_size 1 --n_classes 51 --model resnext --model_depth 101
--log 0 --dataset HMDB51 --modality RGB --sample_duration 16 --split 1 --only_RGB
--resume_path1 "../pretrained_model/RGB_HMDB51_16f.pth"
--frame_dir "../hmdb-51-1f"
--annotation_path "dataset/HMDB51_labels"
--result_path "results/"
--n_workers 4
Did I make a mistake somewhere? Thank you very much.
The speed of extracting the flow maps is too slow > _ <
Namespace(MARS=False, MARS_alpha=50.0, MARS_pretrain_path='', MARS_resume_path='', annotation_path='/media/cqq/Data/vicky/code/1action/MARS_dataset/dataset/HMDB51_labels/', batch_size=2, begin_epoch=1, checkpoint=1, dampening=0.9, dataset='HMDB51', frame_dir='/media/cqq/Data/vicky/code/1action/MARS_dataset/dataset/HMDB51_1/', freeze_BN=False, ft_begin_index=4, input_channels=3, learning_rate=0.1, log=1, lr_patience=10, manual_seed=1, modality='RGB_Flow', model='resnext', model_depth=101, momentum=0.9, n_classes=400, n_epochs=400, n_finetune_classes=51, n_workers=4, nesterov=False, only_RGB=False, optimizer='sgd', output_layers=["'avgpool'"], pretrain_path='/media/cqq/Data/vicky/code/1action/MARS/trained_models/MARS_Kinetics_64f.pth', random_seed=1, resnet_shortcut='B', resnext_cardinality=32, result_path='/media/cqq/Data/vicky/code/1action/MARS/results/', resume_path1='/media/cqq/Data/vicky/code/1action/MARS/trained_models/Flow_HMDB51_64f.pth', resume_path2='', resume_path3='', sample_duration=64, sample_size=112, split='1', training=True, weight_decay=0.001)
Preprocessing train data ...
Length of train data = 3570
Preprocessing validation data ...
Length of validation data = 1530
Preparing datatloaders ...
Length of train datatloader = 1785
Length of validation datatloader = 765
Loading MARS model... resnext 101
loading pretrained model /media/cqq/Data/vicky/code/1action/MARS/trained_models/MARS_Kinetics_64f.pth
Layers to finetune : ['layer4', 'fc']
Loading Flow model... resnext 101
loading checkpoint /media/cqq/Data/vicky/code/1action/MARS/trained_models/Flow_HMDB51_64f.pth
Initializing the optimizer ...
lr = 0.001 momentum = 0.9 dampening = 0.9 weight_decay = 1e-05, nesterov = False
LR patience = 10
run
Traceback (most recent call last):
File "/media/cqq/Data/vicky/code/1action/MARS/masr_train.py", line 186, in
outputs_Flow = model_Flow(inputs_Flow)[1].detach()
IndexError: list index out of range
The output of model_Flow has length 1, so the list index is out of range. How can I fix this?
Thanks
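One thing that stands out in the Namespace above (an observation, not a confirmed fix): output_layers is ["'avgpool'"], i.e. a pair of quotes got baked into the string itself. If the model appends intermediate feature outputs only for layer names that match exactly, that string never matches 'avgpool', so model_Flow(inputs_Flow) returns a single-element list and indexing [1] fails:

# The quotes ended up inside the argument value:
print("'avgpool'" == 'avgpool')   # False -> no feature output is appended

# Passing the layer name without the inner quotes may restore the
# two-element output:  --output_layers avgpool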
Thank you for your good effort.
I want to ask: for fine-tuning the models (MARS or MERS) on smaller datasets such as HMDB-51, did you use as the teacher network the Flow network trained on Kinetics-400, or the Flow network fine-tuned on HMDB-51?
Hi there.
To get the accuracy scores and related metrics, did you draw a confusion matrix?
If you did, can you share the script for UCF-101?
Thanks.
Hello, I want to know why my validation accuracy is lower than 72.2%; it's about 71.3%. I set the batch size to 125 and didn't change anything else. As far as I know, batch size shouldn't influence the result. Thanks in advance.