
c1-action-recognition-tsn-trn-tsm's Introduction


EPIC-KITCHENS-55 is the largest dataset in first-person (egocentric) vision; 55 hours of multi-faceted, non-scripted recordings in native environments - i.e. the wearers' homes, capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel 'live' audio commentary approach.

Authors

Dima Damen (1), Hazel Doughty (1), Giovanni Maria Farinella (3), Sanja Fidler (2), Antonino Furnari (3), Evangelos Kazakos (1), Davide Moltisanti (1), Jonathan Munro (1), Toby Perrett (1), Will Price (1), Michael Wray (1)

  • (1) University of Bristol
  • (2) University of Toronto
  • (3) University of Catania

Contact: [email protected]

Citing

When using the dataset, kindly reference:

@INPROCEEDINGS{Damen2018EPICKITCHENS,
   title={Scaling Egocentric Vision: The EPIC-KITCHENS Dataset},
   author={Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria and Fidler, Sanja and 
           Furnari, Antonino and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan 
           and Perrett, Toby and Price, Will and Wray, Michael},
   booktitle={European Conference on Computer Vision (ECCV)},
   year={2018}
} 

(Check publication here)

Dataset Details

Ground Truth

We provide ground truth for action segments and object bounding boxes.

  • Objects: Full bounding boxes of narrated objects for every annotated frame.
  • Actions: Split into narrations and action labels:
    • Narrations containing the narrated sentence with the timestamp.
    • Action labels containing the verb and noun labels along with the start and end times of the segment.

Dataset Splits

The dataset is comprised of three splits with the corresponding ground truth:

  • Training set - Full ground truth.
  • Seen Kitchens (S1) Test set - Start/end times only.
  • Unseen Kitchens (S2) Test set - Start/end times only.

Initially we are only releasing the full ground truth for the training set in order to run action and object challenges.

Important Files

Additional Files

We direct the reader to RDSF for the videos and rgb/flow frames.

We provide auto-generated HTML and PDF alternatives to this README.

Files Structure

EPIC_train_action_labels.csv

CSV file containing 14 columns:

Column Name Type Example Description
uid int 6374 Unique ID of the segment.
video_id string P03_01 Video the segment is in.
narration string close fridge English description of the action provided by the participant.
start_timestamp string 00:23:43.847 Start time in HH:mm:ss.SSS of the action.
stop_timestamp string 00:23:47.212 End time in HH:mm:ss.SSS of the action.
start_frame int 85430 Start frame of the action (WARNING only for frames extracted as detailed in Video Information).
stop_frame int 85643 End frame of the action (WARNING only for frames extracted as detailed in Video Information).
participant_id string P03 ID of the participant.
verb string close Parsed verb from the narration.
noun string fridge First parsed noun from the narration.
verb_class int 3 Numeric ID of the parsed verb's class.
noun_class int 10 Numeric ID of the parsed noun's class.
all_nouns list of string (1 or more) ['fridge'] List of all parsed nouns from the narration.
all_noun_classes list of int (1 or more) [10] List of numeric IDs corresponding to all of the parsed nouns' classes from the narration.

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.
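As a minimal, hedged sketch (the pickle filename below is assumed for illustration), the labels can be loaded with pandas; note that when reading the CSV directly, the list-valued columns (all_nouns, all_noun_classes) come back as strings and need parsing:

import ast
import pandas as pd

# Option 1: the released pickle already contains a pandas DataFrame
# (filename assumed here for illustration).
labels = pd.read_pickle("EPIC_train_action_labels.pkl")

# Option 2: read the CSV and parse the list-valued columns, which are
# stored as strings such as "['fridge']".
labels = pd.read_csv("EPIC_train_action_labels.csv")
for col in ["all_nouns", "all_noun_classes"]:
    labels[col] = labels[col].apply(ast.literal_eval)

print(labels[["uid", "video_id", "verb_class", "noun_class"]].head())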

EPIC_train_invalid_labels.csv

CSV file containing 14 columns:

Column Name Type Example Description
uid int 6374 Unique ID of the segment.
video_id string P03_01 Video the segment is in.
narration string close fridge English description of the action provided by the participant.
start_timestamp string 00:23:43.847 Start time in HH:mm:ss.SSS of the action.
stop_timestamp string 00:23:47.212 End time in HH:mm:ss.SSS of the action.
start_frame int 85430 Start frame of the action (WARNING only for frames extracted as detailed in Video Information).
stop_frame int 85643 End frame of the action (WARNING only for frames extracted as detailed in Video Information).
participant_id string P03 ID of the participant.
verb string close Parsed verb from the narration.
noun string fridge First parsed noun from the narration.
verb_class int 3 Numeric ID of the parsed verb's class.
noun_class int 10 Numeric ID of the parsed noun's class.
all_nouns list of string (1 or more) ['fridge'] List of all parsed nouns from the narration.
all_noun_classes list of int (1 or more) [10] List of numeric IDs corresponding to all of the parsed nouns' classes from the narration.

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_train_action_narrations.csv

CSV file containing 5 columns:

Note: The start/end timestamp refers to the start/end time of the narration, not the action itself.

Column Name Type Example Description
participant_id string P03 ID of the participant.
video_id string P03_01 Video the segment is in.
start_timestamp string 00:23:43.847 Start time in HH:mm:ss.SSS of the narration.
stop_timestamp string 00:23:47.212 End time in HH:mm:ss.SSS of the narration.
narration string close fridge English description of the action provided by the participant.

EPIC_train_object_labels.csv

CSV file containing 6 columns:

Column Name Type Example Description
noun_class int 20 Integer value representing the class in noun-classes.csv.
noun string bag Original string name for the object.
participant_id string P01 ID of participant.
video_id string P01_01 Video the object was annotated in.
frame int 056581 Frame number of the annotated object.
bounding_boxes list of 4-tuple (0 or more) "[(76, 1260, 462, 186)]" Annotated boxes with format (<top:int>,<left:int>,<height:int>,<width:int>).
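Since the bounding_boxes column is serialised as a string, here is a hedged sketch of parsing it and converting the (top, left, height, width) format to corner coordinates:

import ast
import pandas as pd

objects = pd.read_csv("EPIC_train_object_labels.csv")
# bounding_boxes is stored as a string such as "[(76, 1260, 462, 186)]"
objects["bounding_boxes"] = objects["bounding_boxes"].apply(ast.literal_eval)

def to_xyxy(box):
    # (top, left, height, width) -> (x_min, y_min, x_max, y_max)
    top, left, height, width = box
    return (left, top, left + width, top + height)

annotated = objects[objects["bounding_boxes"].apply(len) > 0]
print([to_xyxy(b) for b in annotated.iloc[0]["bounding_boxes"]])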

EPIC_train_object_action_correspondence.csv

CSV file containing 5 columns:

Column Name Type Example Description
participant_id string P01 ID of participant.
video_id string P01_01 Video the frames are part of.
object_frame int 56581 Frame number of the object detection image from object_detection_images.
action_frame int 56638 Frame number of the corresponding image in the released frames for action recognition in frames_rgb_flow.
timestamp string 00:00:00.00 Timestamp in HH:mm:ss.SS corresponding to the frame.

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.
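A hedged sketch of how the correspondence file can be used to map object annotation frames onto the frame numbering used for action recognition (joining on participant, video and frame number):

import pandas as pd

objects = pd.read_csv("EPIC_train_object_labels.csv")
correspondence = pd.read_csv("EPIC_train_object_action_correspondence.csv")

# Attach the action_frame number from frames_rgb_flow to each object annotation.
merged = objects.merge(
    correspondence,
    left_on=["participant_id", "video_id", "frame"],
    right_on=["participant_id", "video_id", "object_frame"],
    how="left",
)
print(merged[["video_id", "frame", "action_frame", "timestamp"]].head())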

EPIC_test_s1_object_action_correspondence.csv

CSV file containing 5 columns:

Column Name Type Example Description
participant_id string P01 ID of participant.
video_id string P01_11 Video containing the object s1 test frames.
object_frame int 33601 Frame number of the object detection image from object_detection_images.
action_frame int 33635 Frame number of the corresponding image in the released frames for action recognition in frames_rgb_flow.
timestamp string 00:09:20.58 Timestamp in HH:mm:ss.SS corresponding to the frames.

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_test_s2_object_action_correspondence.csv

CSV file containing 5 columns:

Column Name Type Example Description
participant_id string P09 ID of participant.
video_id string P09_05 Video containing the object s2 test frames.
object_frame int 15991 Frame number of the object detection image from object_detection_images.
action_frame int 16007 Frame number of the corresponding image in the released frames for action recognition in frames_rgb_flow.
timestamp string 00:04:26.78 Timestamp in HH:mm:ss.SS corresponding to the frames.

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_test_s1_object_video_list.csv

CSV file listing the videos used to obtain the object s1 test frames. The frames can be obtained from RDSF under object_detection_images/test. Please test all frames from this folder for the videos listed in this csv.

Column Name Type Example Description
video_id string P01_11 Video containing the object s1 test frames.
participant_id string P01 ID of the participant.

EPIC_test_s2_object_video_list.csv

CSV file listing the videos used to obtain the object s2 test frames. The frames can be obtained from RDSF under object_detection_images/test. Please test all frames from this folder for the videos listed in this csv.

Column Name Type Example Description
video_id string P01_11 Video containing the object s2 test frames.
participant_id string P01 ID of the participant.

EPIC_test_s1_timestamps.csv

CSV file containing 7 columns:

Column Name Type Example Description
uid int 1924 Unique ID of the segment.
participant_id string P01 ID of the participant.
video_id string P01_11 Video the segment is in.
start_timestamp string 00:00:00.000 Start time in HH:mm:ss.SSS of the action.
stop_timestamp string 00:00:01.890 End time in HH:mm:ss.SSS of the action.
start_frame int 1 Start frame of the action (WARNING only for frames extracted as detailed in Video Information).
stop_frame int 93 End frame of the action (WARNING only for frames extracted as detailed in Video Information).

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_test_s2_timestamps.csv

CSV file containing 7 columns:

Column Name Type Example Description
uid int 15582 Unique ID of the segment.
participant_id string P09 ID of the participant.
video_id string P09_01 Video the segment is in.
start_timestamp string 00:00:01.970 Start time in HH:mm:ss.SSS of the action.
stop_timestamp string 00:00:03.090 End time in HH:mm:ss.SSS of the action.
start_frame int 118 Start frame of the action (WARNING only for frames extracted as detailed in Video Information).
stop_frame int 185 End frame of the action (WARNING only for frames extracted as detailed in Video Information).

Please note we have included a python pickle file for ease of use. This includes a pandas dataframe with the same layout as above. This pickle file was created with pickle protocol 2 on pandas version 0.22.0.

EPIC_noun_classes.csv

CSV file containing 3 columns:

Note: a colon represents a compound noun with the more generic noun first. So pan:dust should be read as dust pan.

Column Name Type Example Description
noun_id int 2 ID of the noun class.
class_key string pan:dust Key of the noun class.
nouns list of string (1 or more) "['pan:dust', 'dustpan']" All nouns within the class (includes the key).

EPIC_verb_classes.csv

CSV file containing 3 columns:

Column Name Type Example Description
verb_id int 3 ID of the verb class.
class_key string close Key of the verb class.
verbs list of string (1 or more) "['close', 'close-off', 'shut']" All verbs within the class (includes the key).
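A hedged sketch of loading both class files and building ID-to-key lookups; the verbs/nouns columns are string-serialised lists and need parsing:

import ast
import pandas as pd

verb_classes = pd.read_csv("EPIC_verb_classes.csv")
noun_classes = pd.read_csv("EPIC_noun_classes.csv")

# The verbs/nouns columns are stored as strings such as "['close', 'close-off', 'shut']".
verb_classes["verbs"] = verb_classes["verbs"].apply(ast.literal_eval)
noun_classes["nouns"] = noun_classes["nouns"].apply(ast.literal_eval)

# Numeric class ID -> class key lookups.
verb_key = dict(zip(verb_classes["verb_id"], verb_classes["class_key"]))
noun_key = dict(zip(noun_classes["noun_id"], noun_classes["class_key"]))
print(verb_key[3], noun_key[2])  # close, pan:dust (per the examples above)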

EPIC_descriptions.csv

CSV file containing 4 columns:

Column Name Type Example Description
video_id string P01_01 ID of the video.
date string 30/04/2017 Date on which the video was shot.
time string 13:49:00 Local recording time of the video.
description string prepared breakfast with soy milk and cereals Description of the activities contained in the video.

EPIC_many_shot_verbs.csv

CSV file containing the many shot verbs. A verb class is considered many shot if it appears more than 100 times in training. (NOTE: this file is derived from EPIC_train_action_labels.csv; check out the accompanying notebook demonstrating how we compute these classes.)

Column Name Type Example Description
verb_class int 1 Numeric ID of the verb class
verb string put Verb corresponding to the verb class

EPIC_many_shot_nouns.csv

CSV file containing the many shot nouns. A noun class is considered many shot if it appears more than 100 times in training. (NOTE: this file is derived from EPIC_train_action_labels.csv; check out the accompanying notebook demonstrating how we compute these classes.)

Column Name Type Example Description
noun_class int 3 Numeric ID of the noun class
noun string tap Noun corresponding to the noun class

EPIC_many_shot_actions.csv

CSV file containing the many shot actions. An action class (composed of a verb class and noun class) is considered many shot if BOTH the verb class and noun class are many shot AND the action class appears in training at least once. (NOTE: this file is derived from EPIC_train_action_labels.csv; check out the accompanying notebook demonstrating how we compute these classes. A sketch of this computation follows the table below.)

Column Name Type Example Description
action_class (int, int) (9, 84) Numeric Pair of IDs, first the verb, then the noun
verb_class int 9 Numeric ID of the verb class
verb string move Verb corresponding to the verb class
noun_class int 84 Numeric ID of the noun class
noun string sausage Noun corresponding to the noun class
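The accompanying notebook is the reference implementation; purely as a hedged sketch, the many shot definitions above can be reproduced from the training labels like this:

import pandas as pd

labels = pd.read_csv("EPIC_train_action_labels.csv")

# Many shot verb/noun classes: more than 100 occurrences in training.
verb_counts = labels["verb_class"].value_counts()
noun_counts = labels["noun_class"].value_counts()
many_shot_verbs = set(verb_counts[verb_counts > 100].index)
many_shot_nouns = set(noun_counts[noun_counts > 100].index)

# Many shot actions: both the verb and noun class are many shot AND the
# (verb, noun) pair occurs at least once in training.
action_pairs = set(zip(labels["verb_class"], labels["noun_class"]))
many_shot_actions = {
    (v, n) for v, n in action_pairs
    if v in many_shot_verbs and n in many_shot_nouns
}
print(len(many_shot_verbs), len(many_shot_nouns), len(many_shot_actions))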

EPIC_video_info.csv

CSV file containing information for each video

Column Name Type Example Description
video (string) P01_01 Video ID
resolution (string) 1920x1080 Resolution of the video, format is WIDTHxHEIGHT
duration (float) 1652.152817 Duration of the video, in seconds
fps (float) 59.9400599400599 Frame rate of the video

File Downloads

Due to the size of the dataset we provide scripts for downloading parts of the dataset:

Note: These scripts work on Linux and Mac. For Windows users, a bash installation should work.

These scripts replicate the folder structure of the dataset release, found here.

If you wish to download part of the dataset instructions can be found here.

Video Information

Videos are recorded in 1080p at 59.94 FPS on a GoPro Hero 5 with linear field of view. A minority of videos were shot at different resolutions, fields of view, or frame rates due to participant or camera error. These videos, identified using ffprobe, are:

  • 1280x720: P12_01, P12_02, P12_03, P12_04.
  • 2560x1440: P12_05, P12_06
  • 29.97 FPS: P09_07, P09_08, P10_01, P10_04, P11_01, P18_02, P18_03
  • 48 FPS: P17_01, P17_02, P17_03, P17_04
  • 90 FPS: P18_09

The GoPro Hero 5 was also set to drop the frame rate in low-light conditions to preserve exposure, leading to variable FPS in some videos. If you wish to extract frames, we suggest you resample at 60 FPS to mitigate issues with variable FPS; this can be achieved in a single step with FFmpeg:

ffmpeg -i "P##_**.MP4" -vf "scale=-2:256" -q:v 4 -r 60 "P##_**/frame_%010d.jpg"

where ## is the Participant ID and ** is the video ID.
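Assuming frames are extracted with the 60 FPS resampling command above, here is a hedged sketch of converting a HH:mm:ss.SSS timestamp to a frame index; the indexing and rounding conventions should be checked against the released start_frame/stop_frame columns:

def timestamp_to_frame(timestamp: str, fps: float = 60.0) -> int:
    # Convert "HH:mm:ss.SSS" to a frame index at the given frame rate.
    hours, minutes, seconds = timestamp.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # Simple flooring; verify against the released frame columns.
    return int(total_seconds * fps)

print(timestamp_to_frame("00:23:43.847"))  # 85430 with this convention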

Optical flow was extracted using a fork of gpu_flow made available on GitHub. We set the parameters: stride = 2, dilation = 3, bound = 25 and size = 256.

License

All files in this dataset are copyrighted by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License, found here. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.

Disclaimer

EPIC-KITCHENS-55 and EPIC-KITCHENS-100 were collected as a tool for research in computer vision, however, it is worth noting that the dataset may have unintended biases (including those of a societal, gender or racial nature).

Changelog

See release history for changelog.

c1-action-recognition-tsn-trn-tsm's People

Contributors

dwhettam, sawyermade, willprice


c1-action-recognition-tsn-trn-tsm's Issues

How can I change the frame rate for training and evaluation?

Thanks for the dataset and the PyTorch code. I have a question about how to change the frame rate during training/evaluation.
I am looking at tsn_rgb.yaml right now. Which attribute in this config file controls the frame rate? For example, if I want to lower the frame rate for action recognition (predict based on fewer frames for a segment), how should I modify the yaml file or python code?
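Not a confirmed answer from the maintainers, but judging from the checkpoint hyper-parameters shown in a later issue on this page (which expose data.frame_count and data.test_frame_count), a hedged sketch of inspecting and overriding those keys with OmegaConf (config path assumed for illustration):

from omegaconf import OmegaConf

cfg = OmegaConf.load("path/to/tsn_rgb.yaml")  # path assumed for illustration
print(cfg.data.frame_count, cfg.data.test_frame_count)

# Lower the number of frames sampled per action segment (assumption: these
# keys control train/test sampling density, not the video frame rate itself).
cfg.data.frame_count = 4
cfg.data.test_frame_count = 10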

Pickle EpicVideoDataset object for distributed processing training

Dear Will,

Is there a way to pickle the EpicVideoDataset object?
I was trying to use that class with a code using Distributed Data Parallel, multiprocessing. I got an error along the lines of Can't pickle object EpicVideoDataset.

Is that related to the GULPReader? Do you know a workaround for that?

Any thoughts or pointers would be appreciated.

Thanks!

Download link for TSM features not working

Hi!

I'm trying to download the RGB features extracted with TSM, but the link seems broken.
Do you have an alternative one or should I just wait?

Thanks in advance.

Kind regards,
Alessandro

mean and std have the same values in the checkpoints RGB

Hi,
I downloaded the TSN (RGB) checkpoint to test it. After looking at the configuration attributes I saw that the mean and std values for data pre-processing are actually the same. Is this on purpose?

If I do:

import os
import torch
from omegaconf import OmegaConf

ckpt = torch.load("path/to/tsn_rgb.ckpt", map_location=lambda storage, loc: storage)
cfg = OmegaConf.create(ckpt["hyper_parameters"])
OmegaConf.set_struct(cfg, False)
cfg.data._root_gulp_dir = os.getcwd()
print(cfg.data.preprocessing.mean)
print(cfg.data.preprocessing.std)

Then I have:
'mean': [0.485, 0.456, 0.406]
'std': [0.485, 0.456, 0.406]

Actually, after looking at the TRN and TSM checkpoints, I saw that they also have this problem.
Thanks.

Error launching training

Hi,

I got this error trying to launch the training. Could you please provide a pointer on how to debug that?

  • I'm new to PyTorch Lightning. It seems that it modularizes the code too much.

  • I couldn't reach the function or class returning the wrong data type.

Details

The output of my shell

$ ls _root_gulp_dir
flow_test  flow_validation  flow_train  rgb_test  rgb_validation  rgb_train

BTW, I can already visualize the data.

Random crash when num_workers is larger than 0

Thanks for sharing these great resources. I tried to run the code on our server, but it randomly crashed when I set the num_workers to be larger than 0 (sometimes, it crashed at epoch 1, while other times it crashed after 6 epochs). When num_workers is 0, it didn't crash but it was extremely slow.

The error messages are like these:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: initialization error
Exception raised from insert_events at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f19cd7288b2 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f19cd97af20 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f19cd713b7d in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5f9e52 (0x7f1a4c9dfe52 in /mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/_internal/utils.py", line 356, in <lambda>
    lambda: hydra.run(
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/hydra/core/utils.py", line 125, in run_job
    ret.return_value = task_function(task_cfg)
  File "src/train.py", line 53, in main
    trainer.fit(system, datamodule=data_module)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
    return self.train_or_test()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in train
    self.train_loop.run_training_epoch()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 542, in run_training_epoch
    for batch_idx, (batch, is_last_batch) in train_dataloader:
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/profiler/profilers.py", line 85, in profile_iterable
    value = next(iterator)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 46, in _with_is_last
    last = next(it)
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
    idx, data = self._get_data()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1024, in _get_data
    success, data = self._try_get_data()
  File "/mnt/miniconda/envs/epic-models/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 25051) exited unexpectedly
  

Could someone provide some guidance on how to get around this error?

Sampling strategy for training the TSM architecture

Hi!

First, thank you for the nice repository, really helpful. I have a question regarding the sampling strategy you used to train TSM architecture using just RGB frames, the one from the Pretrained Models table.

From the config file, I see that you use 8 frames. However, I have been checking your EPIC Kitchens paper, and also the original TSM paper, and I have not been able to find how these 8 frames are sampled from the complete video sequence for a given action.

  • Are those frames consecutive?
  • Are they uniformly sampled from the complete video sequence? E.g. if the sequence has 300 frames, do we select frames [0, 43, 86, 129, 171, 214, 257, 300]?
  • Any other sampling strategy?

Thank you!

Alex.

src/convert_rgb_to_flow_frame_idxs.py:41: SettingWithCopyWarning

When running src/convert_rgb_to_flow_frame_idxs.py I get a Pandas warning and am not sure if it is actually causing a problem.

~/GIT/C1-Action-Recognition-TSN-TRN-TSM(master*) » ./run_flow_convert.sh smc@x86_64-conda-linux-gnu
src/convert_rgb_to_flow_frame_idxs.py:41: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

year_df[col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)
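For reference, a hedged sketch of the .loc-based assignment the warning suggests; whether the warning is benign depends on how year_df was sliced from its parent DataFrame:

# Either make the slice an explicit copy...
year_df = year_df.copy()
year_df[col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)

# ...or assign with .loc as the warning recommends.
year_df.loc[:, col] = convert_rgb_frame_to_flow_frame_idx(year_df[col], stride)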


Confirm results of pretrained models

Hi!

I was testing the pre-trained model, TSM RGB, and I got odd results in the validation set.

For action@1, I got 28.23 while you reported 35.75

all_action_accuracy_at_1: 28.237484484898633
all_action_accuracy_at_5: 47.6934215970211
all_noun_accuracy_at_1: 39.68762929251138
all_noun_accuracy_at_5: 65.98055440628879
all_verb_accuracy_at_1: 57.03351261894911
all_verb_accuracy_at_5: 86.38808440215143
tail_action_accuracy_at_1: 12.045088566827697
tail_noun_accuracy_at_1: 20.157894736842106
tail_verb_accuracy_at_1: 28.40909090909091

commit: d58e695

Steps

  • I generated the results in the validation set with this repo
  • Then, I evaluated those with the corresponding code.

Action models always giving same output

I am trying to test the models on a personal egocentric dataset. Instead of creating a video dataset, I extract frames from the videos and stack them together (tested with frame_count 8 and 25) and feed them to the model (TSN and TSM).
This is my code below:

import os
import torch
from omegaconf import OmegaConf
from PIL import Image
from torchvision import transforms

from systems import EpicActionRecognitionSystem  # defined in src/systems.py

folder = '/data/sample/'
transform = transforms.Compose([transforms.CenterCrop(224),
                                transforms.ToTensor()])

ckpt = torch.load('/data/tsn_rgb.ckpt', map_location="cpu")
cfg = OmegaConf.create(ckpt["hyper_parameters"])
OmegaConf.set_struct(cfg, False)

cfg.data._root_gulp_dir = os.getcwd()  # set default root gulp dir to prevent
# exceptions on instantiating the EpicActionRecognitionSystem
cfg.data["test_gulp_dir"] = folder
cfg.trainer.accelerator = None

system = EpicActionRecognitionSystem(cfg)
system.load_state_dict(ckpt["state_dict"])

imgs = None
for i in range(1, 26):
    img = transform(Image.open(folder + 'img_' + str(i) + '.jpg')).unsqueeze(dim=0)
    if i > 1:
        imgs = torch.cat((imgs, img), dim=0)
    else:
        imgs = img

imgs = imgs.unsqueeze(dim=0)
print(imgs.shape)  # torch.Size([1, 25, 3, 224, 224])
print(system)
out = system(imgs)
print(out.shape)
v, n = out[:, :97], out[:, 97:]
print(v.shape, n.shape)
print(torch.mean(v), torch.mean(n))

This is my config :
{'modality': 'RGB', 'seed': 42, 'data': {'frame_count': 8, 'test_frame_count': 25, 'segment_length': 1, 'train_gulp_dir': '${data._root_gulp_dir}/rgb_train', 'val_gulp_dir': '${data._root_gulp_dir}/rgb_validation', 'test_gulp_dir': '/data/sample/', 'worker_count': 40, 'pin_memory': True, 'preprocessing': {'bgr': False, 'rescale': True, 'input_size': 224, 'scale_size': 256, 'mean': [0.485, 0.456, 0.406], 'std': [0.485, 0.456, 0.406]}, 'train_augmentation': {'multiscale_crop_scales': [1, 0.875, 0.75, 0.66]}, 'test_augmentation': {'rescale_size': 256}, '_root_gulp_dir': '/home/sanketthakur/Documents/gaze_pred/C1-Action-Recognition-TSN-TRN-TSM'}, 'model': {'type': 'TSN', 'backbone': 'resnet50', 'pretrained': 'imagenet', 'dropout': 0.7, 'partial_bn': True}, 'learning': {'batch_size': 4, 'optimizer': {'type': 'SGD', 'momentum': 0.9, 'weight_decay': 0.0005}, 'lr': 0.01, 'lr_scheduler': {'type': 'StepLR', 'gamma': 0.1, 'epochs': [20, 40]}}, 'trainer': {'gradient_clip_val': 20, 'max_epochs': 80, 'weights_summary': 'full', 'benchmark': True, 'terminate_on_nan': True, 'distributed_backend': 'dp', 'gpus': 0, 'accumulate_grad_batches': 2, 'accelerator': None}}

The network always predicts verb_id as 0 and noun_id as 1. I am not sure if I am doing something wrong here. Any help is appreciated.
Thanks.

gulpio.utils.ImageNotFound

When running src/gulp_data.py I always get an error on the same frame and it stops.

raise ImageNotFound("Image is  None from path:{}".format(img_path))
gulpio.utils.ImageNotFound: Image is  None from path:/run/media/local_admin/ESMI MD II/EPIC-KITCHENS/P01/rgb_frames/P01/P01_01/frame_0000000008.jpg

Minor inconsistency in the gulp adaptor.

Hi, I found that in the RGB gulp adaptor, the stop_frame is inclusive.

for idx in range(meta["start_frame"], meta["stop_frame"] + 1)

whereas in the flow adaptor, the stop_frame is excluded.

for idx in range(start_frame, stop_frame)

I know that it makes little difference, but just wanted to point it out because I found it when I was adapting the code to my training pipeline and I had an error trying to read one more frame.

How to test a pretrained model

I downloaded the pretrained models and loaded them from checkpoints in the models folder:

python src/test.py \
    models/trn_rgb.ckpt \
    results/trn_rgb.pt \
    --split val 

But I get this error: TypeError: __init__() got an unexpected keyword argument 'row_log_interval'

How should I load a pretrained models and test it?

Thanks

Switch to gulpio2

I've forked gulpio to gulpio2 to use faster JPEG decoding from simplejpeg. It should be a drop-in replacement, but we need to check that gulping the full dataset still works before merging this change in.

Reproducing PIL-SIMD

Hi,

I was wondering if anyone has reproduced the PIL-SIMD patching of the conda environment recently?

Thanks!

Horizontal flip not returning flipped images?

class GroupRandomHorizontalFlip:
    """Randomly horizontally flips the given PIL.Image with a probability of 0.5"""

    def __init__(self, is_flow=False):
        self.is_flow = is_flow

    @profile
    def __call__(self, img_group, is_flow=False):
        v = random.random()
        if v < 0.5:
            ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
            if self.is_flow:
                for i in range(0, len(ret), 2):
                    ret[i] = ImageOps.invert(
                        ret[i]
                    )  # invert flow pixel values when flipping
        return img_group

It looks like ret is never returned, but instead the original images are returned. Can you check whether the code may not be applying the flip at all?

Additionally, is_flow in the __call__ method seems to be never used.
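For illustration only (not a confirmed fix from the maintainers), a hedged sketch of the __call__ as the reporter reads it, returning the flipped group:

import random
from PIL import Image, ImageOps

class GroupRandomHorizontalFlip:
    """Randomly horizontally flips the given PIL.Images with a probability of 0.5."""

    def __init__(self, is_flow=False):
        self.is_flow = is_flow

    def __call__(self, img_group):
        if random.random() < 0.5:
            ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group]
            if self.is_flow:
                # Invert horizontal flow values when flipping.
                for i in range(0, len(ret), 2):
                    ret[i] = ImageOps.invert(ret[i])
            return ret  # return the flipped images rather than the originals
        return img_group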

Is the accuracy based on each batch or the entire val dataset?

Thanks for sharing this great repo. I am looking at https://github.com/epic-kitchens/C1-Action-Recognition-TSN-TRN-TSM/blob/master/src/systems.py#L265, and I am trying to understand how the accuracy values were obtained. According to line 265, it seems the accuracy values are calculated based on only a single batch, instead of the entire validation dataset (I didn't find an accumulation operation, either). Can you confirm whether acc_1 and acc_5 are for a single batch or the entire validation dataset?
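Not the repository's actual implementation, but as a hedged sketch, dataset-level top-k accuracy can be obtained by accumulating correct counts over all batches (assuming a loader that yields (clips, labels) pairs):

import torch

def topk_correct(logits: torch.Tensor, labels: torch.Tensor, k: int = 5) -> int:
    # Number of examples whose true label is among the top-k predictions.
    topk = logits.topk(k, dim=1).indices                  # (batch, k)
    return (topk == labels.unsqueeze(1)).any(dim=1).sum().item()

@torch.no_grad()
def dataset_accuracy(model, loader, k=5, device="cpu"):
    correct, total = 0, 0
    model.eval()
    for clips, labels in loader:
        logits = model(clips.to(device))
        correct += topk_correct(logits.cpu(), labels, k=k)
        total += labels.shape[0]
    return 100.0 * correct / total  # accuracy over the entire dataset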

Enable ipdb via config

Does anyone want a patch to enable or disable ipdb in train.py from the config file?

Here you go:

from contextlib import nullcontext

context_manager = nullcontext
if cfg.debug:
    import ipdb
    context_manager = ipdb.launch_ipdb_on_exception  # ipdb exposes this at top level

with context_manager():
    ...  # run training as usual
