minghchen / carl_code
PyTorch code for "Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning", CVPR 2022.
License: MIT License
Congratulations, you've done a fantastic job. I tried to reproduce your work, but found some problems.
First, I noticed that the parameter DATA.SAMPLING_REGION in *_config.yml corresponds to the α variable in your paper, and that block_size in the dataset class's sample_frames function denotes the sampling window size for datasets like Pouring. Should it be the product of expand_ratio and num_frames rather than expand_ratio and seq_len, like:
```python
def sample_frames(self, seq_len, num_frames, pre_steps=None):
    ...
    elif sampling_strategy == 'time_augment':
        num_valid = min(seq_len, num_frames)
        expand_ratio = np.random.uniform(low=1.0, high=self.cfg.DATA.SAMPLING_REGION) if self.cfg.DATA.SAMPLING_REGION > 1 else 1.0
        # block_size = math.ceil(expand_ratio*seq_len)
        block_size = math.ceil(expand_ratio*num_frames)
```
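To illustrate the difference numerically, here is a small sketch with made-up numbers (seq_len, num_frames, and the α value are assumptions for illustration, not taken from the repo's configs):

```python
import math
import numpy as np

# Hypothetical numbers: the full video has seq_len frames and we
# sample num_frames of them.
seq_len, num_frames = 200, 80
sampling_region = 1.5  # DATA.SAMPLING_REGION (alpha), assumed > 1

rng = np.random.default_rng(0)
expand_ratio = rng.uniform(low=1.0, high=sampling_region)

# As currently written: the block grows with the full video length.
block_from_seq_len = math.ceil(expand_ratio * seq_len)

# Suggested fix: the block grows with the number of sampled frames.
block_from_num_frames = math.ceil(expand_ratio * num_frames)

print(block_from_seq_len, block_from_num_frames)
```

With seq_len much larger than num_frames, the two formulas give very different window sizes, which is why the choice matters.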
Second, I ran the training process on the Pouring dataset but only got 90.3% classification accuracy, which is less than the 93.73% reported in your paper. Can you give some advice? I used batch_size = 4 and embedde_model.num_layers = 3; the other parameters are the same as scl_transformer_config in your repo.
Hello, thank you for sharing your work.
I wanted to use the model trained with the CARL method to obtain an embedding for single images (not part of a video).
Would it be fine if I input an image of size (batch_size, T, 3, 224, 224) with T = 1 to the model?
For a video input of size (batch_size, T, 3, 224, 224) with T > 1, I observed a difference between the embedding obtained from inputting the video all at once and from inputting every frame of the video one by one.
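The difference is expected whenever the embedder attends across time: a frame's embedding then depends on the other frames in the clip. Here is a minimal stand-in sketch using a generic transformer encoder (this is an assumption-level illustration, not the repo's actual architecture):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Minimal stand-in for a temporal embedder: self-attention over the
# T (frame) dimension.
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers=1).eval()

frames = torch.randn(1, 4, 32)  # (batch, T, feature)

with torch.no_grad():
    # Embed the whole clip at once (T = 4)...
    joint = temporal_encoder(frames)
    # ...versus embedding the first frame alone (T = 1).
    single = temporal_encoder(frames[:, :1])

# Self-attention mixes information across frames, so the two
# embeddings of frame 0 differ.
print(torch.allclose(joint[:, 0], single[:, 0]))  # expect False
```

So feeding a single image with T = 1 is mechanically valid, but the resulting embedding comes from a context the model may not have seen much during training.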
Hello, if possible, could the authors provide recommended hardware (CPU/GPU counts) and expected runtimes for the jobs described in the "Training" section? I am attempting to run these jobs and data loading seems to be a major bottleneck for my runtime. For example, training one epoch on Penn Action takes about 30 minutes. Is this normal?
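When data loading dominates, the generic PyTorch DataLoader knobs are usually the first thing to try. A sketch with a dummy dataset (the exact config keys this repo exposes for these options may differ):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the video dataset.
dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))

loader = DataLoader(
    dataset,
    batch_size=4,
    num_workers=2,           # parallel decoding/augmentation workers
    pin_memory=True,         # faster host-to-GPU copies
    persistent_workers=True, # avoid worker respawn cost each epoch
    # prefetch_factor=4,     # batches prefetched per worker (default 2)
)

n_batches = sum(1 for _ in loader)
print(n_batches)  # 64 / 4 = 16
```

Pre-decoding videos into frame folders (or a faster decoder) often helps more than any of these flags, since per-epoch video decoding is typically the real cost.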
Hi, I've been experimenting with your code and I've noticed that some of the metrics (specifically event_completion and retrieval) sometimes peak in earlier epochs and then fall in later epochs. I am training for 300 epochs and evaluating every 50 epochs, as is the default in the configs.
I just wanted to check: for your results in "Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning", do you report the metrics for the "best" intermediate epoch per metric, or do you always report the results for the final epoch (300)? I searched the paper and could not find an explicit answer to this, so I wanted to ask here. Thanks!
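For comparing the two reporting conventions locally, tracking the best value per metric across the periodic evaluations is straightforward. A sketch with a made-up evaluation history (the epochs and values below are invented for illustration):

```python
# Hypothetical eval results, one entry per evaluation epoch.
history = {
    50:  {"event_completion": 0.88, "retrieval": 0.90},
    100: {"event_completion": 0.91, "retrieval": 0.93},
    200: {"event_completion": 0.90, "retrieval": 0.92},
    300: {"event_completion": 0.89, "retrieval": 0.91},
}

# Best (epoch, value) per metric.
best = {}
for epoch, metrics in sorted(history.items()):
    for name, value in metrics.items():
        if name not in best or value > best[name][1]:
            best[name] = (epoch, value)

print(best)  # each metric peaks at epoch 100 in this toy history
```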
torch.fx was only released in PyTorch versions later than 1.6.0, which is the version this project uses.
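Concretely, torch.fx first shipped with PyTorch 1.8, so importing it under 1.6.0 raises an ImportError. A defensive version guard (a sketch; pinning torch>=1.8 in the requirements is the cleaner fix):

```python
import torch

# Parse "major.minor" from torch.__version__ (e.g. "2.1.0+cu118").
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])

if (major, minor) >= (1, 8):
    import torch.fx  # noqa: F401  # available from PyTorch 1.8 onward
    has_fx = True
else:
    has_fx = False  # fall back to a non-fx code path

print(has_fx)
```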