
kinetics-downloader's Introduction

Download DeepMind's Kinetics

Download all videos from DeepMind's Kinetics dataset. You can also use this library to extract frames and sound tracks from the videos, generate metadata for training, and pack all sound tracks into a single tfrecords file for faster reading.

Requirements

  • Python >= 3.4
  • youtube-dl
  • ffmpeg
  • gzip

Required Python packages are listed in requirements.txt.
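
Before starting a long download, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch, not part of this repository: the function name missing_tools is hypothetical, and it assumes the tools are invoked by the names youtube-dl, ffmpeg and gzip on your PATH.

```python
import shutil


def missing_tools(tools=("youtube-dl", "ffmpeg", "gzip")):
    """Return the names of command-line tools that are not found on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported as missing even though you installed it, the Python process is probably running with a different PATH than your shell.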

Usage

WARNING: Before you start any download from YouTube, please make sure that you have read the YouTube Terms of Service and that you comply with them. In particular, check section 5.H.

Download all videos:

python download.py --all

Download specific classes:

python download.py --classes 'pole vault' 'blowing glass'

List all classes:

python list_classes.py

Download specific categories:

Categories are defined as described in [1]. However, 14 classes were not present in any category, so I added them under the category "custom".

python download.py --categories 'arts and crafts' cooking
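
The "custom" category described above can be illustrated with a short sketch. This is not the code used to build the category files in this repository; the function name add_custom_category and the toy class and category names are hypothetical.

```python
def add_custom_category(categories, all_classes):
    """Return a copy of `categories` with unassigned classes under "custom"."""
    # Collect every class that already belongs to at least one category.
    assigned = {cls for members in categories.values() for cls in members}
    unassigned = sorted(cls for cls in all_classes if cls not in assigned)
    result = {name: list(members) for name, members in categories.items()}
    if unassigned:
        result["custom"] = unassigned
    return result


if __name__ == "__main__":
    # Toy data for illustration only, not the real Kinetics categories.
    toy_categories = {"category one": ["class a"], "category two": ["class b"]}
    toy_classes = ["class a", "class b", "class c"]
    print(add_custom_category(toy_categories, toy_classes))
```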

List all categories and classes that belong to them:

python list_categories.py

Extract frames from videos:

Extracting frames from the video files is useful because loading mp4 files during training is time-consuming. Additionally, current neural networks are usually trained on a small subset of frames from each video, which makes loading the whole video wasteful.

python videos_to_frames.py --all

The script uses VideoCapture from the OpenCV library. If you installed the library using pip install opencv-python, the script will not work because video-related functionality is not supported in that build (see this stackoverflow question). You will need to build OpenCV with video-related functionality enabled to use this script.
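
To illustrate why extracted frames are convenient, here is a minimal sketch of picking a small, evenly spaced subset of frame indices for one video. It is an assumption about how a training pipeline might sample frames, not something this repository implements; the function name sample_frame_indices is hypothetical.

```python
def sample_frame_indices(num_frames, num_samples):
    """Pick `num_samples` evenly spaced frame indices out of `num_frames`."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        # Fewer frames than requested: use every frame once.
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the frame in the middle of each of the equally sized segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(250, 5))  # [25, 75, 125, 175, 225]
```

With frames stored as individual files, only the selected indices need to be read from disk.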

Extract sound tracks from videos:

python videos_to_sound.py --all

Create metadata:

Although it would be ideal to have the whole dataset available, a fraction of the videos has been deleted from YouTube since Kinetics was released. Furthermore, not all videos contain a sound track, and frame extraction might fail for some videos. For this reason, it is convenient to generate metadata that keeps track of all successfully downloaded videos.

You can generate metadata for videos (.mp4 files), video frames or sound tracks. You will need to generate metadata for sound tracks if you want to pack them into a tfrecords file (see below).
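
The idea of tracking what was actually downloaded can be sketched as a directory scan. This is only an illustration under stated assumptions: it is not the format written by create_meta.py, the function name list_downloaded_videos is hypothetical, and it assumes one sub-directory per class containing .mp4 files named after YouTube IDs.

```python
from pathlib import Path


def list_downloaded_videos(split_dir):
    """Map each class directory name to the sorted IDs of its .mp4 files."""
    split_dir = Path(split_dir)
    meta = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        # The file stem (name without ".mp4") is assumed to be the YouTube ID.
        meta[class_dir.name] = sorted(f.stem for f in class_dir.glob("*.mp4"))
    return meta


if __name__ == "__main__":
    train_dir = Path("dataset/train")
    if train_dir.is_dir():
        for class_name, ids in list_downloaded_videos(train_dir).items():
            print(class_name, len(ids))
```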

Generate metadata for videos:

python create_meta.py videos --sets 400 --save resources/kinetics_videos

Generate metadata for video frames:

python create_meta.py frames --sets 400 --save resources/kinetics_video_frames

Generate metadata for sound tracks:

python create_meta.py sound --sets 400 --save resources/kinetics_sound

The --sets switch dictates how many classes will be included in the metadata. Example use case: we want to select the hyper-parameters of our neural networks on a small subset of Kinetics (let's say 50 from the 400 classes) and then train the neural network on the whole dataset. Therefore, we will call python create_meta.py frames --sets 50 400 --save resources/kinetics_video_frames to generate metadata for 50 randomly chosen classes and for all 400 classes.
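
The use case above can be sketched as follows. This is a hedged illustration of drawing random class subsets of several sizes, not the logic of create_meta.py itself; the function name choose_class_subsets, the seed parameter and the placeholder class names are assumptions.

```python
import random


def choose_class_subsets(classes, sizes, seed=0):
    """Return {size: sorted list of classes} for each requested subset size."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    classes = list(classes)
    subsets = {}
    for size in sizes:
        if size >= len(classes):
            # Asking for all classes (or more) returns the whole list.
            subsets[size] = sorted(classes)
        else:
            subsets[size] = sorted(rng.sample(classes, size))
    return subsets


if __name__ == "__main__":
    # Placeholder class names; the real names come from list_classes.py.
    all_classes = ["class_%d" % i for i in range(400)]
    subsets = choose_class_subsets(all_classes, [50, 400])
    print(len(subsets[50]), len(subsets[400]))
```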

Convert sound tracks into a tfrecords file:

Loading mp3 files in TensorFlow (as of version 1.3.0) creates a severe bottleneck for the training speed. It is convenient to pack all mp3 files into a single tfrecords file and load the tfrecords file during training.

Example:

python sound_to_tfrecords.py train resources/kinetics_sound_400_train.json resources/kinetics_sound_400_classes.json dataset/kinetics_400_train_sound.tfrecords

Note:

First, you will need to generate metadata for sound tracks.

Other scripts:

Download statistics (e.g. fraction of videos downloaded):

python download_stats.py

Video statistics (e.g. histogram of video resolutions):

python video_stats.py

Download structure

The training and validation videos are downloaded into their individual directories. Furthermore, a directory is created for each class.

Example:

dataset/train/blowing_glass
dataset/valid/blowing_glass

Test videos are all downloaded into a single directory because their classes are not known.

Example:

dataset/test

File names and video format

The videos are all downloaded in mp4. If a video isn't available in mp4, it's downloaded in the next best format and converted into mp4. All videos are downloaded with sound.

Videos' file names correspond to their YouTube IDs. All spaces in directory names are replaced with underscores (e.g. blowing glass => blowing_glass).
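
The naming rules above can be summarized in a short sketch. It is not code from this repository: the helper names class_to_dir and video_path are hypothetical, VIDEO_ID stands in for a real YouTube ID, and the exact file name pattern (ID followed by .mp4) is an assumption based on the description.

```python
from pathlib import PurePosixPath


def class_to_dir(class_name):
    """Replace spaces with underscores, e.g. "blowing glass" => "blowing_glass"."""
    return class_name.replace(" ", "_")


def video_path(root, split, video_id, class_name=None):
    """Build the expected path of a video; test videos have no class directory."""
    base = PurePosixPath(root) / split
    if class_name is not None:
        base = base / class_to_dir(class_name)
    return str(base / (video_id + ".mp4"))


if __name__ == "__main__":
    print(video_path("dataset", "train", "VIDEO_ID", "blowing glass"))
    print(video_path("dataset", "test", "VIDEO_ID"))
```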

Contributors

Acknowledgements

The sound to tfrecords script is based on this tutorial.

References

kinetics-downloader's People

Contributors

ondrejbiza


kinetics-downloader's Issues

Not all classes are being downloaded

Hi, thanks for your effort, but I ran into two issues. First, not all classes are being downloaded: for example, of the classes starting with 'a' I got only "arranging flowers", and the script has already started downloading the 'b' classes.
Second, I looked into "arranging_flowers" and found 393 downloaded videos, while the official site (https://deepmind.com/research/open-source/open-source-datasets/kinetics/) lists 420 videos.

I'll be happy if you can help me understand what's wrong.
Thanks.

Entire dataset size

Thank you for providing this download tool. I ran the project successfully and the download process works normally.
Have you measured the size of the entire Kinetics video dataset? I mean the size of the dataset before cropping.
I want to make a preliminary estimate of the time it takes to download. Although the size after cropping is 600GB, I found that many videos are much longer than 10 seconds.
Thank you!

Continue download

Hi,

I'm downloading Kinetics using this repo. However, my computer (running Windows 10) suddenly restarted due to an update, which interrupted the script. If I run the script again using the same command, will it recognize the already downloaded files?

Nothing downloaded

No error shows up, but not even one video is downloaded.
Please help me with this issue.
Maybe it's about the environment?
FYI, I used pip to install ffmpeg, youtube-dl, and gzip-reader. My Python version is 3.6.3.
I look forward to your help.
Thanks.

kinetics-600

Can you please add support for kinetics-600? The resources/classes and categories need to be updated. I was wondering if you had a script that automatically generated these files.

Download stops after 1 file with a No such file or directory: 'ffmpeg' error

Hi
I am trying to download both all classes and a specific class. For the specific class I use the command

python download.py --classes 'whistling'

But the download stops after the first file, while downloading the second file, with the following error. I have installed ffmpeg in the environment correctly. I am using Ubuntu 18.04, Python 3.8.3, and Anaconda 1.7.2.

Process Process-2:
Traceback (most recent call last):
  File "/home/uniwa/students3/students/22905553/linux/anaconda3/envs/kinetics/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/uniwa/students3/students/22905553/linux/anaconda3/envs/kinetics/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/uniwa/students3/students/22905553/linux/phd_codes/kinetics-downloader/lib/parallel_download.py", line 121, in video_worker
    if not downloader.process_video(video_id, directory, start, end, compress=compress, log_file=log_file):
  File "/home/uniwa/students3/students/22905553/linux/phd_codes/kinetics-downloader/lib/downloader.py", line 93, in process_video
    success = cut_video(download_path, slice_path, start, end)
  File "/home/uniwa/students3/students/22905553/linux/phd_codes/kinetics-downloader/lib/downloader.py", line 38, in cut_video
    return_code = subprocess.call(["ffmpeg", "-loglevel", "quiet", "-i", raw_video_path, "-strict", "-2",
  File "/home/uniwa/students3/students/22905553/linux/anaconda3/envs/kinetics/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/uniwa/students3/students/22905553/linux/anaconda3/envs/kinetics/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/uniwa/students3/students/22905553/linux/anaconda3/envs/kinetics/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

Downloading each class separately to speed up

Hi, in order to speed up the download of the dataset (otherwise it will take something like 400 days), I'm trying to download each class separately, but some classes are not downloading.
For example, "bartending" is not downloading but "cleaning toilet" is.
Do you have any idea why this happens?
Just to make it clear, I'm using 2 different computers to download to 1 disk, so each computer is downloading 2-3 classes in parallel. Maybe something in the parallelization messes it up?

Thanks.

Can't download a certain class, am I doing something wrong?

Hi, I have a small problem downloading class by class.
When I run
python download.py --all, it downloads.
python download.py --classes "bartending" also downloads.
But python download.py --classes "washing feet" does not download.
python download.py --classes 'jumping into pool' does not download either.
I also tried "washing_feet", which still does not download.

Can you identify what I am doing wrong?

Thanks.

Time and space to download dataset

Thank you for releasing this awesome repo! How long will it take to download the entire dataset? I know it'll depend on the network connection, but I'm just looking for a ballpark figure here.

Download is extremely slow

Hi,
I have started to use the tool, and in about 20 hours it has downloaded ~14GB using 20 threads. I have a pretty good internet connection.
I was wondering what is wrong. Is there any way to speed up the download?
Another question: if I stop and restart, will it download existing files again?

HTTP Error 429: Too Many Requests

Hi Ondrej,
Thanks for your work with the kinetics-downloader.
I began downloading the whole dataset, but after downloading ~5.6 GB of videos I began getting the following error message:
HTTP Error 429: Too Many Requests

I understand that the primary cause of this error is YouTube throttling requests (blacklisting the requesting IP address).

Do you know of any automated workaround for this problem?

Best Regards,
Luis Amezcua
