dmlc / decord

An efficient video loader for deep learning with smart shuffling that's super easy to digest

License: Apache License 2.0

C++ 62.39% C 6.38% CMake 3.33% Python 23.43% Cuda 0.49% Batchfile 0.09% Shell 0.69% Dockerfile 0.19% Cython 3.02%

video-loader

decord's Introduction

Decord

Decord is a reverse procedure of Record. It provides convenient video slicing methods based on a thin wrapper on top of hardware-accelerated video decoders, e.g.:

FFMPEG/LibAV (Done)
Nvidia Codecs (Done)
Intel Codecs

Decord was designed to handle the awkward experience of shuffling video, aiming for a smooth experience similar to that of a random-access image loader for deep learning.

Decord is also able to decode audio from both video and audio files. One can slice video and audio together to get a synchronized result; hence providing a one-stop solution for both video and audio decoding.

Benchmark
Installation
Usage
Bridge for Deep Learning frameworks

Preliminary benchmark

Decord is good at handling random access patterns, which are rather common during neural network training.
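A typical source of such random access is clip sampling: each training step picks a short, evenly spaced run of frames at a random position in a video. The sketch below is a hypothetical helper (sample_clip_indices is not part of decord); its output is the kind of index list one would pass to VideoReader.get_batch.

```python
import random

def sample_clip_indices(num_frames, clip_len=8, step=2, rng=random):
    """Return `clip_len` frame indices spaced `step` apart at a random start.

    Hypothetical helper for illustration; not part of decord's API.
    """
    span = (clip_len - 1) * step + 1  # number of frames one clip covers
    if span > num_frames:
        raise ValueError("video is too short for the requested clip")
    start = rng.randint(0, num_frames - span)  # randint's upper bound is inclusive
    return [start + i * step for i in range(clip_len)]

rng = random.Random(0)
indices = sample_clip_indices(250, clip_len=8, step=2, rng=rng)
print(indices)  # eight indices, two frames apart, all below 250
```

Because consecutive training steps land at unrelated positions in unrelated files, a reader that can only decode sequentially would have to seek constantly, which is why random-access performance matters here.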

Installation

Install via pip

Simply use

pip install decord

Supported platforms:

Linux
Mac OS >= 10.12, python>=3.5
Windows

Note that only CPU versions are currently provided on PyPI. Please build from source to enable GPU acceleration.

Install from source

Linux

Install the system packages needed to build the shared library. For Debian/Ubuntu users, run:

# official PPA comes with ffmpeg 2.8, which lacks tons of features, we use ffmpeg 4.0 here
sudo add-apt-repository ppa:jonathonf/ffmpeg-4 # for ubuntu20.04 official PPA is already version 4.2, you may skip this step
sudo apt-get update
sudo apt-get install -y build-essential python3-dev python3-setuptools make cmake
sudo apt-get install -y ffmpeg libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev
# note: make sure you have cmake 3.8 or later, you can install from cmake official website if it's too old

Clone the repo recursively (important):

git clone --recursive https://github.com/dmlc/decord

Build the shared library in source root directory:

cd decord
mkdir build && cd build
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release
make

You can specify -DUSE_CUDA=ON, -DUSE_CUDA=/path/to/cuda, or -DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER=/path/to/cuda/nvcc to enable NVDEC hardware-accelerated decoding:

cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release

Note that if you encounter an issue with libnvcuvid.so (e.g., see #102), it is probably caused by a missing link to libnvcuvid.so. You can find the library manually (ldconfig -p | grep libnvcuvid) and link it into CUDA_TOOLKIT_ROOT_DIR/lib64 so that decord can detect and link the correct library.

To specify a custom FFMPEG library path, use `-DFFMPEG_DIR=/path/to/ffmpeg`.

Install python bindings:

cd ../python
# option 1: add python path to $PYTHONPATH, you will need to install numpy separately
pwd=$PWD
echo "export PYTHONPATH=\$PYTHONPATH:$pwd" >> ~/.bashrc
source ~/.bashrc
# option 2: install with setuptools
python3 setup.py install --user

Mac OS

Installation on macOS is similar to Linux, but macOS users need to install build tools such as clang, GNU Make, and cmake first.

Tools like clang and GNU Make are packaged in Command Line Tools for macOS. To install:

xcode-select --install

To install other needed packages like cmake, we recommend first installing Homebrew, which is a popular package manager for macOS. Detailed instructions can be found on its homepage.

After installation of Homebrew, install cmake and ffmpeg by:

brew install cmake ffmpeg
# note: make sure you have cmake 3.8 or later, you can install from cmake official website if it's too old

Clone the repo recursively (important):

git clone --recursive https://github.com/dmlc/decord

Then go to the root directory and build the shared library:

cd decord
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

Install python bindings:

cd ../python
# option 1: add python path to $PYTHONPATH, you will need to install numpy separately
pwd=$PWD
echo "export PYTHONPATH=\$PYTHONPATH:$pwd" >> ~/.bash_profile
source ~/.bash_profile
# option 2: install with setuptools
python3 setup.py install --user

Windows

On Windows, you will need CMake and Visual Studio for C++ compilation.

First, install git, cmake, ffmpeg and python. You can use Chocolatey to manage packages similar to Linux/Mac OS.
Second, install Visual Studio 2017 Community; this may take some time.

When the dependencies are ready, open a command prompt:

cd your-workspace
git clone --recursive https://github.com/dmlc/decord
cd decord
mkdir build
cd build
cmake -DCMAKE_CXX_FLAGS="/DDECORD_EXPORTS" -DCMAKE_CONFIGURATION_TYPES="Release" -G "Visual Studio 15 2017 Win64" ..
# open `decord.sln` and build project

Usage

Decord provides a minimal API set for bootstrapping. You can also check out the Jupyter notebook examples.

VideoReader

VideoReader is used to access frames directly from video files.

from decord import VideoReader
from decord import cpu, gpu

vr = VideoReader('examples/flipping_a_pancake.mkv', ctx=cpu(0))
# a file like object works as well, for in-memory decoding
with open('examples/flipping_a_pancake.mkv', 'rb') as f:
    vr = VideoReader(f, ctx=cpu(0))
print('video frames:', len(vr))
# 1. the simplest way is to directly access frames
for i in range(len(vr)):
    # the video reader will handle seeking and skipping in the most efficient manner
    frame = vr[i]
    print(frame.shape)

# To get multiple frames at once, use get_batch
# this is the efficient way to obtain a long list of frames
frames = vr.get_batch([1, 3, 5, 7, 9])
print(frames.shape)
# (5, 240, 320, 3)
# duplicate frame indices will be accepted and handled internally to avoid duplicate decoding
frames2 = vr.get_batch([1, 2, 3, 2, 3, 4, 3, 4, 5]).asnumpy()
print(frames2.shape)
# (9, 240, 320, 3)

# 2. you can do cv2 style reading as well
# skip 100 frames
vr.skip_frames(100)
# seek to start
vr.seek(0)
batch = vr.next()
print('frame shape:', batch.shape)
print('numpy frames:', batch.asnumpy())
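The comment about duplicate indices describes deduplication before decoding. As an illustration only (this is not decord's C++ code, and plan_batch is a hypothetical name), NumPy's np.unique with return_inverse=True captures the idea: decode each distinct frame once, in ascending order, then gather the requested order back out.

```python
import numpy as np

def plan_batch(indices):
    """Return the distinct frame indices to decode (sorted) and, for each
    requested position, which decoded frame to copy (hypothetical helper)."""
    unique, inverse = np.unique(np.asarray(indices), return_inverse=True)
    return unique, inverse.ravel()

requested = [1, 2, 3, 2, 3, 4, 3, 4, 5]
unique, inverse = plan_batch(requested)
print(unique.tolist())  # [1, 2, 3, 4, 5]

# Stand-in for decoding: a tiny fake 2x2 RGB frame filled with its index.
decoded = np.stack([np.full((2, 2, 3), i, dtype=np.uint8) for i in unique])
batch = decoded[inverse]  # duplicates are copied, not decoded again
print(batch.shape)  # (9, 2, 2, 3)
```

Sorting the distinct indices also means a decoder could walk forward through the file once instead of jumping back and forth.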

VideoLoader

VideoLoader is designed for training deep learning models on large numbers of video files. It provides smart video shuffling techniques to achieve high random-access performance (seeking in video is notoriously slow and redundant). These optimizations live in the underlying C++ code and are invisible to the user.

from decord import VideoLoader
from decord import cpu, gpu

vl = VideoLoader(['1.mp4', '2.avi', '3.mpeg'], ctx=[cpu(0)], shape=(2, 320, 240, 3), interval=1, skip=5, shuffle=1)
print('Total batches:', len(vl))

for batch in vl:
    print(batch[0].shape)

Shuffling video can be tricky, so we provide several modes:

shuffle = -1  # smart shuffle mode, based on video properties, (not implemented yet)
shuffle = 0  # all sequential, no seeking, following initial filename order
shuffle = 1  # random filename order, no random access for each video, very efficient
shuffle = 2  # random order
shuffle = 3  # random frame access in each video only
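To make the difference between modes concrete, here is a small pure-Python model that lists the order in which (video, batch) pairs would be visited. It is an assumption-based sketch: batch_order is a hypothetical helper, mode 3 is interpreted as "keep the file order, shuffle batches within each video", mode -1 is omitted because it is not implemented, and decord's actual loader may schedule decoding and prefetching differently.

```python
import random

def batch_order(batches_per_video, mode, seed=0):
    """Return (video_id, batch_id) pairs in the order a loader would visit them.

    Hypothetical model of the shuffle modes above, not decord's C++ scheduler.
    """
    rng = random.Random(seed)
    videos = list(range(len(batches_per_video)))

    def sequential(order):
        # every batch of each video, in order, following the given video order
        return [(v, b) for v in order for b in range(batches_per_video[v])]

    if mode == 0:  # filename order, no seeking
        return sequential(videos)
    if mode == 1:  # random filename order, still sequential inside each video
        rng.shuffle(videos)
        return sequential(videos)
    if mode == 2:  # fully random order across all videos and batches
        pairs = sequential(videos)
        rng.shuffle(pairs)
        return pairs
    if mode == 3:  # filename order kept, batches shuffled within each video
        order = []
        for v in videos:
            batch_ids = list(range(batches_per_video[v]))
            rng.shuffle(batch_ids)
            order.extend((v, b) for b in batch_ids)
        return order
    raise ValueError(f"unsupported shuffle mode: {mode}")

print(batch_order([3, 2], mode=0))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
print(batch_order([3, 2], mode=1))
```

In this model, modes 0 and 1 keep each video's batches sequential, which is why they avoid seeking; modes 2 and 3 require random access inside a file.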

AudioReader

AudioReader is used to access samples directly from both video files (if there's an audio track) and audio files.

from decord import AudioReader
from decord import cpu, gpu

# You can specify the desired sample rate and channel layout
# For channels there are two options: default to the original layout or mono
ar = AudioReader('example.mp3', ctx=cpu(0), sample_rate=44100, mono=False)
print('Shape of audio samples: ', ar.shape())
# To access the audio samples
print('The first sample: ', ar[0])
print('The first five samples: ', ar[0:5])
print('Get a batch of samples: ', ar.get_batch([1,3,5]))

AVReader

AVReader is a wrapper for both AudioReader and VideoReader. It enables you to slice the video and audio simultaneously.

from decord import AVReader
from decord import cpu, gpu

av = AVReader('example.mov', ctx=cpu(0))
# To access both the video frames and corresponding audio samples
audio, video = av[0:20]
# Each element in audio will be a batch of samples corresponding to a frame of video
print('Frame #: ', len(audio))
print('Shape of the audio samples of the first frame: ', audio[0].shape)
print('Shape of the first frame: ', video.asnumpy()[0].shape)
# Similarly, to get a batch
audio2, video2 = av.get_batch([1,3,5])
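AVReader pairs each video frame with a batch of audio samples. Under the simplifying assumptions of a constant frame rate and audio and video streams that both start at time zero, the matching sample range is plain arithmetic. decord's own alignment may rely on stream timestamps instead, and frame_sample_range is a hypothetical helper.

```python
def frame_sample_range(frame_idx, fps, sample_rate):
    """Half-open [start, end) audio sample range covered by one video frame.

    Assumes constant frame rate and aligned stream start times (hypothetical helper).
    """
    start = round(frame_idx * sample_rate / fps)
    end = round((frame_idx + 1) * sample_rate / fps)
    return start, end

# 25 fps video with 44.1 kHz audio: 44100 / 25 = 1764 samples per frame
print(frame_sample_range(0, 25, 44100))   # (0, 1764)
print(frame_sample_range(19, 25, 44100))  # (33516, 35280), the last frame of av[0:20]
```

Rounding both ends with the same formula makes consecutive ranges tile the audio without gaps or overlaps, so per-frame chunks may differ by one sample when sample_rate / fps is not an integer.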

Bridges for deep learning frameworks:

It's important to have a bridge from decord to popular deep learning frameworks for training and inference.

Apache MXNet (Done)
PyTorch (Done)
TensorFlow (Done)

Using the bridges for deep learning frameworks is simple. For example, one can set the default tensor output to mxnet.ndarray:

import decord
vr = decord.VideoReader('examples/flipping_a_pancake.mkv')
print('native output:', type(vr[0]), vr[0].shape)
# native output: <class 'decord.ndarray.NDArray'>, (240, 426, 3)
# you only need to set the output type once
decord.bridge.set_bridge('mxnet')
print(type(vr[0]), vr[0].shape)
# <class 'mxnet.ndarray.ndarray.NDArray'> (240, 426, 3)
# or pytorch and tensorflow(>=2.2.0)
decord.bridge.set_bridge('torch')
decord.bridge.set_bridge('tensorflow')
# or back to decord native format
decord.bridge.set_bridge('native')


decord's Issues

Random Access error towards the end of the video

Hi, thanks for a great library.

I have a question about an error I encountered while reading the Kinetics dataset with your video loader.

  File "/usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/video_reader.py", line 57, in get_batch
    arr = _CAPI_VideoReaderGetBatch(self._handle, indices)
  File "/usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/_ffi/base.py", line 62, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [01:57:20] /home/shwancha/mmaction/third_party/decord/src/runtime/ndarray.cc:171: Check failed: from_size == to_size (1 vs. 279840) DECORDArrayCopyFromTo: The size must exactly match

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x9d) [0x7fd49bd1befa]
[bt] (1) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x2f) [0x7fd49bd1c223]
[bt] (2) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(decord::runtime::NDArray::CopyFromTo(DLTensor*, DLTensor*, void*)+0x107) [0x7fd49bd42763]
[bt] (3) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(decord::runtime::NDArray::CopyTo(decord::runtime::NDArray const&) const+0x118) [0x7fd49bd7743a]
[bt] (4) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::GetBatch(std::vector<long, std::allocator<long> >, decord::runtime::NDArray)+0x592) [0x7fd49bd76f9c]
[bt] (5) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(+0x158c1b) [0x7fd49bd64c1b]
[bt] (6) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(+0x15b24c) [0x7fd49bd6724c]
[bt] (7) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(std::function<void (decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*)>::operator()(decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*) const+0x5a) [0x7fd49bd20720]
[bt] (8) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(decord::runtime::PackedFunc::CallPacked(decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*) const+0x30) [0x7fd49bd1e842]
[bt] (9) /usr/local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/libdecord.so(DECORDFuncCall+0x95) [0x7fd49bd1952d]

VideoReader's seek, seek_accurate, skip_frames, and get_batch all work fine when they only access earlier frames. However, accessing later frames (e.g., the 238th frame of a 250-frame video) gives me the error message above.

I find that simply looping the entire video through next() function has no problem.

Is there some issue with random access on the later frames?

Below is the output of ffprobe on the video (the issue happens with all the other videos as well):

ffprobe version git-2019-06-06-78e1d7f Copyright (c) 2007-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 20160609
  configuration: --enable-shared --enable-gpl --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-librtmp --enable-libtheora --enable-libvorbis --enable-libx264 --enable-nonfree --enable-version3 --enable-libxvid --enable-libvpx
  libavutil      56. 28.100 / 56. 28.100
  libavcodec     58. 52.102 / 58. 52.102
  libavformat    58. 27.103 / 58. 27.103
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 55.100 /  7. 55.100
  libswscale      5.  4.101 /  5.  4.101
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/kinetics400/train/sled dog racing/Oj2zOWThCRs_000217_000227.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.27.103
  Duration: 00:00:10.00, start: 0.000000, bitrate: 586 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 340x256, 386 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

Thank you.

GPU memory leak

I am decoding a list of videos with:

video = VideoReader(str(video_path), ctx=gpu(0))

frame_ids = list(range(300))

frames = video.get_batch(frame_ids).asnumpy()

On every iteration, GPU RAM consumption goes up until I get an out-of-memory error.

compile issue

I've successfully compiled and built decord according to the instructions and even installed the Python bindings.

However, I get the following error when I try to import the library:

>>> import decord
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/seunghwc/decord/python/decord/__init__.py", line 4, in <module>
    from ._ffi.runtime_ctypes import TypeCode
  File "/home/seunghwc/decord/python/decord/_ffi/runtime_ctypes.py", line 8, in <module>
    from .base import _LIB, check_call
  File "/home/seunghwc/decord/python/decord/_ffi/base.py", line 43, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/home/seunghwc/decord/python/decord/_ffi/base.py", line 35, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/home/seunghwc/anaconda3/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/seunghwc/decord/build/libdecord.so: undefined symbol: _ZN6decord4cuda12ProcessFrameEyyPhP11CUstream_stttii

Below is the result of running ldd /home/seunghwc/decord/build/libdecord.so:

	linux-vdso.so.1 =>  (0x00007ffd1e13c000)
	libavformat.so.58 => /home/seunghwc/ffmpeg_build/lib/libavformat.so.58 (0x00007fc036d23000)
	libavfilter.so.7 => /home/seunghwc/ffmpeg_build/lib/libavfilter.so.7 (0x00007fc0367f4000)
	libavcodec.so.58 => /home/seunghwc/ffmpeg_build/lib/libavcodec.so.58 (0x00007fc035189000)
	libavutil.so.56 => /home/seunghwc/ffmpeg_build/lib/libavutil.so.56 (0x00007fc034d67000)
	libnvrtc.so.10.0 => /opt/cuda/10.0/lib64/libnvrtc.so.10.0 (0x00007fc03374a000)
	libcudart.so.10.0 => /opt/cuda/10.0/lib64/libcudart.so.10.0 (0x00007fc0334d0000)
	libcuda.so.1 => /lib64/libcuda.so.1 (0x00007fc0323c1000)
	libnvidia-ml.so.1 => /lib64/libnvidia-ml.so.1 (0x00007fc031db1000)
	libnvcuvid.so.1 => /lib64/libnvcuvid.so.1 (0x00007fc031901000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fc0315f9000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fc0312f6000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fc0310e0000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fc030d1d000)
	/lib64/ld-linux-x86-64.so.2 (0x0000558a0cc90000)
	libz.so.1 => /lib64/libz.so.1 (0x00007fc030b06000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc0308ea000)
	libswscale.so.5 => /home/seunghwc/ffmpeg_build/lib/libswscale.so.5 (0x00007fc030660000)
	libpostproc.so.55 => /home/seunghwc/ffmpeg_build/lib/libpostproc.so.55 (0x00007fc03043f000)
	libswresample.so.3 => /home/seunghwc/ffmpeg_build/lib/libswresample.so.3 (0x00007fc030222000)
	libavresample.so.4 => /home/seunghwc/ffmpeg_build/lib/libavresample.so.4 (0x00007fc030004000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fc02fdfb000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fc02fbf7000)
	libnvidia-fatbinaryloader.so.410.78 => /lib64/libnvidia-fatbinaryloader.so.410.78 (0x00007fc02f9a9000)

I am not sure what went missing during compilation. Can you guide me to it?

P.S. I had installed decord using pip, but I would like to use GPU acceleration.

Extract audio signals

Is there a way to extract audio signals as well as visual frames?

If there were a function like the one below
"videos, audio_raw, meta = torchvision.io.read_video(video_path, pts_unit='sec')" in PyTorch,
it would make this a more useful and powerful library.

I would appreciate it if you could let us know your future plans for this.
Thanks in advance.

C++ examples

Are there plans to make the C++ API to decord public, similar to the python interface? A lot of the code appears to be implemented in C++ already but I see no examples of using decord in C++.

Random Frame Access Without Preloading the Entire File?

I'm wondering if the following feature is possible.

I'd like to be able to randomly access any frame without having to load the entire file into memory. Currently it seems like decord loads all of the video bytes into memory when you create an instance of decord.VideoReader, and only then lets you access the data.

I have a use case with thousands of video files, where I only need to extract a few specific frames from each video. Instead of loading the entire file into memory and then randomly decoding those specific frames (which decord already seems to do well), is it possible for decord to just open a handle to the file (without reading the entire thing), seek to the nearest I-frame, and then use the subsequent B/P-frames to construct the requested frame?

Even when the video doesn't have intermediate I-frames, shouldn't it still be possible to extract the first frame of a 4GB video without loading all 4GB into memory?
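The strategy described here, seeking to the nearest preceding I-frame and decoding forward, bounds the decoding work by the distance to that keyframe rather than by the target's position in the file. A minimal sketch of the bookkeeping, assuming the keyframe indices are already known (for example, from the container's index); decode_plan is hypothetical and says nothing about how decord itself reads or buffers files.

```python
import bisect

def decode_plan(target, keyframes):
    """Return (seek_to, frames_to_decode) for reaching frame `target`.

    `keyframes` is a sorted list of I-frame indices. The decoder seeks to the
    last keyframe at or before `target` and decodes forward from there.
    Hypothetical helper for illustration.
    """
    pos = bisect.bisect_right(keyframes, target) - 1
    if pos < 0:
        raise ValueError("no keyframe at or before target")
    seek_to = keyframes[pos]
    return seek_to, target - seek_to + 1

keyframes = list(range(0, 1000, 250))  # assume an I-frame every 250 frames
print(decode_plan(0, keyframes))    # (0, 1)
print(decode_plan(610, keyframes))  # (500, 111)
```

With a keyframe every 250 frames, no request needs more than 250 decoded frames, and frame 0 needs only the first one.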

Compiling using cmake

I'm trying to compile decord on Windows using cmake, but it cannot find ffmpeg.

I tried adding the path in decord/cmake/modulesFFmpeg.cmake

But it did not work.

Where do I need to add the path to ffmpeg.exe?

Point of confusion

I installed ffmpeg using the installation package downloaded from the official website, rather than with the PPA command. Is this feasible?

Speedup

The speed of the video reader is critical for it to be usable in training.
Currently, the video reader has a large initialization overhead.
When reading 8 consecutive frames (with step 2) at random positions in 360 Kinetics videos (resized to height 256), the average init time is 12.9 ms, while the average decoding time is 7.2 ms.

If we reduce the init time to 6 ms, we get a 1.5x speedup.
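The 1.5x estimate treats the per-video cost as init time plus decoding time: (12.9 + 7.2) / (6 + 7.2) = 20.1 / 13.2 ≈ 1.52. A one-line check, assuming the two costs simply add and the decoding time stays unchanged:

```python
def speedup(init_ms, decode_ms, new_init_ms):
    """Speedup from cutting the init cost, assuming init and decode times add."""
    return (init_ms + decode_ms) / (new_init_ms + decode_ms)

# Measured above: 12.9 ms init and 7.2 ms decode per video; target init 6 ms
print(round(speedup(12.9, 7.2, 6.0), 2))  # 1.52
```

Even with zero init time the bound would be 20.1 / 7.2 ≈ 2.8x, at which point decoding becomes the bottleneck.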

Faster SkipFrames

SkipFrames is the key to random seeking. Currently, the following code takes a long time (e.g., 30 ms) when it needs to skip a long distance (>200 frames).

    while (num > 0) {
        PushNext();
        ret = decoder_->Pop(&frame);
        if (!ret) continue;
        --num;
    }

Question: Can it be faster (maybe in 2ms?)?

module 'decord' has no attribute 'video_reader'

Hello, I have decord 0.3.3 installed, but I believe something went wrong with the installation.

I'm trying to run a code snippet but I can't use one of decord's most valuable modules.
"""
import decord
video_name = 'boxing_test.mp4'
data = decord.VideoReader(video_name)
frames = data.get_batch(range(124,156))
print(frames.shape)
"""

When I run the code above I get this error;
module 'decord' has no attribute 'VideoReader'

I pip installed decord.

How can I solve this problem?

Cannot compile on windows 10

Hello, I have been trying to compile decord on windows 10 for the last couple of days.

I don't know much about cmake so I am struggling a bit.

The problem I'm having is that it cannot find FFMPEG.

I changed the library paths in FindFFmpeg.cmake and FFmpeg.cmake, but it does not work.

I'm also passing the ffmpeg directory on the command line: -DFFMPEG_DIR:PATH="ffmpeg"

Is there any example on windows that someone could share with me?

Error when trying to use shuffle=3

Greetings!
Decord version is '0.4.0'

Here is my code:

vl = VideoLoader(videos, ctx=[cpu(0)], shape=(10, 1280, 720, 3), interval=1, skip=5, shuffle=3)

And here is my error:

DECORDErrorTraceback (most recent call last)
<timed exec> in <module>

/opt/conda/lib/python3.6/site-packages/decord/video_loader.py in __init__(self, uris, ctx, shape, interval, skip, shuffle, prefetch)
     53         assert len(shape) == 4, "expected shape: [bs, height, width, 3], given {}".format(shape)
     54         self._handle = _CAPI_VideoLoaderGetVideoLoader(
---> 55             uri, device_types, device_ids, shape[0], shape[1], shape[2], shape[3], interval, skip, shuffle, prefetch)
     56         assert self._handle is not None
     57         self._len = _CAPI_VideoLoaderLength(self._handle)

/opt/conda/lib/python3.6/site-packages/decord/_ffi/_ctypes/function.py in __call__(self, *args)
    173         check_call(_LIB.DECORDFuncCall(
    174             self.handle, values, tcodes, ctypes.c_int(num_args),
--> 175             ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
    176         _ = temp_args
    177         _ = args

/opt/conda/lib/python3.6/site-packages/decord/_ffi/base.py in check_call(ret)
     61     """
     62     if ret != 0:
---> 63         raise DECORDError(py_str(_LIB.DECORDGetLastError()))
     64 
     65 

DECORDError: [08:51:44] /io/decord/src/video/video_loader.cc:70: Invalid shuffle mode: 3 Available: 
	{No shuffle: 0}
	{Random File Order: 1}
	{Random access: 2}

Stack trace returned 10 entries:
[bt] (0) /opt/conda/lib/python3.6/site-packages/decord/libdecord.so(dmlc::StackTrace(unsigned long)+0x50) [0x7f3ea0501830]
[bt] (1) /opt/conda/lib/python3.6/site-packages/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1d) [0x7f3ea050291d]
[bt] (2) /opt/conda/lib/python3.6/site-packages/decord/libdecord.so(decord::VideoLoader::VideoLoader(std::vector<std::string, std::allocator<std::string> >, std::vector<DLContext, std::allocator<DLContext> >, std::vector<int, std::allocator<int> >, int, int, int, int)+0xe54) [0x7f3ea054c4c4]
[bt] (3) /opt/conda/lib/python3.6/site-packages/decord/libdecord.so(+0x6d5be) [0x7f3ea05455be]
[bt] (4) /opt/conda/lib/python3.6/site-packages/decord/libdecord.so(DECORDFuncCall+0x52) [0x7f3ea04fe412]
[bt] (5) /opt/conda/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f3ebcce7ec0]
[bt] (6) /opt/conda/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f3ebcce787d]
[bt] (7) /opt/conda/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f3ebcefdede]
[bt] (8) /opt/conda/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x13915) [0x7f3ebcefe915]
[bt] (9) /opt/conda/bin/python(_PyObject_FastCallDict+0x8b) [0x561e814a1e3b]

How can I use shuffle=3 on video batches? Do I understand correctly that shuffle=3 will give me BATCH_SIZE random frames from each video? If so, why do we need interval and skip in this mode?

Thanks in advance!

GPU decode error

I find that in GPU mode, some videos cannot be decoded correctly, while in CPU mode everything works fine.

Take v_ApplyEyeMakeup_g11_c02.avi.zip from the UCF101 dataset, for example:

import decord as de
vr = de.VideoReader('./v_ApplyEyeMakeup_g11_c02.avi', ctx=de.gpu(0))

The error looks like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wizyoung/.local/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/video_reader.py", line 37, in __init__
    uri, ctx.device_type, ctx.device_id, width, height)
  File "/home/wizyoung/.local/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/wizyoung/.local/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/_ffi/base.py", line 63, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [02:40:48] /home/wizyoung/my_env/decord/src/video/nvcodec/cuda_threaded_decoder.cc:83: Check failed: bsf Error finding bitstream filter:

Env:
ffmpeg 4.1.5, cuda 10.1.243, NV Tesla V100 on Ubuntu 18.04

BTW: I find that decord in CPU mode sometimes runs slower than OpenCV's VideoCapture on many videos (tested on the UCF101 dataset).

decord._ffi.base.DECORDError from VideoReader

Hi, there seems to be some problem with videos containing corrupted frames.

The video, script, and logs are available on Google Drive.

The video is from the MEVA dataset. It can be loaded by cv2.VideoCapture, moviepy.editor.VideoFileClip, and avi_r.AVIReader, but it contains several corrupted frames as detailed here.

Environment: Mac OS 10.15.5, python 3.7, decord 0.4.0; also reproducible on Linux.

# Name                    Version                   Build  Channel
ca-certificates           2020.6.24                     0  
certifi                   2020.6.20                py37_0  
decord                    0.4.0                    pypi_0    pypi
libcxx                    10.0.0                        1  
libedit                   3.1.20191231         h1de35cc_1  
libffi                    3.3                  hb1e8313_2  
ncurses                   6.2                  h0a44026_1  
numpy                     1.19.0                   pypi_0    pypi
openssl                   1.1.1g               h1de35cc_0  
pip                       20.1.1                   py37_1  
python                    3.7.7                hf48f09d_4  
readline                  8.0                  h1de35cc_0  
setuptools                49.2.0                   py37_0  
sqlite                    3.32.3               hffcf06c_0  
tk                        8.6.10               hb0a8c7a_0  
wheel                     0.34.2                   py37_0  
xz                        5.2.5                h1de35cc_0  
zlib                      1.2.11               h1de35cc_3

Script:

from decord import VideoReader
from decord import cpu

vr = VideoReader('2018-03-07.16-55-06.17-00-06.school.G336.avi', ctx=cpu(0))
for i in range(len(vr)):       
    frame = vr[i]
    print(i, frame.shape)

Log:

[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
Invalid UE golomb code
[NULL @ 0x7fde3829b200] pps_id 3199971767 out of range
[NULL @ 0x7fde3829b200] sps_id 3 out of range
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] sps_id 3 out of range
[NULL @ 0x7fde3829b200] SEI type 0 size 568 truncated at 256
[NULL @ 0x7fde3829b200] non-existing PPS 176 referenced
[NULL @ 0x7fde3829b200] SEI type 0 size 568 truncated at 256
[NULL @ 0x7fde3829b200] non-existing PPS 176 referenced
Invalid UE golomb code
[NULL @ 0x7fde3829b200] pps_id 3199971767 out of range
[NULL @ 0x7fde3829b200] sps_id 32 out of range
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
Invalid UE golomb code
[NULL @ 0x7fde3829b200] pps_id 3199971767 out of range
[NULL @ 0x7fde3829b200] sps_id 3 out of range
[NULL @ 0x7fde3829b200] missing picture in access unit with size 56
[NULL @ 0x7fde3829b200] sps_id 3 out of range
[NULL @ 0x7fde3829b200] SEI type 0 size 568 truncated at 256
[NULL @ 0x7fde3829b200] non-existing PPS 176 referenced
[NULL @ 0x7fde3829b200] SEI type 0 size 568 truncated at 256
[NULL @ 0x7fde3829b200] non-existing PPS 176 referenced
Invalid UE golomb code
[NULL @ 0x7fde3829b200] pps_id 3199971767 out of range
[NULL @ 0x7fde3829b200] sps_id 32 out of range
[h264 @ 0x7fde1800b400] No start code is found.
[h264 @ 0x7fde1800b400] Error splitting the input into NAL units.
[h264 @ 0x7fde18048400] No start code is found.
[h264 @ 0x7fde18048400] Error splitting the input into NAL units.
[h264 @ 0x7fde18048a00] No start code is found.
[h264 @ 0x7fde18048a00] Error splitting the input into NAL units.
[h264 @ 0x7fde1808c200] No start code is found.
[h264 @ 0x7fde1808c200] Error splitting the input into NAL units.
[h264 @ 0x7fde1808c800] No start code is found.
[h264 @ 0x7fde1808c800] Error splitting the input into NAL units.
[h264 @ 0x7fde18016600] No start code is found.
[h264 @ 0x7fde18016600] Error splitting the input into NAL units.
0 (1080, 1920, 3)
... (Omitted)
103 (1080, 1920, 3)
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    frame = vr[i]
  File "/Users/lijun/Applications/miniconda3/envs/decord/lib/python3.7/site-packages/decord/video_reader.py", line 92, in __getitem__
    return self.next()
  File "/Users/lijun/Applications/miniconda3/envs/decord/lib/python3.7/site-packages/decord/video_reader.py", line 104, in next
    arr = _CAPI_VideoReaderNextFrame(self._handle)
  File "/Users/lijun/Applications/miniconda3/envs/decord/lib/python3.7/site-packages/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/Users/lijun/Applications/miniconda3/envs/decord/lib/python3.7/site-packages/decord/_ffi/base.py", line 63, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [16:53:10] /Users/travis/build/zhreshold/decord-distro/decord/src/video/ffmpeg/threaded_decoder.cc:288: [16:53:10] /Users/travis/build/zhreshold/decord-distro/decord/src/video/ffmpeg/threaded_decoder.cc:216: Check failed: avcodec_send_packet(dec_ctx_.get(), pkt.get()) >= 0 (-1094995529 vs. 0) Thread worker: Error sending packet.

Stack trace returned 7 entries:
[bt] (0) 0   libdecord.dylib                     0x0000000112db2d70 dmlc::StackTrace(unsigned long) + 464
[bt] (1) 1   libdecord.dylib                     0x0000000112db2a54 dmlc::LogMessageFatal::~LogMessageFatal() + 52
[bt] (2) 2   libdecord.dylib                     0x0000000112df2b16 decord::ffmpeg::FFMPEGThreadedDecoder::WorkerThreadImpl() + 390
[bt] (3) 3   libdecord.dylib                     0x0000000112df0d29 decord::ffmpeg::FFMPEGThreadedDecoder::WorkerThread() + 25
[bt] (4) 4   libdecord.dylib                     0x0000000112df41ee void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (decord::ffmpeg::FFMPEGThreadedDecoder::*)(), decord::ffmpeg::FFMPEGThreadedDecoder*> >(void*) + 62
[bt] (5) 5   libsystem_pthread.dylib             0x00007fff72534109 _pthread_start + 148
[bt] (6) 6   libsystem_pthread.dylib             0x00007fff7252fb8b thread_start + 15



Stack trace returned 10 entries:
[bt] (0) 0   libdecord.dylib                     0x0000000112db2d70 dmlc::StackTrace(unsigned long) + 464
[bt] (1) 1   libdecord.dylib                     0x0000000112db2a54 dmlc::LogMessageFatal::~LogMessageFatal() + 52
[bt] (2) 2   libdecord.dylib                     0x0000000112df0cae decord::ffmpeg::FFMPEGThreadedDecoder::CheckErrorStatus() + 142
[bt] (3) 3   libdecord.dylib                     0x0000000112df171a decord::ffmpeg::FFMPEGThreadedDecoder::Pop(decord::runtime::NDArray*) + 58
[bt] (4) 4   libdecord.dylib                     0x0000000112de812c decord::VideoReader::NextFrameImpl() + 108
[bt] (5) 5   libdecord.dylib                     0x0000000112de8318 decord::VideoReader::NextFrame() + 24
[bt] (6) 6   libdecord.dylib                     0x0000000112ddcb9d std::__1::__function::__func<decord::runtime::$_1, std::__1::allocator<decord::runtime::$_1>, void (decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*)>::operator()(decord::runtime::DECORDArgs&&, decord::runtime::DECORDRetValue*&&) + 77
[bt] (7) 7   libdecord.dylib                     0x0000000112daf5b6 DECORDFuncCall + 70
[bt] (8) 8   libffi.7.dylib                      0x000000010226bead ffi_call_unix64 + 85
[bt] (9) 9   ???                                 0x00007ffeede011b0 0x0 + 140732889305520

Compile failed when using cuda

When I set -DUSE_CUDA=ON, some errors occur:

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 10.1.105
-- Check for working CUDA compiler: /home/tongzhan/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /home/tongzhan/cuda-10.1/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
FFMPEG_INCLUDE_DIR = /home/tongzhan/anaconda3/envs/detectron2/include
FFMPEG_LIBRARIES = /home/tongzhan/anaconda3/envs/detectron2/lib/libavformat.so;/home/tongzhan/anaconda3/envs/detectron2/lib/libavfilter.so;/home/tongzhan/anaconda3/envs/detectron2/lib/libavcodec.so;/home/tongzhan/anaconda3/envs/detectron2/lib/libavutil.so
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/home/tongzhan/cuda-10.1
-- Found CUDA_CUDA_LIBRARY=/home/tongzhan/cuda-10.1/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/home/tongzhan/cuda-10.1/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/home/tongzhan/cuda-10.1/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/home/tongzhan/cuda-10.1/lib64/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/home/tongzhan/anaconda3/envs/detectron2/lib/libcublas.so
-- Found CUDA_NVIDIA_ML_LIBRARY=/home/tongzhan/cuda-10.1/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
-- Found CUDA_NVCUVID_LIBRARY=/usr/lib/x86_64-linux-gnu/libnvcuvid.so
-- Build with CUDA support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tongzhan/decord/build

All of the above appears to succeed; however, when I run make:

[ 85%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_texture.cc.o
[ 88%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_threaded_decoder.cc.o
[ 91%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_device_api.cc.o
[ 94%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_module.cc.o
[ 97%] Building CUDA object CMakeFiles/decord.dir/src/improc/improc.cu.o
/home/tongzhan/decord/src/improc/improc.cu(107): error: explicit type is missing ("int" assumed)

/home/tongzhan/decord/src/improc/improc.cu(108): error: explicit type is missing ("int" assumed)

/home/tongzhan/decord/src/improc/improc.cu(84): error: explicit type is missing ("int" assumed)

/home/tongzhan/decord/src/improc/improc.cu(86): error: explicit type is missing ("int" assumed)

/home/tongzhan/decord/src/improc/improc.cu(90): error: explicit type is missing ("int" assumed)

/home/tongzhan/decord/src/improc/improc.cu(90): error: no suitable conversion function from "float2" to "int" exists

/home/tongzhan/decord/src/improc/improc.cu(91): error: expression must have class type

/home/tongzhan/decord/src/improc/improc.cu(92): error: expression must have class type

/home/tongzhan/decord/src/improc/improc.cu(48): error: explicit type is missing ("int" assumed)
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(49): error: explicit type is missing ("int" assumed)
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(50): error: explicit type is missing ("int" assumed)
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(51): error: explicit type is missing ("int" assumed)
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(53): error: explicit type is missing ("int" assumed)
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(53): error: a reference of type "int &" (not const-qualified) cannot be initialized with a value of type "float [9]"
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(58): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(58): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(58): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(59): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(59): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(59): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(60): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(60): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(60): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(62): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(62): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(62): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(63): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(63): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(63): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(64): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(64): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

/home/tongzhan/decord/src/improc/improc.cu(64): error: expression must have pointer-to-object type
          detected during instantiation of "void decord::cuda::detail::process_frame_kernel(cudaTextureObject_t, cudaTextureObject_t, T *, uint16_t, uint16_t, uint16_t, uint16_t, float, float) [with T=uint8_t]"
(113): here

32 errors detected in the compilation of "/tmp/tmpxft_0000a248_00000000-6_improc.cpp1.ii".
make[2]: *** [CMakeFiles/decord.dir/build.make:509: CMakeFiles/decord.dir/src/improc/improc.cu.o] Error 1
make[2]: Leaving directory '/home/tongzhan/decord/build'
make[1]: *** [CMakeFiles/Makefile2:93: CMakeFiles/decord.dir/all] Error 2
make[1]: Leaving directory '/home/tongzhan/decord/build'
make: *** [Makefile:147: all] Error 2

question about random shuffle in VideoLoader

Is there an option to pick random frames within a batch?
shuffle=2 only shuffles the order of batches. (For example, with batch=4, the indices within one batch are contiguous, like 22, 23, 24, 25.) But I want something like 22, 28, 12, 40 (random).

Thank you!
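As a workaround that does not depend on any `VideoLoader` option, one can draw random frame indices per batch and fetch them with `VideoReader.get_batch`, which (as in the corrupted-video report on this page) accepts a list of indices. The helper below is a hypothetical sketch that only generates the indices; it samples without replacement and leaves them unsorted, matching the "22, 28, 12, 40" example. Random access implies more seeking, so expect it to be slower than contiguous reads.

```python
import random
from typing import List, Optional


def random_batch_indices(num_frames: int, batch_size: int,
                         seed: Optional[int] = None) -> List[int]:
    """Pick `batch_size` distinct frame indices uniformly from [0, num_frames).

    Hypothetical helper, not part of decord. The indices come back in random
    order, e.g. something like [22, 28, 12, 40] for batch_size=4.
    """
    if not 0 < batch_size <= num_frames:
        raise ValueError("batch_size must be in 1..num_frames")
    rng = random.Random(seed)  # seedable for reproducible batches
    return rng.sample(range(num_frames), batch_size)


# Usage with decord (assumes `vr` is a decord.VideoReader):
#   frames = vr.get_batch(random_batch_indices(len(vr), 4))
```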

cmake invalid new-expression of abstract class type

Hi, when installing from source (I want to use it with CUDA) and building with cmake and make, I get the following error:

Scanning dependencies of target decord
[  2%] Building CXX object CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o
[  5%] Building CXX object CMakeFiles/decord.dir/src/runtime/cpu_device_api.cc.o
[  8%] Building CXX object CMakeFiles/decord.dir/src/runtime/dso_module.cc.o
[ 11%] Building CXX object CMakeFiles/decord.dir/src/runtime/file_util.cc.o
[ 14%] Building CXX object CMakeFiles/decord.dir/src/runtime/module.cc.o
[ 17%] Building CXX object CMakeFiles/decord.dir/src/runtime/module_util.cc.o
[ 20%] Building CXX object CMakeFiles/decord.dir/src/runtime/ndarray.cc.o
[ 22%] Building CXX object CMakeFiles/decord.dir/src/runtime/registry.cc.o
[ 25%] Building CXX object CMakeFiles/decord.dir/src/runtime/str_util.cc.o
[ 28%] Building CXX object CMakeFiles/decord.dir/src/runtime/system_lib_module.cc.o
[ 31%] Building CXX object CMakeFiles/decord.dir/src/runtime/thread_pool.cc.o
[ 34%] Building CXX object CMakeFiles/decord.dir/src/runtime/threading_backend.cc.o
[ 37%] Building CXX object CMakeFiles/decord.dir/src/runtime/workspace_pool.cc.o
[ 40%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_file_order_sampler.cc.o
[ 42%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_sampler.cc.o
[ 45%] Building CXX object CMakeFiles/decord.dir/src/sampler/sequential_sampler.cc.o
[ 48%] Building CXX object CMakeFiles/decord.dir/src/sampler/smart_random_sampler.cc.o
[ 51%] Building CXX object CMakeFiles/decord.dir/src/video/logging.cc.o
[ 54%] Building CXX object CMakeFiles/decord.dir/src/video/storage_pool.cc.o
[ 57%] Building CXX object CMakeFiles/decord.dir/src/video/video_interface.cc.o
[ 60%] Building CXX object CMakeFiles/decord.dir/src/video/video_loader.cc.o
[ 62%] Building CXX object CMakeFiles/decord.dir/src/video/video_reader.cc.o
/home/chris/data/untitled/python/data/pytorch/decord/src/video/video_reader.cc: In member function ‘virtual void decord::VideoReader::SetVideoStream(int)’:
/home/chris/data/untitled/python/data/pytorch/decord/src/video/video_reader.cc:101:62: error: invalid new-expression of abstract class type ‘decord::cuda::CUThreadedDecoder’
  101 |             ctx_.device_id, codecpar.get(), fmt_ctx_->iformat));
      |                                                              ^
In file included from /home/chris/data/untitled/python/data/pytorch/decord/src/video/video_reader.cc:10:
/home/chris/data/untitled/python/data/pytorch/decord/src/video/nvcodec/cuda_threaded_decoder.h:29:7: note:   because the following virtual functions are pure within ‘decord::cuda::CUThreadedDecoder’:
   29 | class CUThreadedDecoder final : public ThreadedDecoderInterface {
      |       ^~~~~~~~~~~~~~~~~
In file included from /home/chris/data/untitled/python/data/pytorch/decord/src/video/video_reader.h:10,
                 from /home/chris/data/untitled/python/data/pytorch/decord/src/video/video_reader.cc:7:
/home/chris/data/untitled/python/data/pytorch/decord/src/video/threaded_decoder_interface.h:21:22: note: 	‘virtual void decord::ThreadedDecoderInterface::SetCodecContext(AVCodecContext*, int, int, int)’
   21 |         virtual void SetCodecContext(AVCodecContext *dec_ctx, int width = -1, int height = -1, int rotation = 0) = 0;
      |                      ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/decord.dir/build.make:356: CMakeFiles/decord.dir/src/video/video_reader.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:96: CMakeFiles/decord.dir/all] Error 2
make: *** [Makefile:150: all] Error 2

My machine runs Manjaro with CUDA 10.2 and CMake 3.17.2.

Any advice?

decord compiled with ffmpeg 4 fails to load videos processed by ffmpeg 2

Hi zhreshold,
I compiled the latest decord (0.3.1) with ffmpeg 4.2 (libavcodec 58.*), and when I use the compiled decord to read a video processed with ffmpeg 2 (libavcodec 56.*), it throws the error:

avcodec_send_packet(dec_ctx_.get(), pkt.get()) >= 0 (-11 vs. 0) thread worker: error sending packet

Could you please check whether this also happens on your system?
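The negative numbers in these `avcodec_send_packet` failures are FFmpeg `AVERROR` codes. Small values are negated POSIX `errno` numbers (on Linux, -11 is `EAGAIN`), while large ones such as -1094995529 (seen in the corrupted-video tracebacks) are negated four-character tags built by FFmpeg's `FFERRTAG` macro; -1094995529 spells `INDA`, i.e. `AVERROR_INVALIDDATA`. The decoder below is a small sketch of those two conventions; `describe_averror` is a hypothetical helper and its tag table lists only two codes, so it is not exhaustive.

```python
import errno

# Tags defined in FFmpeg's libavutil/error.h (deliberately partial)
KNOWN_TAGS = {"INDA": "AVERROR_INVALIDDATA", "EOF ": "AVERROR_EOF"}


def describe_averror(code: int) -> str:
    """Best-effort name for an FFmpeg AVERROR value such as -11 or -1094995529."""
    if code >= 0:
        return "not an error"
    value = -code
    if value > 0xFFFF:
        # FFERRTAG(a, b, c, d) == -MKTAG(a, b, c, d); MKTAG packs a..d little-endian
        tag = value.to_bytes(4, "little").decode("latin-1")
        return KNOWN_TAGS.get(tag, f"FFERRTAG {tag!r}")
    # AVERROR(e) == -e for errno values, whose numbering is platform-specific
    return errno.errorcode.get(value, f"errno {value}")
```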

CUDAError: cuvidCreateVideoParser

How can I debug this error?

In [1]: from decord import VideoReader

In [2]: from decord import cpu, gpu

In [3]: vr = VideoReader('examples/flipping_a_pancake.mkv', ctx=gpu(0))
[21:57:08] /mnt/hdd/dev/decord/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: GeForce GTX 1080 Ti
[21:57:08] /mnt/hdd/dev/decord/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 430.5, so using our own stream.
---------------------------------------------------------------------------
DECORDError                               Traceback (most recent call last)
<ipython-input-3-b86fa8662b86> in <module>
----> 1 vr = VideoReader('examples/flipping_a_pancake.mkv', ctx=gpu(0))

/mnt/hdd/dev/decord/python/decord/video_reader.py in __init__(self, uri, ctx, width, height)
     35         self._handle = None
     36         self._handle = _CAPI_VideoReaderGetVideoReader(
---> 37             uri, ctx.device_type, ctx.device_id, width, height)
     38         if self._handle is None:
     39             raise RuntimeError("Error reading " + uri + "...")

/mnt/hdd/dev/decord/python/decord/_ffi/_ctypes/function.py in __call__(self, *args)
    173         check_call(_LIB.DECORDFuncCall(
    174             self.handle, values, tcodes, ctypes.c_int(num_args),
--> 175             ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
    176         _ = temp_args
    177         _ = args

/mnt/hdd/dev/decord/python/decord/_ffi/base.py in check_call(ret)
     61     """
     62     if ret != 0:
---> 63         raise DECORDError(py_str(_LIB.DECORDGetLastError()))
     64 
     65 

DECORDError: [21:57:08] /mnt/hdd/dev/decord/src/video/nvcodec/cuda_parser.h:39: CUDAError: cuvidCreateVideoParser(&parser_, &parser_info_) failed with error:

Feature request: GPU decoder for vp8 and vp9

I noticed that you commented out these lines:

    } aCodecName [] = {
        { cudaVideoCodec_MPEG1,    "MPEG-1"       },
        { cudaVideoCodec_MPEG2,    "MPEG-2"       },
        { cudaVideoCodec_MPEG4,    "MPEG-4 (ASP)" },
        { cudaVideoCodec_VC1,      "VC-1/WMV"     },
        { cudaVideoCodec_H264,     "AVC/H.264"    },
        { cudaVideoCodec_JPEG,     "M-JPEG"       },
        { cudaVideoCodec_H264_SVC, "H.264/SVC"    },
        { cudaVideoCodec_H264_MVC, "H.264/MVC"    },
        { cudaVideoCodec_HEVC,     "H.265/HEVC"   },
        //{ cudaVideoCodec_VP8,      "VP8"          }, // ?
        //{ cudaVideoCodec_VP9,      "VP9"          },
        { cudaVideoCodec_NumCodecs,"Invalid"      },
        { cudaVideoCodec_YUV420,   "YUV  4:2:0"   },
        { cudaVideoCodec_YV12,     "YV12 4:2:0"   },
        { cudaVideoCodec_NV12,     "NV12 4:2:0"   },
        { cudaVideoCodec_YUYV,     "YUYV 4:2:2"   },
        { cudaVideoCodec_UYVY,     "UYVY 4:2:2"   },
    };

However, VP8 and VP9 are already supported by nvcodec. Could you please implement the GPU decoder for them? VP9 decoding is faster than H264 and the output video file is smaller.

nvcc fatal : redefinition of argument 'std'

In the last stage of compiling with make, it reports an error:

[ 97%] Building CUDA object CMakeFiles/decord.dir/src/improc/improc.cu.o
nvcc fatal : redefinition of argument 'std'
CMakeFiles/decord.dir/build.make:508: recipe for target 'CMakeFiles/decord.dir/src/improc/improc.cu.o' failed

I found that it is caused by line 62 of CMakeLists.txt:

set(CMAKE_CUDA_FLAGS "-std=c++11 ${CMAKE_CUDA_FLAGS}")

Removing '-std=c++11' solves the problem.

Exception handling with corrupted videos

Hi,

I'm running into issues with decord when a file's content is corrupted.
This is the stack trace:

[h264 @ 0x558362b3b600] Invalid NAL unit 0, skipping.
[h264 @ 0x558362b3b600] cabac decode of qscale diff failed at 32 20
[h264 @ 0x558362b3b600] error while decoding MB 32 20, bytestream 2173
[h264 @ 0x558362b3b600] Frame num change from 7 to 11
[h264 @ 0x558362b3b600] decode_slice_header error
[h264 @ 0x558362b3b600] concealing 2017 DC, 2017 AC, 2017 MV errors in P frame
[h264 @ 0x558362b3b600] Invalid NAL unit size (1731009324 > 12203).
[h264 @ 0x558362b3b600] Error splitting the input into NAL units.
terminate called after throwing an instance of 'dmlc::Error'
  what():  [15:03:42] /io/decord/src/video/ffmpeg/threaded_decoder.cc:178: Check failed: avcodec_send_packet(dec_ctx_.get(), pkt.get()) >= 0 (-1094995529 vs. 0) Thread worker: Error sending packet.

Stack trace returned 10 entries:
[bt] (0) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(dmlc::StackTrace(unsigned long)+0x7f) [0x7fb6f6ad4ef7]
[bt] (1) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x20) [0x7fb6f6ad51b2]
[bt] (2) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(decord::ffmpeg::FFMPEGThreadedDecoder::WorkerThread()+0x497) [0x7fb6f6b358ab]
[bt] (3) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(void std::_Mem_fn<void (decord::ffmpeg::FFMPEGThreadedDecoder::*)()>::operator()<, void>(decord::ffmpeg::FFMPEGThreadedDecoder*) const+0x65) [0x7fb6f6b3dd51]
[bt] (4) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(void std::_Bind_simple<std::_Mem_fn<void (decord::ffmpeg::FFMPEGThreadedDecoder::*)()> (decord::ffmpeg::FFMPEGThreadedDecoder*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x43) [0x7fb6f6b3dc83]
[bt] (5) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(std::_Bind_simple<std::_Mem_fn<void (decord::ffmpeg::FFMPEGThreadedDecoder::*)()> (decord::ffmpeg::FFMPEGThreadedDecoder*)>::operator()()+0x1b) [0x7fb6f6b3db5d]
[bt] (6) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (decord::ffmpeg::FFMPEGThreadedDecoder::*)()> (decord::ffmpeg::FFMPEGThreadedDecoder*)> >::_M_run()+0x1c) [0x7fb6f6b3da9e]
[bt] (7) /home/leo/miniconda3/envs/decord/lib/python3.7/site-packages/decord/libdecord.so(+0x16cce0) [0x7fb6f6b3ece0]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fb705dfa6db]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fb705b2388f]


[1]    21258 abort (core dumped)  python test.py

I'm running this simple script:

from decord import VideoReader
vr = VideoReader("corruptedvideo.mp4")
vr.get_batch(list(range(70)))

and the video is corrupted around the 2-second mark (at 30 fps).
The failure itself is expected, of course, but the process is killed with an abort.
OpenCV is able to handle the same issue, failing (more or less) gracefully.
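Because the abort happens inside decord's native decoder thread (the `terminate called after throwing an instance of 'dmlc::Error'` line), it cannot be caught with `try`/`except` in Python. One workaround, sketched below under the assumption that an extra process per file is acceptable (e.g. when pre-screening a dataset), is to probe each file in a child process so that only the child dies. `decodes_cleanly` and `PROBE_SCRIPT` are hypothetical helpers, not part of decord.

```python
import subprocess
import sys

# Hypothetical probe: decode the first N frames with decord in a child process.
PROBE_SCRIPT = """
import sys
from decord import VideoReader
vr = VideoReader(sys.argv[1])
vr.get_batch(list(range(int(sys.argv[2]))))
"""


def decodes_cleanly(path: str, num_frames: int,
                    script: str = PROBE_SCRIPT, timeout: float = 60.0) -> bool:
    """Return True if the child exits normally; an abort only kills the child."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, path, str(num_frames)],
            capture_output=True,  # keep FFmpeg/decord noise out of our output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # On POSIX a SIGABRT in the child shows up here as a negative return code
    return proc.returncode == 0


# Usage: skip files that would crash the training process
#   if decodes_cleanly("corruptedvideo.mp4", 70): ...
```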

Which cpu is used when using device=cpu(0)

Hi, thanks for the awesome Decord. When I want to use a GPU to read videos, I can pick a specific device from gpu(0) to gpu(7). But the server has many CPUs: does that mean that when I use the default cpu(0), only one CPU will be used for reading?

undefined symbol: avcodec_parameters_copy

Hi @zhreshold,
I successfully finished the installation process following your tutorial on Ubuntu, but the following error arises when I import decord in Python 3:

OSError: /home/lw/.local/lib/python3.5/site-packages/decord-0.0.1-py3.5.egg/decord/libdecord.so: undefined symbol: avcodec_parameters_copy

I have no idea how to solve it, so I have to ask for your help!
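An `undefined symbol` error at import time usually means that `libdecord.so` was built against one FFmpeg installation while an older `libavcodec` is picked up at runtime; `avcodec_parameters_copy` belongs to the `AVCodecParameters` API, which older FFmpeg releases lack. The sketch below checks whether the libavcodec that `ctypes` finds exports that symbol. It assumes `ctypes.util.find_library` resolves the same library the dynamic loader uses for decord, which is not guaranteed (`ldd` on `libdecord.so` shows what is actually loaded). `library_exports` is a hypothetical helper.

```python
import ctypes
import ctypes.util
from typing import Optional


def library_exports(name: str, symbol: str) -> Optional[bool]:
    """True/False if shared library `name` (e.g. "avcodec") does/doesn't
    export `symbol`; None if the library cannot be found or loaded."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute access on a CDLL resolves the symbol; a missing one raises AttributeError
    return hasattr(lib, symbol)


# Example diagnostic (the result depends on the machine):
#   print(library_exports("avcodec", "avcodec_parameters_copy"))
```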

Install from source to colab, can't find CMakeLists

To run decord with more compute, I chose to run it on Colab.
I entered the commands one by one, in order.
When I ran these commands:

!cmake .. -DUSE_CUDA=0
!make

it throws the following error:

CMake Error: The source directory " / " does not appear to contain CMakeLists.txt.

Any advice?
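A likely cause: in Colab, as in IPython, each `!` line runs in a fresh shell, so a `!cd build` on an earlier line does not carry over, and `cmake ..` then resolves relative to the notebook's working directory (a source directory of `/` fits that). The `%cd` magic does persist between cells. Alternatively, the build can be driven from Python with an explicit working directory for every command, as in this sketch; `build_steps` and `run_build` are hypothetical helpers, the repository path `/content/decord` is an assumption, and the `-DUSE_CUDA` values mirror the commands quoted in these issues.

```python
import os
import subprocess
from typing import List, Tuple


def build_steps(repo_dir: str, use_cuda: bool = False) -> Tuple[str, List[List[str]]]:
    """Return the build directory and the commands for an out-of-source CMake build."""
    build_dir = os.path.join(os.path.abspath(repo_dir), "build")
    cuda_flag = "-DUSE_CUDA=ON" if use_cuda else "-DUSE_CUDA=0"
    return build_dir, [["cmake", "..", cuda_flag], ["make"]]


def run_build(repo_dir: str, use_cuda: bool = False) -> None:
    build_dir, steps = build_steps(repo_dir, use_cuda)
    os.makedirs(build_dir, exist_ok=True)
    for cmd in steps:
        # cwd= pins the working directory per command, so ".." is always repo_dir
        subprocess.run(cmd, cwd=build_dir, check=True)


# In a notebook cell, after cloning the repo recursively (path is an assumption):
#   run_build("/content/decord")
```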

gpu-model compile failure

When I compile from source using the latest master branch, make shows the following error:
[ 2%] Building CXX object CMakeFiles/decord.dir/src/video/video_reader.cc.o
/.../decord/src/video/video_reader.cc: In member function ‘virtual void decord::VideoReader::SetVideoStream(int)’:
/.../decord/src/video/video_reader.cc:101:62: error: invalid new-expression of abstract class type ‘decord::cuda::CUThreadedDecoder’
ctx_.device_id, codecpar.get(), fmt_ctx_->iformat));

Could this be a problem in the code?

CUDA not enabled. Requested context GPU(0).

Compile failed when using cuda

Hi, the first step (cmake .. -DUSE_CUDA=ON) works, but when I run make, the following error occurs:

Is there any way to solve this? Thank you :)

Issues on videos with imprecise metadata

While working on a PR, I noticed a failing unit test. It asserted a wrong expected result (incorrectly returned by older versions of decord) that was fixed, perhaps by accident, in a later version.

def test_video_reader_len():
    vr = _get_default_test_video() # "flipping_a_pancake.mkv"
    assert len(vr) == 311 #number of frames is actually 310, and current version gives the correct result

Are these tests run automatically anywhere, and should they be? :-)
The failing test is a symptom of something going wrong, so I took a deeper look.
The behavior of GetFrameCount was changed (and "fixed", as a side effect) when GetFramePTS was introduced in 0.3.6.
With decord 0.3.5, on "flipping_a_pancake.mkv", the function would fall back to an approximation based on FPS * duration (from the metadata). In that video, the duration metadata seems to be imprecise (I think it is actually the duration of the audio track rather than of the video stream), so the estimate is off by one, and this causes problems everywhere.
For example, a VideoReader on that video returns 311 frames (while the video actually has only 310), and the last "extra" frame actually contains some other, seemingly random, frame of the video.
From 0.3.6, it seems that GetFrameCount will (always?) return frame_ts_.size(), which is more precise and fixes the issue.
Should the approximation perhaps be removed from GetFrameCount entirely, so that frame_ts_.size() is always used, to avoid similar issues?
That is not the only issue that occurred on that video. With bad or missing duration metadata, a lot of other things are still going wrong, mainly because that metadata is used in Seek.
Even with the latest version of decord, problems remain: the video stream of "flipping_a_pancake.mkv" has a bad duration in its metadata (as does the other example video; I guess both are from Kinetics), which breaks anything that needs Seek or SeekAccurate, i.e. anything other than a sequential read of the entire video.
For example, this is an interesting failure (still occurring with the latest version):

from decord import VideoReader
vr = VideoReader("flipping_a_pancake.mkv")
frames = vr[:].asnumpy()              # no Seek is involved, except Seek(0); so, correct result.
b = vr[152].asnumpy()                 # Seek(150) "fails", actually seeking to ts = 0
(b == frames[152]).all()              # returns False
(b == frames[2]).all()                # returns True

where the nearest keyframe is 150, but Seek(150) goes back to 0 because of a bad (negative) duration in the metadata. Is there a way to be robust to this kind of issue, for example by relying on the PTS values in frame_ts rather than on the "duration" metadata in FrameToPTS and other functions? Videos are messy, though, and the PTS values could have their own issues, so I don't know whether that would solve everything.
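The suggestion of deriving everything from decoded timestamps can be sketched in plain Python. Assuming the per-frame presentation timestamps (the data the issue refers to as `frame_ts_`) and the keyframe indices are available as ascending lists, the frame count is simply the length of the timestamp list, and both the time-to-frame lookup and the seek target come from a binary search, with no use of the container's duration field. `frame_at_time` and `seek_keyframe` are hypothetical helpers, not decord's internal implementation; with a keyframe at 150 as in the example above, a request for frame 152 seeks to 150.

```python
import bisect
from typing import Sequence


def frame_at_time(frame_pts: Sequence[float], t: float) -> int:
    """Index of the last frame whose PTS is <= t (frame_pts sorted ascending)."""
    pos = bisect.bisect_right(frame_pts, t) - 1
    return max(pos, 0)  # clamp times before the first frame to frame 0


def seek_keyframe(key_indices: Sequence[int], target: int) -> int:
    """Last keyframe at or before `target` (key_indices sorted ascending)."""
    pos = bisect.bisect_right(key_indices, target) - 1
    if pos < 0:
        raise ValueError("no keyframe at or before the target frame")
    return key_indices[pos]
```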

Resource temporarily unavailable when calling VideoReader or VideoLoader multiple times

Hi Joshua,

I tried to call VideoReader() or VideoLoader() many times, but it crashed partway through. To reproduce the error, here is some sample code:

import decord
for i in range(250):
    print(i)
    #a = decord.VideoLoader(['/mnt/SSD/Kinetics_trimmed_videos_train_merge/7E60em35UUw.mp4'], shape=(8,340,256,3), interval=8, skip=0,shuffle=3)
    a = decord.VideoReader('/mnt/SSD/Kinetics_trimmed_videos_train_merge/7E60em35UUw.mp4')
    # a.reset()

The program crashes when i reaches 230 or 231, with the error message:

Traceback (most recent call last):
  File "test_code.py", line 6, in <module>
    a = decord.VideoReader('/mnt/SSD/Kinetics_trimmed_videos_train_merge/7E60em35UUw.mp4')
  File "/home/yzhao/.local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/video_reader.py", line 21, in __init__
    uri, ctx.device_type, ctx.device_id, width, height)
  File "/home/yzhao/.local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/yzhao/.local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/_ffi/base.py", line 62, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: Resource temporarily unavailable

For VideoLoader, the error message is similar:

Traceback (most recent call last):
  File "test_code.py", line 4, in <module>
    a = decord.VideoLoader(['/mnt/SSD/Kinetics_trimmed_videos_train_merge/7E60em35UUw.mp4'], shape=(8,340,256,3), interval=8, skip=0,shuffle=3)
  File "/home/yzhao/.local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/video_loader.py", line 24, in __init__
    uri, shape[0], shape[1], shape[2], shape[3], interval, skip, shuffle, prefetch)
  File "/home/yzhao/.local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/yzhao/.local/lib/python3.6/site-packages/decord-0.0.1-py3.6.egg/decord/_ffi/base.py", line 62, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: Resource temporarily unavailable

My environment, for reference:

Ubuntu 18.04
ffmpeg 3.4.6-0ubuntu0.18.04.1

I would appreciate it if you could look into this. Thanks!

Return timestamps

When reading video frames, it would be nice to also return the actual timestamps of the frames read. This would be a great feature for applications such as video object segmentation (VOS) and tracking.

How to prevent annoying ffmpeg log

I use ffmpeg 4.0 and installed decord (0.3.5) on my Linux machine with pip install. When I load videos with decord, the annoying log message '[swscaler @ 0x7fc932c27040] deprecated pixel format used, make sure you did set range correctly' keeps appearing. Is it possible to suppress such logs (the videos are loaded properly)?
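One generic workaround, not a decord API, is to silence the process-level stderr while decoding. Messages like the swscaler warning are printed by FFmpeg's C code directly to file descriptor 2, so replacing `sys.stderr` in Python would not hide them. A minimal POSIX-style sketch, assuming the log really goes to stderr; note that it hides every native message in the block, including genuine errors.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point file descriptor 2 at os.devnull.

    C libraries (for example FFmpeg's default log callback) write to fd 2
    directly, so this hides their output, unlike redirecting sys.stderr.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a handle to the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

Usage would be wrapping the decoding calls, e.g. `with suppress_native_stderr(): frames = vr.get_batch(indices)`. Since decoding may happen on background threads, messages emitted after the block exits would still appear.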

Determine video rotation

I'm currently working with the BDD dataset: https://deepdrive.berkeley.edu/

The videos in that dataset have different rotations applied. Running ffprobe -i <video-path>, I see different rotation metadata for different videos.

Does decord have a way to read this metadata and return a correctly rotated frame? If not, would it be difficult to add?
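Until a reader handles this, one workaround is to read the rotation angle yourself (for example from the ffprobe output mentioned above) and rotate the decoded frames afterwards. A minimal numpy sketch, assuming frames are H x W x C arrays (e.g. from `.asnumpy()`) and that the angle follows the legacy "rotate" tag convention, where 90 means the picture should be turned 90 degrees clockwise for display. Newer FFmpeg versions report a display-matrix rotation whose sign may be the opposite, so check the direction on a sample video. The function name is hypothetical.

```python
import numpy as np


def apply_display_rotation(frame, degrees):
    """Rotate an H x W x C frame clockwise by `degrees` (a multiple of 90)."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    # np.rot90 turns counter-clockwise for positive k, so negate for clockwise.
    k = -(degrees // 90) % 4
    return np.rot90(frame, k=k, axes=(0, 1))
```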

[Feature Request] get_frame_timestamp in integral units (e.g. flicks)

I'm only getting my feet wet with video encoding schemes, so forgive (and please correct) me if I say anything uninformed.

Seconds in floating-point format, while intuitive, can lead to numerical issues. It would be nice if there were an integral representation of time that is not subject to floating-point errors. A unit that seems well suited to this is the "flick": a flick is exactly 1/705,600,000 of a second, a value chosen to be divisible by the most common sampling frequencies.

The code seems to use PTS to implement the get_frame_timestamp function. Looking it up, I found that PTS stands for presentation timestamp, and it seems to be stored as an integer in the backend, but I'm not sure what unit it uses. (Edit: I initially thought line 444 simply cast this int64 directly to a float32 without any unit conversion, but I had missed that the int64 was the shape and the float32 was the value. I sometimes get confused by strongly typed languages, which is odd considering they are even more explicit about what's happening.)

What exactly is the PTS encoding here? And is it possible / useful to add some API that returns some integral representation of time?
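For background: in FFmpeg, a pts is an integer count of the stream's time_base, which is a rational number of seconds. In the ffprobe dump in a later issue below, duration=640 corresponds to duration_time=0.033333, i.e. a time base of 1/19200. Assuming you can obtain the pts and the time base (how decord would expose them is not covered here), the conversion to flicks can be done exactly with rational arithmetic. A minimal sketch with a hypothetical function name:

```python
from fractions import Fraction

FLICKS_PER_SECOND = 705_600_000


def pts_to_flicks(pts, time_base):
    """Convert an integer pts in units of `time_base` seconds to flicks, exactly.

    Returns a Fraction; it is a whole number of flicks whenever
    result.denominator == 1, which is the case for common time bases.
    """
    return Fraction(pts) * Fraction(time_base) * FLICKS_PER_SECOND
```

For example, one frame of 640 ticks at a time base of 1/19200 is 1/30 s, or 23,520,000 flicks.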

compiling errors

CMakeFiles/decord.dir/build.make:62: recipe for target 'CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o' failed
make[2]: *** [CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/decord.dir/all' failed
make[1]: *** [CMakeFiles/decord.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

[feature requests] read video from url

OpenCV supports reading video from RTSP streams and other online sources.
Does decord support this?

[Feature Requests] Support read file from file-like object.

Letting VideoReader read from a file-like object, e.g. BytesIO, would bring several benefits:

It would be compatible with many third-party filesystem interfaces, such as GFile and PyFilesystem
It would make it very convenient to access cloud storage, such as S3 on AWS and OSS on AlibabaCloud

Currently I have to download my videos from remote storage into temporary files before initializing VideoReader, like this:

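import tempfile
import decord

tmpfile = tempfile.NamedTemporaryFile('wb', delete=False)
fp = bucket.get_object('xxx/xxx.mp4')  # bucket: my cloud-storage client
tmpfile.write(fp.read())
tmpfile.close()  # flush buffered bytes to disk before decord opens the file
container = decord.VideoReader(tmpfile.name, ctx=decord.cpu(0))
# DO DECODE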

But in PyAV, I can access and decode my video just like this:

import av
from io import BytesIO

fp = bucket.get_object('xxx/xxx.mp4')
container = av.open(BytesIO(fp.read()))
# DO DECODE
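Until file-like objects are supported, the temporary-file workaround can be wrapped so the file is closed before another library opens it by name, and removed afterwards. A minimal sketch; the helper name is hypothetical, and the bytes would come from the caller's storage client as in the snippets above.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def bytes_as_temp_path(data, suffix=".mp4"):
    """Write `data` to a temporary file and yield its path.

    The file is fully written and closed before the path is yielded, so a
    reader that opens it by name sees all the bytes; it is deleted on exit.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield path
    finally:
        os.remove(path)
```

Usage would look like `with bytes_as_temp_path(fp.read()) as path: container = decord.VideoReader(path, ctx=decord.cpu(0))`, with decoding done inside the block, since the file is removed when it exits.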

make error

cwq@CWQ:/PycharmProjects/decord/build$ cmake .. -DUSE_CUDA=ON
-- Found FFMPEG or Libav: /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so, /usr/include/x86_64-linux-gnu
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
FFMPEG_INCLUDE_DIR = /usr/include/x86_64-linux-gnu
FFMPEG_LIBRARIES = /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/local/cuda/lib64/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/local/cuda/lib64/libcublas.so
-- Found CUDA_NVIDIA_ML_LIBRARY=/usr/local/cuda/lib64/stubs/libnvidia-ml.so
-- Found CUDA_NVCUVID_LIBRARY=/usr/lib/x86_64-linux-gnu/libnvcuvid.so
-- Build with CUDA support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/cwq/PycharmProjects/decord/build
cwq@CWQ:/PycharmProjects/decord/build$ make -j8
Scanning dependencies of target decord
[ 2%] Building CXX object CMakeFiles/decord.dir/src/runtime/cpu_device_api.cc.o
[ 5%] Building CXX object CMakeFiles/decord.dir/src/runtime/module_util.cc.o
[ 8%] Building CXX object CMakeFiles/decord.dir/src/runtime/ndarray.cc.o
[ 11%] Building CXX object CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o
[ 13%] Building CXX object CMakeFiles/decord.dir/src/runtime/registry.cc.o
[ 16%] Building CXX object CMakeFiles/decord.dir/src/runtime/file_util.cc.o
[ 19%] Building CXX object CMakeFiles/decord.dir/src/runtime/module.cc.o
[ 22%] Building CXX object CMakeFiles/decord.dir/src/runtime/dso_module.cc.o
[ 25%] Building CXX object CMakeFiles/decord.dir/src/runtime/str_util.cc.o
[ 27%] Building CXX object CMakeFiles/decord.dir/src/runtime/system_lib_module.cc.o
[ 30%] Building CXX object CMakeFiles/decord.dir/src/runtime/thread_pool.cc.o
[ 33%] Building CXX object CMakeFiles/decord.dir/src/runtime/threading_backend.cc.o
[ 36%] Building CXX object CMakeFiles/decord.dir/src/video/logging.cc.o
[ 38%] Building CXX object CMakeFiles/decord.dir/src/runtime/workspace_pool.cc.o
[ 41%] Building CXX object CMakeFiles/decord.dir/src/video/storage_pool.cc.o
[ 44%] Building CXX object CMakeFiles/decord.dir/src/video/video_interface.cc.o
[ 47%] Building CXX object CMakeFiles/decord.dir/src/video/video_loader.cc.o
[ 50%] Building CXX object CMakeFiles/decord.dir/src/video/video_reader.cc.o
[ 52%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_file_order_sampler.cc.o
[ 55%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_sampler.cc.o
[ 58%] Building CXX object CMakeFiles/decord.dir/src/sampler/smart_random_sampler.cc.o
[ 61%] Building CXX object CMakeFiles/decord.dir/src/sampler/sequential_sampler.cc.o
[ 63%] Building CXX object CMakeFiles/decord.dir/src/video/ffmpeg/filter_graph.cc.o
[ 66%] Building CXX object CMakeFiles/decord.dir/src/video/ffmpeg/threaded_decoder.cc.o
[ 69%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_context.cc.o
[ 72%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_decoder_impl.cc.o
[ 75%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_mapped_frame.cc.o
[ 77%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_parser.cc.o
[ 80%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_stream.cc.o
[ 83%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_texture.cc.o
[ 86%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_threaded_decoder.cc.o
[ 88%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_device_api.cc.o
[ 91%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_module.cc.o
[ 94%] Building CUDA object CMakeFiles/decord.dir/src/improc/improc.cu.o
[ 97%] Linking CUDA device code CMakeFiles/decord.dir/cmake_device_link.o
[100%] Linking CXX shared library libdecord.so
[100%] Built target decord

import decord as de
ctx = de.gpu(0)
shape = (2, 480, 640, 3)
videos = ['example/Javelin_standing_throw_drill.mkv', 'example/flipping_a_pancake.mkv']
interval = 20
skip = 50
vl = de.VideoLoader(videos, ctx=ctx, shape=shape, interval=interval, skip=skip, shuffle=0)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cwq/.local/lib/python3.6/site-packages/decord/video_loader.py", line 32, in __init__
    uri, device_types, device_ids, shape[0], shape[1], shape[2], shape[3], interval, skip, shuffle, prefetch)
  File "/home/cwq/.local/lib/python3.6/site-packages/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/cwq/.local/lib/python3.6/site-packages/decord/_ffi/base.py", line 62, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [15:55:25] /io/decord/src/video/video_reader.cc:106: CUDA not enabled. Requested context GPU(0).

I need your help!
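The traceback points at /home/cwq/.local/lib/python3.6/site-packages/decord, and the error message refers to /io/decord/src/..., not to /home/cwq/PycharmProjects/decord. Judging only from those paths, Python may be importing a previously installed copy of decord rather than the library just built with CUDA. A quick check of which copy would be imported, using the standard library's importlib.util.find_spec (the helper name is hypothetical):

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Calling `module_origin("decord")` in the same interpreter and comparing the result with the source checkout would show whether the freshly built package is the one being used.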

GPU generated batch contains several wrong frames

First of all, thank you for this amazing work; it looks like it will save a lot of the time usually spent generating datasets from video.
However, I have encountered a strange bug when using GPU-accelerated batch generation.
The first batch contains every frame I asked for, but when I request another batch with different frame indices, it starts with one or more frames from near the final index of the previous batch, and only after those come the correct frames up to the new final index.

Below is a code example that reproduces this behavior in a Jupyter notebook:

from decord.bridge.torchdl import to_torch
from decord import VideoReader
from decord import gpu, cpu
import matplotlib.pyplot as plt


vr = VideoReader('path_to_video.mp4', ctx=gpu(0))

first_sample = vr.get_batch([range(2500, 2510, 1)])
first_sample = to_torch(first_sample)

second_sample = vr.get_batch([range(3500, 3510, 1)])
second_sample = to_torch(second_sample)

fig = plt.figure(figsize=(25, 25))
for i in range(first_sample.shape[0]):
    img1 = first_sample[i].cpu().numpy()
    ax = plt.subplot(first_sample.shape[0], 2, i*2+1)
    ax.axis('off')
    plt.imshow(img1)
    plt.title('first_sample')
    
    img2 = second_sample[i].cpu().numpy()
    ax = plt.subplot(first_sample.shape[0], 2, i*2+2)
    ax.axis('off')
    plt.imshow(img2)
    plt.title('second_sample')
plt.show()
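To make the symptom measurable rather than visual, one option is to decode the same indices with a CPU context as a reference and check which reference frame each GPU frame most resembles. A minimal numpy sketch, assuming both batches have already been converted to (N, H, W, C) arrays (e.g. with `.cpu().numpy()` or `.asnumpy()`); the function name is hypothetical. It uses mean absolute difference instead of exact equality because GPU and CPU decoders may differ slightly in color conversion.

```python
import numpy as np


def closest_reference_frames(batch, reference):
    """For each frame in `batch`, return the index of the most similar frame in `reference`.

    Both inputs have shape (N, H, W, C); similarity is mean absolute pixel difference.
    """
    ref = reference.astype(np.float32)
    result = []
    for frame in batch.astype(np.float32):
        diff = np.abs(ref - frame).mean(axis=(1, 2, 3))  # one score per reference frame
        result.append(int(diff.argmin()))
    return result
```

If the reference holds the CPU-decoded frames of both requests (2500-2509 followed by 3500-3509), a correct second batch maps to 10..19, while stale frames from the first request show up as indices below 10.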


Decord reader does not work with tf.data.Dataset.from_generator

Hello,

I would like to use tf.data to process some video frames, but I can't get decord to work with from_generator.
Versions: decord 0.3.9, TensorFlow 2.2.0

Here's some minimal code to reproduce the issue. Any help is appreciated.

from decord import VideoReader
from decord import cpu, gpu
import tensorflow as tf

# use whatever video here e.g. http://file.all-free-download.com/downloadfiles/footage/traffic_jam_new_work_street_583.zip
vr = VideoReader('traffic_jam_new_work_street_583.mp4', ctx=cpu(0))
def frame_gen_decord():
    for k in range(len(vr)):
        sample = vr.next().asnumpy()
        yield sample

g = frame_gen_decord()
x = next(g)
h, w, c = x.shape

dataset = tf.data.Dataset.from_generator(
    frame_gen_decord,
    output_types=(tf.int32),
    output_shapes=(tf.TensorShape([h, w, c]))
)

# this throws an error
for frame in dataset.take(1):
    x = frame
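Separately from whatever causes the reported error (not diagnosed here), the snippet shares one global reader advanced with `vr.next()`: `next(g)` consumes a frame before the dataset starts, and each new pass over the dataset continues from wherever `vr` left off while still looping `len(vr)` times. Also, decoded frames are typically uint8, while `output_types` is `tf.int32`. A more self-contained pattern is to open the reader inside the generator function that from_generator calls. A minimal sketch with hypothetical names; `open_reader` is any callable that returns an object supporting `len()` and integer indexing.

```python
def make_frame_generator(path, open_reader, convert=lambda frame: frame):
    """Return a zero-argument generator function for Dataset.from_generator.

    A fresh reader is opened on every call, so each pass over the dataset
    starts at frame 0 instead of sharing a global reader's position.
    """
    def gen():
        reader = open_reader(path)
        for k in range(len(reader)):
            yield convert(reader[k])
    return gen
```

With decord this could be used as `make_frame_generator(path, lambda p: VideoReader(p, ctx=cpu(0)), convert=lambda f: f.asnumpy())`, passed to `tf.data.Dataset.from_generator` with `output_types=tf.uint8`.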

Document how to build a manylinux1 wheel

Hi,

For a project I need a more recent version of decord than the one on pypi.org.
Ideally I would like to build a manylinux1_x86_64 wheel and publish it on our internal PyPI repository, but I'm having a hard time compiling decord on the official Docker image (quay.io/pypa/manylinux1_x86_64).

Can you please share the procedure you use to build wheels?

Thanks in advance!

VideoReader get_frame_timestamp incorrect

It looks like get_frame_timestamp returns timestamps that are correct but not always consistent with what __getitem__ returns (this is only an issue for videos whose frames are decoded in a different order from the presentation order).

More precisely, it looks like vr.get_frame_timestamp(i) is returning the pts of the i-th frame ordering by dts, while vr[i] would return the i-th frame ordering by pts time.

Example:

> vr.get_frame_timestamp(range(5))                                                             
array([[0.        , 0.03333334],
       [0.03333334, 0.06666667],
       [0.01333333, 0.04666667],
       [0.1       , 0.13333334],
       [0.06666667, 0.1       ]], dtype=float32)

while vr[:5] will correctly return the first 5 frames ordered by presentation time.

This is the output of an ffprobe -show_packets -select_streams v:0 on the file:

[PACKET]
codec_type=video
stream_index=0
pts=0
pts_time=0.000000
dts=-672
dts_time=-0.035000
duration=640
duration_time=0.033333
convergence_duration=N/A
convergence_duration_time=N/A
size=50364
pos=36
flags=K_
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=640
pts_time=0.033333
dts=-671
dts_time=-0.034948
duration=640
duration_time=0.033333
convergence_duration=N/A
convergence_duration_time=N/A
size=24018
pos=50400
flags=__
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=256
pts_time=0.013333
dts=224
dts_time=0.011667
duration=640
duration_time=0.033333
convergence_duration=N/A
convergence_duration_time=N/A
size=9179
pos=74418
flags=__
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=1920
pts_time=0.100000
dts=608
dts_time=0.031667
duration=640
duration_time=0.033333
convergence_duration=N/A
convergence_duration_time=N/A
size=18722
pos=83597
flags=__
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=1280
pts_time=0.066667
dts=1248
dts_time=0.065000
duration=640
duration_time=0.033333
convergence_duration=N/A
convergence_duration_time=N/A
size=8483
pos=102319
flags=__
[/PACKET]
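If, as described above, `get_frame_timestamp(i)` follows decoding (dts) order, one possible workaround is to fetch the timestamps of all frames and sort them by start time, so that the i-th sorted entry lines up with `vr[i]`. A sketch under that assumption, with a hypothetical function name; it should be applied to every frame (e.g. `range(len(vr))`), because a frame's presentation position can depend on frames outside a partial range.

```python
def presentation_order(timestamps):
    """Sort (start, end) timestamp pairs by start time.

    Returns the reordered pairs and the original positions, so entry i of
    the result is the i-th frame in presentation order.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i][0])
    return [timestamps[i] for i in order], order
```

Applied to the five timestamps shown above, this gives the order [0, 2, 1, 4, 3].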

Add bridge for TVM

Decord currently has bridges for MXNet, PyTorch, and TensorFlow, which are very handy for training and general use.

It would be great to also provide a bridge that makes decord emit video frames as TVM tensors, which TVM modules can consume directly. This would make decord more useful for model deployment.

[Query] Using ctx=gpu performance difference. Is this expected?

Hi. Thanks for the amazing repository.

I built with the -DUSE_CUDA option and then tried the example here (https://github.com/zhreshold/decord/blob/master/examples/video_loader.ipynb). I averaged the wall time reported by %time over ten runs. The statement I timed was:

vl = de.VideoLoader(videos, ctx=ctx, shape=shape, interval=interval, skip=skip, shuffle=0)

cpu	gpu
53	40

I also tried various shuffle strategies, but for a given device the timings were nearly identical.

Is this expected?
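Timing only the VideoLoader constructor probably measures mostly setup (opening files, creating decoders) rather than decoding, so it may say little about CPU versus GPU decoding speed. A minimal helper for averaging wall time outside a notebook, with a hypothetical name; timing a few `vl.next()` calls the same way would compare the decoding step itself.

```python
import time


def mean_wall_time(fn, runs=10):
    """Call `fn` `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs
```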

Is it possible to grab the screen in real time?

Sometimes I want to feed images to a model in real time, from a video or a window on screen that is not recorded to disk. Is that possible with decord?

what(): [12:32:34] /home/vladimir/packages/decord/src/video/nvcodec/cuda_threaded_decoder.cc:332: Check failed: arr.defined()

[12:32:34] /home/vladimir/packages/decord/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: GeForce RTX 2080 Ti
[12:32:34] /home/vladimir/packages/decord/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 440.64, so using our own stream.
terminate called after throwing an instance of 'dmlc::Error'
  what():  [12:32:34] /home/vladimir/packages/decord/src/video/nvcodec/cuda_threaded_decoder.cc:332: Check failed: arr.defined() 

Stack trace returned 10 entries:
[bt] (0) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x9d) [0x7fc6ff0380fd]
[bt] (1) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x45) [0x7fc6ff03843b]
[bt] (2) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(decord::cuda::CUThreadedDecoder::ConvertThread()+0x1bd) [0x7fc6ff0b15b9]
[bt] (3) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(void std::__invoke_impl<void, void (decord::cuda::CUThreadedDecoder::*)(), decord::cuda::CUThreadedDecoder*>(std::__invoke_memfun_deref, void (decord::cuda::CUThreadedDecoder::*&&)(), decord::cuda::CUThreadedDecoder*&&)+0x67) [0x7fc6ff0b4dec]
[bt] (4) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(std::__invoke_result<void (decord::cuda::CUThreadedDecoder::*)(), decord::cuda::CUThreadedDecoder*>::type std::__invoke<void (decord::cuda::CUThreadedDecoder::*)(), decord::cuda::CUThreadedDecoder*>(void (decord::cuda::CUThreadedDecoder::*&&)(), decord::cuda::CUThreadedDecoder*&&)+0x4e) [0x7fc6ff0b302a]
[bt] (5) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (decord::cuda::CUThreadedDecoder::*)(), decord::cuda::CUThreadedDecoder*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>)+0x43) [0x7fc6ff0bb1b9]
[bt] (6) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(std::thread::_Invoker<std::tuple<void (decord::cuda::CUThreadedDecoder::*)(), decord::cuda::CUThreadedDecoder*> >::operator()()+0x2c) [0x7fc6ff0bb15a]
[bt] (7) /home/vladimir/anaconda3/lib/python3.7/site-packages/decord-0.3.6-py3.7.egg/decord/libdecord.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (decord::cuda::CUThreadedDecoder::*)(), decord::cuda::CUThreadedDecoder*> > >::_M_run()+0x1c) [0x7fc6ff0bb12a]
[bt] (8) /home/vladimir/anaconda3/bin/../lib/libstdc++.so.6(+0xc819d) [0x7fc753e2619d]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fc76765d6db]

compiling errors: Makefile:129: recipe for target 'all' failed

Hi @zhreshold,

I am following your installation tutorial on Ubuntu.

The first step (cmake .. -DUSE_CUDA=0) completed smoothly; the terminal output is as follows:

lw@lw:~/mmaction_master/third_party/decord/build$ sudo cmake .. -DUSE_CUDA=0
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") 
-- Checking for module 'libavcodec'
--   Found libavcodec, version 56.60.100
-- Checking for module 'libavformat'
--   Found libavformat, version 56.40.101
-- Checking for module 'libavutil'
--   Found libavutil, version 54.31.100
-- Checking for module 'libavfilter'
--   Found libavfilter, version 5.40.101
-- Found FFMPEG or Libav: /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so, /usr/include/x86_64-linux-gnu
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
FFMPEG_INCLUDE_DIR = /usr/include/x86_64-linux-gnu 
FFMPEG_LIBRARIES = /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so 
-- CUDA disabled, no nvdec capabilities will be enabled...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/lw/mmaction_master/third_party/decord/build

However, the second step (make) fails; the output and errors are as follows:

lw@lw:~/mmaction_master/third_party/decord/build$ sudo make 
Scanning dependencies of target decord
[  4%] Building CXX object CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o
[  8%] Building CXX object CMakeFiles/decord.dir/src/runtime/cpu_device_api.cc.o
[ 12%] Building CXX object CMakeFiles/decord.dir/src/runtime/dso_module.cc.o
[ 16%] Building CXX object CMakeFiles/decord.dir/src/runtime/file_util.cc.o
[ 20%] Building CXX object CMakeFiles/decord.dir/src/runtime/module.cc.o
[ 25%] Building CXX object CMakeFiles/decord.dir/src/runtime/module_util.cc.o
[ 29%] Building CXX object CMakeFiles/decord.dir/src/runtime/ndarray.cc.o
[ 33%] Building CXX object CMakeFiles/decord.dir/src/runtime/registry.cc.o
[ 37%] Building CXX object CMakeFiles/decord.dir/src/runtime/str_util.cc.o
[ 41%] Building CXX object CMakeFiles/decord.dir/src/runtime/system_lib_module.cc.o
[ 45%] Building CXX object CMakeFiles/decord.dir/src/runtime/thread_pool.cc.o
[ 50%] Building CXX object CMakeFiles/decord.dir/src/runtime/threading_backend.cc.o
[ 54%] Building CXX object CMakeFiles/decord.dir/src/runtime/workspace_pool.cc.o
[ 58%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_file_order_sampler.cc.o
[ 62%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_sampler.cc.o
[ 66%] Building CXX object CMakeFiles/decord.dir/src/sampler/sequential_sampler.cc.o
[ 70%] Building CXX object CMakeFiles/decord.dir/src/sampler/smart_random_sampler.cc.o
[ 75%] Building CXX object CMakeFiles/decord.dir/src/video/storage_pool.cc.o
[ 79%] Building CXX object CMakeFiles/decord.dir/src/video/video_interface.cc.o
In file included from /home/lw/mmaction_master/third_party/decord/src/video/threaded_decoder_interface.h:10:0,
                 from /home/lw/mmaction_master/third_party/decord/src/video/video_reader.h:10,
                 from /home/lw/mmaction_master/third_party/decord/src/video/video_interface.cc:7:
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h: In member function ‘AVPacket* decord::ffmpeg::AutoReleaseAVPacketPool<S>::Allocate()’:
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:107:36: error: there are no arguments to ‘av_packet_alloc’ that depend on a template parameter, so a declaration of ‘av_packet_alloc’ must be available [-fpermissive]
             return av_packet_alloc();
                                    ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:107:36: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h: In member function ‘void decord::ffmpeg::AutoReleaseAVPacketPool<S>::Delete(AVPacket*)’:
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:111:30: error: there are no arguments to ‘av_packet_free’ that depend on a template parameter, so a declaration of ‘av_packet_free’ must be available [-fpermissive]
             av_packet_free(&p);
                              ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h: At global scope:
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:5: error: ‘AVBSFContext’ was not declared in this scope
     AVBSFContext, Deleterp<AVBSFContext, void, av_bsf_free> >;
     ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:28: error: ‘AVBSFContext’ was not declared in this scope
     AVBSFContext, Deleterp<AVBSFContext, void, av_bsf_free> >;
                            ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:48: error: ‘av_bsf_free’ was not declared in this scope
     AVBSFContext, Deleterp<AVBSFContext, void, av_bsf_free> >;
                                                ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:59: error: template argument 1 is invalid
     AVBSFContext, Deleterp<AVBSFContext, void, av_bsf_free> >;
                                                           ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:59: error: template argument 3 is invalid
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:61: error: template argument 1 is invalid
     AVBSFContext, Deleterp<AVBSFContext, void, av_bsf_free> >;
                                                             ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:178:61: error: template argument 2 is invalid
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:5: error: ‘AVCodecParameters’ was not declared in this scope
     AVCodecParameters, Deleterp<AVCodecParameters, void, avcodec_parameters_fre
     ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:33: error: ‘AVCodecParameters’ was not declared in this scope
     AVCodecParameters, Deleterp<AVCodecParameters, void, avcodec_parameters_fre
                                 ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:58: error: ‘avcodec_parameters_free’ was not declared in this scope
     AVCodecParameters, Deleterp<AVCodecParameters, void, avcodec_parameters_fre
                                                          ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:81: error: template argument 1 is invalid
 VCodecParameters, Deleterp<AVCodecParameters, void, avcodec_parameters_free> >;
                                                                            ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:81: error: template argument 3 is invalid
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:83: error: template argument 1 is invalid
 VCodecParameters, Deleterp<AVCodecParameters, void, avcodec_parameters_free> >;
                                                                              ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:181:83: error: template argument 2 is invalid
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h: In function ‘void decord::ffmpeg::ToDLTensor(decord::ffmpeg::AVFramePtr, DLTensor&, int64_t*)’:
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:195:9: error: ‘struct AVFrame’ has no member named ‘hw_frames_ctx’
  if (p->hw_frames_ctx) {
         ^
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h: In function ‘decord::ffmpeg::NDArray decord::ffmpeg::CopyToNDArray(decord::ffmpeg::AVFramePtr)’:
/home/lw/mmaction_master/third_party/decord/src/video/ffmpeg/ffmpeg_common.h:249:12: error: ‘struct AVFrame’ has no member named ‘hw_frames_ctx’
     if (p->hw_frames_ctx) {
            ^
CMakeFiles/decord.dir/build.make:296: recipe for target 'CMakeFiles/decord.dir/src/video/video_interface.cc.o' failed
make[2]: *** [CMakeFiles/decord.dir/src/video/video_interface.cc.o] Error 1
CMakeFiles/Makefile2:75: recipe for target 'CMakeFiles/decord.dir/all' failed
make[1]: *** [CMakeFiles/decord.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

Environment: Ubuntu 16.04, ffmpeg 4.1.3, CMake 3.15.2, g++ 5.4.0
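The CMake output above reports libavcodec 56.60.100, the version shipped with FFmpeg 2.8, whereas FFmpeg 4.x ships libavcodec 58. The missing symbols (av_packet_alloc, AVBSFContext, AVCodecParameters, hw_frames_ctx) are newer FFmpeg APIs, so the build is probably picking up the old system headers rather than the ffmpeg 4.1.3 mentioned above. A small check of the version reported by `pkg-config --modversion libavcodec`; the threshold of 58 is an assumption matching the README's recommended ffmpeg 4.0, not a documented minimum.

```python
def parse_version(text):
    """Turn a dotted version string such as '56.60.100' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def libavcodec_at_least(version_text, minimum=(58, 0, 0)):
    """Return True if a libavcodec version string is at least `minimum`."""
    return parse_version(version_text) >= minimum
```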

Compatibility with PyTorch DataLoader

Hi,

I tried to combine decord's GPU decoding with PyTorch's DataLoader. The code snippet looks like the following:

import os
import torch
import decord

class VideoDataSet(torch.utils.data.Dataset):
    ...
    def __getitem__(self, index):
        decord.bridge.set_bridge('torch')
        vr = decord.VideoReader(self.video_list[index], ctx=decord.gpu(int(os.getenv('LOCAL_RANK'))))
        frames = vr.get_batch(indices)
        return frames

if __name__ == '__main__':
    torch.multiprocessing.set_start_method("spawn")  # avoid CUDA initialization error
    dataset = VideoDataSet("kinetics/K700/video_raw", "kinetics/K700/k700_train_video.txt")
    loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=16, num_workers=16)

The following messages appear repeatedly in the training log, but no input data is ever consumed.
Reducing num_workers to 1 or even 0 doesn't change anything.

[15:34:05] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:07] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:12] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:12] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:13] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:14] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:14] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.
[15:34:14] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Tesla V100-SXM2-32GB
[15:34:14] /decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 418.116, so using our own stream.

decord's GPU decoding, when directly used without Dataset or DataLoader, works well in the same environment.