
dane-video-segmentation-worker

Runs SceneDetect to detect shots and select keyframes. Also includes code for extracting keyframes, extracting audio, and generating spectrograms.
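For a sense of the core detection step, here is a minimal sketch using the PySceneDetect 0.6 API (the input path and the midpoint keyframe choice are illustrative, not necessarily what this worker does):

from scenedetect import ContentDetector, detect

# Detect shot boundaries in the input video (path is illustrative)
scene_list = detect("/data/testob-take-2.mp4", ContentDetector())

# Pick one keyframe per shot; taking the shot midpoint is an illustrative choice
for start, end in scene_list:
    midpoint_s = (start.get_seconds() + end.get_seconds()) / 2
    print(f"shot {start.get_timecode()} - {end.get_timecode()}, keyframe at {midpoint_s:.2f}s")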

Installation

Use Poetry to install this project into a virtualenv.

poetry install

Installing python-opencv in a virtualenv, and thus not as a system package, can leave certain shared objects missing. So far it seems libgl1 is the one affected. Install it using:

apt-get install libgl1

To make sure the unit tests work as well:

apt-get install ffmpeg

Run in local Python virtualenv

For local testing, make sure to put a config.yml in the root of this repo:

cp ./config/config.yml config.yml

Then make sure to activate your virtual environment:

poetry shell

Then run ./scripts/check-project.sh to perform:

  • linting (using flake8)
  • type checking (using mypy)
  • unit testing (using pytest)

Run test file in local Docker Engine

This form of testing/running avoids connecting to DANE:

  • No connection to DANE RabbitMQ is made
  • No connection to DANE ElasticSearch is made

This is ideal for testing:

  • main_data_processor.py, which uses VISXP_PREP.TEST_INPUT_FILE (see config.yml) to produce this worker's output
  • I/O steps taken after the output is generated, i.e. deletion of input/output and transfer of output to S3
First build the Docker image:

docker build -t dane-video-segmentation-worker .

Check out the docker-compose.yml to learn how the main process is started. As you can see, two volumes are mounted and an environment file is loaded:

version: '3'
services:
  web:
    image: dane-video-segmentation-worker:latest  # your locally built docker image
    volumes:
      - ./data:/data  # put input files in ./data and update VISXP_PREP.TEST_INPUT_FILE in ./config/config.yml
      - ./config:/root/.DANE  # ./config/config.yml is mounted to configure the main process
    container_name: visxp
    command: --run-test-file  # NOTE: comment this line to spin up the worker
    env_file:
      - s3-creds.env  # create this file with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to allow boto3 to connect to your AWS S3 bucket (see OUTPUT.S3_* variables in config.yml)
    logging:
      options:
        max-size: 20m
    restart: unless-stopped

There is no need to update the docker-compose.yml, but make sure to:

  • adapt ./config/config.yml (see next sub-section for details)
  • create s3-creds.env (example below) to allow the worker to upload output to your AWS S3 bucket
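A minimal s3-creds.env could look like this (the values are placeholders):

AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key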

Config

The following parts are relevant for local testing (without connecting to DANE). All defaults are fine for testing, except:

  • VISXP_PREP.TEST_INPUT_FILE: make sure to supply your own mp4 file in ./data
  • S3_ENDPOINT_URL: ask your DANE admin for the endpoint URL
  • S3_BUCKET: ask your DANE admin for the bucket name

FILE_SYSTEM:
    BASE_MOUNT: /data
    INPUT_DIR: input-files
    OUTPUT_DIR: output-files/visxp_prep
PATHS:
    TEMP_FOLDER: /data/input-files
    OUT_FOLDER: /data/output-files
VISXP_PREP:
    RUN_KEYFRAME_EXTRACTION: true
    RUN_AUDIO_EXTRACTION: false
    SPECTROGRAM_WINDOW_SIZE_MS: 1000
    SPECTROGRAM_SAMPLERATE_HZ:
        - 24000
    TEST_INPUT_FILE: /data/testob-take-2.mp4
INPUT:
    DELETE_ON_COMPLETION: False  # NOTE: set to True in production environment
OUTPUT:
    DELETE_ON_COMPLETION: True
    TRANSFER_ON_COMPLETION: True
    S3_ENDPOINT_URL: https://your-s3-host/
    S3_BUCKET: your-s3-bucket  # bucket reserved for 1 type of output
    S3_FOLDER_IN_BUCKET: assets  # folder within the bucket
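To see how the transfer to S3 might work, here is a minimal boto3 sketch using the OUTPUT.S3_* settings above (the function and variable names are illustrative, not the worker's actual code):

import boto3

# Illustrative values taken from the OUTPUT section of config.yml
S3_ENDPOINT_URL = "https://your-s3-host/"
S3_BUCKET = "your-s3-bucket"
S3_FOLDER_IN_BUCKET = "assets"

def transfer_output(local_path: str, filename: str) -> None:
    # boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
    # environment (i.e. from s3-creds.env when running via docker-compose)
    s3 = boto3.client("s3", endpoint_url=S3_ENDPOINT_URL)
    s3.upload_file(local_path, S3_BUCKET, f"{S3_FOLDER_IN_BUCKET}/{filename}")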

Relevant links

Also see:

  • https://stackoverflow.com/questions/64664094/i-cannot-use-opencv2-and-received-importerror-libgl-so-1-cannot-open-shared-obj
  • https://docs.opencv.org/4.x/d2/de6/tutorial_py_setup_in_ubuntu.html

dane-video-segmentation-worker's People

Contributors: dependabot[bot], gb-beng, jblom, martijnbng, mwigham, veldhoen, wmelder

dane-video-segmentation-worker's Issues

Run hecate

Have the worker use Hecate to apply shot detection.

Use timestamps in the video URL to determine which shots to filter out.

Output:

  • shot boundaries (timestamps)
  • keyframe time codes
  • keyframes (optional, based on config; either png or jpg)
  • thumbnails (optional, based on config)

Manage edge cases for spectrograms (out of scope)

When a keyframe is very close to the beginning or end of a video, a (symmetrical) one-second window of audio cannot be created. Moreover, when a keyframe is close to a shot boundary, a one-second window may be inappropriate. However, the feature extraction model requires homogeneous, one-second-based spectrograms.

We discussed several solutions:

  • shift the keyframe timestamp away from the boundary (and extract both the spectrogram and the keyframe at that moment in time)
  • pad the extracted audio with repeated frames; this can be mirrored padding (playing the edge frames again in reverse) or circular padding (repeating the last frames at the beginning, or the first frames at the end), the latter being expected to be least harmful to the spectrogram
  • discard all keyframes that are close to the edges altogether

The second approach (apply padding, in a circular fashion) is deemed most appropriate.
However, due to time constraints we stick with the last approach (discarding edge frames), at least for the video boundaries, as a minimum-effort solution. A sketch of the circular padding idea follows below.

NB: the same holds for the annotations a researcher uses as a query for similarity search!
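A minimal sketch of circular padding with NumPy, assuming a mono audio array and illustrative sample rate and window values:

import numpy as np

def circular_pad(snippet: np.ndarray, window: int, truncated_at_start: bool) -> np.ndarray:
    """Pad a too-short audio snippet to `window` samples with circular padding:
    the end of the snippet is repeated at the beginning (or vice versa)."""
    deficit = window - len(snippet)
    if deficit <= 0:
        return snippet[:window]
    pad_width = (deficit, 0) if truncated_at_start else (0, deficit)
    return np.pad(snippet, pad_width, mode="wrap")

# e.g. a keyframe 0.3 s into the video: only 0.8 s of a symmetric
# 1-second window at 24 kHz is actually available
samplerate = 24000
short_snippet = np.random.randn(int(0.8 * samplerate))  # placeholder audio
padded = circular_pad(short_snippet, window=samplerate, truncated_at_start=True)
assert len(padded) == samplerate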

Determine appropriate way to include hecate

Hecate is C++ code. I can think of several options to include it in a worker:

  • Install everything in one Docker image and use Python's subprocess to call the Hecate executable from Python (this is roughly the approach, minus containerization, taken in https://github.com/beeldengeluid/dane-shot-detection-worker)
  • Install Hecate in a separate Docker image, and use docker-py (or some other approach) to call it from Python
  • Use C bindings to run Hecate directly from Python

What is the most elegant way to tackle this problem? @jblom @MartijnBNG @gb-beng any thoughts on this? Did you encounter this before, and how was it solved? (A minimal subprocess sketch of the first option follows below.)
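A minimal sketch of the first option, calling a hecate binary via subprocess (the binary name and flags are assumptions based on the upstream Hecate CLI, not verified against this worker):

import subprocess

def run_hecate(video_path: str) -> str:
    """Run the hecate executable on a video and return its stdout.
    The flags are illustrative; check `hecate --help` for the real ones."""
    result = subprocess.run(
        ["hecate", "-i", video_path, "--print_shot_info", "--print_keyfrm_info"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout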

Tweak scenedetect

Tune shot length, prevent near-duplicate keyframes, and ensure enough keyframes per time interval.
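Two PySceneDetect knobs that could address this, as a sketch (the values shown are the library defaults, not tuned for this worker):

from scenedetect import ContentDetector, detect

# threshold: higher values make cuts stricter (fewer, longer shots);
# min_scene_len: minimum shot length in frames, preventing very short shots
detector = ContentDetector(threshold=27.0, min_scene_len=15)
scene_list = detect("/data/testob-take-2.mp4", detector)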

Containerize

Create a Dockerfile (and image) that includes both the Hecate installation and the dependencies for audio processing.

Improve documentation

  • It is not clear from the README how to run the local test.
  • There are two config.yml files, which is confusing.

Drop audio processing

Make audio processing a configurable option (and set it to false for current processing).

Restore to a runnable state

Currently, I get all kinds of errors trying to run the code, either locally or in a container. These need to be fixed before anything relevant can be done.

Also, some cleanup is needed:

  • Installing locally via Poetry but in the Dockerfile through pip/requirements.txt obscures the process unnecessarily.
  • There are lots of references to Hecate, both in the code and the README.

Replace Hecate

Shots seem overly strict, keyframes quite generous.

  • ffmpeg can also do scene detection
  • Philo & A-team have looked into tools for keyframes & scenes; find that report
  • DAAN also stores keyframes (low res); use those timestamps?

Coverage keyframes

Make sure there are enough keyframes per time interval

(First: check whether it actually happens that coverage is low)
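A minimal sketch of such a check, bucketing keyframe timestamps per interval (interval length and threshold are illustrative):

from collections import Counter

def low_coverage_intervals(keyframe_times_s, interval_s=60, min_per_interval=2):
    """Return indices of intervals with fewer than `min_per_interval` keyframes."""
    counts = Counter(int(t // interval_s) for t in keyframe_times_s)
    last_interval = int(max(keyframe_times_s) // interval_s)
    return [i for i in range(last_interval + 1) if counts[i] < min_per_interval]

# e.g. keyframes at 5 s, 12 s and 130 s leave intervals 1 and 2 under-covered
print(low_coverage_intervals([5.0, 12.0, 130.0]))  # -> [1, 2]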

Audio spectrogram

Create and output an audio spectrogram from the video, based on a 1-second window around a given timestamp.
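A minimal sketch with SciPy, assuming a mono signal at the 24 kHz sample rate from config.yml (the keyframe timestamp and STFT parameters are illustrative):

import numpy as np
from scipy.signal import spectrogram

samplerate = 24000  # matches SPECTROGRAM_SAMPLERATE_HZ in config.yml
audio = np.random.randn(10 * samplerate)  # placeholder 10-second mono signal

# Cut a symmetric 1-second window around an (illustrative) keyframe timestamp
keyframe_s = 4.2
center = int(keyframe_s * samplerate)
window = audio[center - samplerate // 2 : center + samplerate // 2]

# Compute the spectrogram of the 1-second window
freqs, times, sxx = spectrogram(window, fs=samplerate)
print(sxx.shape)  # (frequency bins, time bins)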

Generate snippets

Decide whether to generate audio and video snippets (whether to support this in the frontend)

VisXP provenance plan

Write a document suggesting how to deal with provenance (store it in S3).
Discuss that on September 4.

Added this as an issue for the dane-video-segmentation-worker, although it also holds for the next worker(s)

Enforce spectrogram quality

Assert that every spectrogram that is output has the proper dimensionality (e.g. that the time window was correct, etc.).
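A minimal sketch of such a check (the expected bin counts are illustrative and depend on the actual STFT settings):

import numpy as np

def check_spectrogram(sxx: np.ndarray, expected_freq_bins: int, expected_time_bins: int) -> None:
    """Raise if a spectrogram does not have the expected 2-D shape."""
    if sxx.ndim != 2:
        raise ValueError(f"expected a 2-D spectrogram, got {sxx.ndim}-D")
    if sxx.shape != (expected_freq_bins, expected_time_bins):
        raise ValueError(
            f"unexpected shape {sxx.shape}, expected {(expected_freq_bins, expected_time_bins)}"
        )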
