
mrvc's Introduction

Multi-Resolution Video Coding (MRVC)

Welcome to this project!

This is an implementation of a novel video transform based on Motion Compensation (MC) and the Discrete Wavelet Transform (DWT).

Contents of this repo:

  1. demos: some OpenCV demos.
  2. distribute: use of Distutils for distributing this software (unmaintained).
  3. docs: white paper.
  4. images: some test images.
  5. src: source files.
  6. tests: test scripts.
  7. tools: tools.

mrvc's People

Contributors

abordes96, afr997, davidbejarcaceres, dingwenguo, jalbladewing, jmmateo14, josefreak95, josmartor, juanrdzbaeza, normanditirado, vicente-gonzalez-ruiz, victorconka


mrvc's Issues

(Week 6) Prioritization of the visual content in the motion-compensated (H) subbands

Objectives

  1. Understand how to prioritize information in the MCDWT domain.

Methodology

In groups, implement and test the encoder described below (the decoder should be easily derivable from the encoder). After incorporating it into MCDWT, determine its performance as a compressor for different numbers of sent coefficients. Write a report of your work and present it in class.

Contents

  1. MCDWT white paper.
  2. MCDWT project.
  3. Introduction to Data Compression.
  4. Text compression.

Activities

  1. Modify MCDWT.py to implement the coder in the forward butterfly and the decoder in the backward butterfly. You will need to implement the algorithm described here.
  2. Compress the previous (truncated) sequence for different numbers of sent coefficients.
  3. Compute the compression ratio.
  4. Write a report (use preferably Markdown) describing your experiments and your conclusions.

Determining a good quantizer

In digital signal coding, a quantizer maps the M possible input values of a signal (samples, in the case of a digital signal) into N possible output values of the quantized signal (again, samples), with N<M. More information here.
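For instance, the midtread, midrise and deadzone uniform scalar quantizers requested later in this issue can be sketched as follows (an illustrative sketch only; the actual quantize.py may organize this differently):

```python
import numpy as np

def midtread(x, step):
    """Midtread uniform quantizer: reconstruction levels include 0."""
    k = np.round(x / step)      # quantization index
    return k * step             # dequantized value

def midrise(x, step):
    """Midrise uniform quantizer: 0 is a decision boundary, not a level."""
    k = np.floor(x / step)
    return (k + 0.5) * step

def deadzone(x, step):
    """Deadzone quantizer: a zero bin twice as wide, common in wavelet coding."""
    k = np.sign(x) * np.floor(np.abs(x) / step)
    return k * step
```

Note that for the same step, the deadzone quantizer sends more small-amplitude samples to zero than the midtread one, which usually helps when entropy-coding wavelet subbands.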

Objectives:

  1. Understand the quantization of signals and digital signals.
  2. Implement different quantizers for MCDWT and evaluate them.

Procedure:

Implement a set of uniform scalar quantizers (midtread, midrise and deadzone) in the script quantize.py. Create a Jupyter notebook documenting the file size and the distortion obtained when the subbands generated by MCDWT are quantized and compressed with the PNG image compression format, for the different quantizers implemented. Evaluate them using the same parameters (videos, number of spatial levels, number of temporal levels, etc.) as in the previous issue. Notice that, by modifying the quantization step, different points of the following R/D curve should be obtained:

Distortion
     |*
     | *
     |  *
     |    *
     |        *
     |                 *
     |                                 *
     |                                                                                           *
     +--------------------------------------------------------- Rate
     <- larger quantization step   smaller quantization step ->

(Week 7) Prioritization of the visual content in the I-type H subbands

Objectives

  1. Understand how to prioritize information in the 2D wavelet domain.

Methodology

In groups, implement and test the encoder described below (the decoder should be easily derivable from the encoder). After incorporating it into MCDWT, determine its performance as a compressor for different values of the quantization parameter (lambda). Write a report of your work and present it in class.

Contents

  1. MCDWT white paper.
  2. MCDWT project.
  3. Introduction to Data Compression.
  4. Text compression.

Activities

  1. Modify MCDWT.py to implement the coder in the forward butterfly and the decoder in the backward butterfly. You will need to implement the algorithm described here.
  2. Implement a uniform scalar quantizer and apply it to an MCDWT output.
  3. Compress the previous (quantized) sequence.
  4. Compute the compression ratio.
  5. Write a report (use preferably Markdown) describing your experiments and your conclusions.

(Week 8) Block-based bidirectional motion estimation

Objectives

  1. Understand the basics of the Motion Estimation (ME) and Motion Compensation (MC) processes.
  2. Compare the performance of a block-based ME system to that of the dense optical-flow estimator already implemented.

Methodology

In groups, research the links provided below (and, in general, use all the information that you can find on the Internet) to understand how to implement a bidirectional block-based ME/MC system for video. Develop an implementation (as efficient as possible) and compare it to the current estimator of the MCDWT project. Write a report and present it in class.

Contents

  1. MCDWT white paper.
  2. MCDWT project.
  3. MCTF_video_coding project.
  4. OpenCV.
  5. Introduction to Data Compression.
  6. Video compression.

Activities

  1. Research on the Internet for packages/libraries that provide ME (a good candidate could be the one used in this repo; see src/motion_estimate.cpp).
  2. Modify MCDWT.py to incorporate the block-based ME, coding it as an option to the current estimator (dense optical-flow).
  3. Experimentally, compare both estimators (block-based and optical-flow-based). Use the entropy of the output of the transform as a figure of merit.
  4. Write a report (use preferably Markdown) describing your experiments and your conclusions.
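The entropy figure of merit in activity 3 can be computed, for instance, as the first-order Shannon entropy of the transformed frames (an illustrative helper; the name and interface are my own, not part of MCDWT):

```python
import numpy as np

def entropy(img):
    """First-order Shannon entropy (bits/sample) of an integer-valued image."""
    values, counts = np.unique(img, return_counts=True)
    p = counts / counts.sum()           # empirical symbol probabilities
    return float(-np.sum(p * np.log2(p)))
```

A better motion estimator should yield H subbands with lower entropy, i.e. residuals that are cheaper to entropy-code.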

Adapting the `src/IO` stuff for handling multiple spatial resolutions

In the current implementation, the four subbands (LL, LH, HL, and HH) are named `<prefix>_LL`, `<prefix>_LH`, `<prefix>_HL` and `<prefix>_HH`. This notation hinders the implementation of MDWT+MCDWT, forcing the LL subbands to be renamed from `???_LL` to `???` (? = decimal digit) in each iteration.

To avoid this, we propose to use a different notation for the decompositions. An example:

Current implementation:

      DWT           iDWT
000 ------> 000_LL ------> 000
            000_LH
            000_HL
            000_HH

Proposed implementation:

      DWT           iDWT
000 ------> LL/000 ------> 000
            LH/000
            HL/000
            HH/000

These modifications should be "hardcoded" in the src/IO sources.

Bit allocation in 2D decompositions

Remember:

+-----------+   DWT   +----+----+
|           | ------> | LL | LH |
|   image   |         +----+----+
|           | <------ | HL | HH |
+-----------+  iDWT   +----+----+
                     decomposition

Bit allocation (see Section 7.1.2 (Quantization)) defines, given a bit budget R, the quantization step that must be used in each subband (LL, LH, HL and HH) to minimize the distortion (MSE, for example).

If the transform (DWT) is orthogonal, this optimization problem can be solved by exploiting the fact that the distortion generated by the quantization of the subbands is additive: for example, the quantization error generated by quantizing subband LL, "Q(LL)", plus the error generated by quantizing subband LH, "Q(LH)", equals the error generated by quantizing both subbands at the same time, "Q(LL,LH)". Thus, if we know the RD (Rate/Distortion) curve of each subband (that is, the contribution of each subband to the quality of the reconstructed (quantized) image), the optimal quantization step for each subband is the q_step that produces at most R bits in total (for the 4 subbands), where none of these bits decreases the distortion by an amount corresponding to a slope of the curve smaller than \lambda. See this notebook.
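The \lambda-based rule above can be sketched as a per-subband search for the last RD point whose local slope still exceeds \lambda (a toy illustration over hypothetical RD curves, not the referenced notebook's actual code):

```python
def allocate(rd_curves, lam):
    """For each subband, pick the RD point reached just before the local
    slope (distortion saved per extra bit) drops below lam.
    rd_curves: list of [(rate, distortion), ...] per subband, sorted by
    increasing rate / decreasing distortion (hypothetical input)."""
    choice = []
    for curve in rd_curves:
        best = curve[0]
        for (r0, d0), (r1, d1) in zip(curve, curve[1:]):
            slope = (d0 - d1) / (r1 - r0)   # distortion saved per extra bit
            if slope < lam:
                break                        # further bits are not worth it
            best = (r1, d1)
        choice.append(best)
    return choice
```

Sweeping lam from large to small then traces the optimal R/D frontier of the whole decomposition, since every subband stops spending bits at the same marginal return.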

(Week 4) Computation of the energy of the wavelet coefficients

Objectives

  1. Understand the importance of the wavelet coefficients in the quality of the reconstructions after the temporal synthesis.
  2. Know how to incorporate this information into a video codec.

Methodology

In groups, research the links provided below (and, in general, use all the information that you can find on the Internet) to understand how to determine the contribution of every wavelet coefficient to the energy of the reconstruction. For this, design an experiment where, for every coefficient X, X is set to 255 and the rest of the coefficients are set to 0, an inverse MCDWT transform is performed, and the energy of the reconstruction is quantified. Generate a sequence of decompositions where each "coefficient" represents such energy.
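The experiment above can be prototyped on a plain 2D transform before moving to MCDWT. The sketch below uses a one-level orthonormal Haar inverse as a stand-in (my own simplification: with an orthonormal transform every coefficient contributes the same energy, whereas the biorthogonal filters used in practice make the contributions differ, which is precisely what the experiment is meant to measure):

```python
import numpy as np

def haar_idwt2(LL, LH, HL, HH):
    """One-level inverse 2D Haar transform (orthonormal), as a stand-in
    for the inverse MCDWT step."""
    h, w = LL.shape
    out = np.zeros((2 * h, 2 * w))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2
    out[0::2, 1::2] = (LL - LH + HL - HH) / 2
    out[1::2, 0::2] = (LL + LH - HL - HH) / 2
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return out

def energy_map(shape):
    """Set a single coefficient to 255 in each subband, reconstruct,
    and measure the energy (sum of squares) of the reconstruction."""
    h, w = shape
    E = {}
    for name in ("LL", "LH", "HL", "HH"):
        subbands = {s: np.zeros((h, w)) for s in ("LL", "LH", "HL", "HH")}
        subbands[name][0, 0] = 255.0        # the isolated coefficient X
        rec = haar_idwt2(**subbands)
        E[name] = float(np.sum(rec ** 2))   # energy of the reconstruction
    return E
```

With Haar all four energies equal 255^2 (energy preservation); replacing the inverse transform with MCDWT's backward butterfly yields the per-coefficient weights the codec should exploit.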

Contents

  1. MCDWT white paper.
  2. MCDWT.
  3. MCTF-video-coding.
  4. J.C. Maturana-Espinosa, V. González-Ruiz, J.P. García-Ortiz, and D. Müller. Rate Allocation for Motion Compensated JPEG2000.
  5. Introduction to Data Compression.
  6. Video compression.
  7. SNR.

Activities

  1. Research on the Internet looking for orthogonality and orthonormality concepts in the Discrete Wavelet Transform. Research also about orthogonal and biorthogonal transforms and image compression.
  2. Implement a script to compute the contribution of each wavelet coefficient in the 2D-DWT using the method described before, depending on: (1) the number of spatial resolution levels and (2) the selected wavelet. Both should be provided as input parameters.
  3. Experimentally, compare the performance of two MCDWT decoders, one using the coefficient contributions and the other ignoring such information. To do that, quantize the coefficients (providing the quantization step as an input parameter), perform the corresponding reconstructions, and measure the differences with respect to the original video sequence (use SNR).
  4. Write a report (use preferably Markdown) describing your experiments and your conclusions.

(Week 2) The MCDWT video codec and its use

Objectives

  1. Include in the documentation of MCDWT (docs directory) a document explaining what MCDWT is and how it should be used to transform a video from scratch.
  2. Create a shell script (preferably for Bash) with the commands that do that.

Methodology

Revise the information provided below and write in Markdown a user manual explaining how to transform a video using MCDWT.

Contents

  1. Markdown manual.
  2. MCDWT white paper.
  3. MCDWT project.

Activities

  1. Download a video from here.
  2. Convert the video into a sequence of 16-bit PNG images using the notation videoname_???.png, where ??? is a 3-digit number starting at 000. Use the scripts extract_images.sh and add_offset.py as reference. See also test_sum_sub.sh. Notice that the pixel value 0 should be stored in the PNG files as the value 32768-128. Develop also the script(s) for reconstructing the original image sequence from the transformed one. Use only parameters in your script(s) and provide a helpful description of them. Parameters should control both (1) the number of spatial resolutions (levels in the Laplacian Pyramid) and (2) the number of temporal resolutions (which defines the GOP size). The number of transformed images should also be an input parameter.
  3. Using an ASCII editor, write your user manual (in Markdown) and your shell script(s), and perform a PR including all files.
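The 8-bit-to-16-bit offset mapping described in activity 2 might look like this in Python (a sketch; the actual add_offset.py may differ):

```python
import numpy as np

OFFSET = 32768 - 128   # pixel value 0 maps to 32640, per the issue text

def add_offset(frame_8bit):
    """Shift an 8-bit frame into the 16-bit range used by the PNG files."""
    return frame_8bit.astype(np.uint16) + OFFSET

def remove_offset(frame_16bit):
    """Inverse mapping, back to 8-bit pixel values."""
    return (frame_16bit - OFFSET).astype(np.uint8)
```

The offset keeps the (possibly negative) wavelet coefficients representable inside an unsigned 16-bit PNG sample.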

(Week 3) Adaptive motion compensation based on the distortion of the prediction error

Objectives

  1. Understand the importance of the generation of predictions in bidirectional motion compensation.
  2. Determine the increase in performance of the improved predictor.

Methodology

In groups, implement an image predictor based on the prediction error (see theory below). After incorporating it into MCDWT, determine its ability to decrease the energy of the residuals (prediction errors). Write a report of your work and present it in class.

Contents

  1. MCDWT white paper.
  2. MCDWT project.
  3. Introduction to Data Compression.
  4. Video compression.

Activities

  1. Modify MCDWT.py to incorporate the generation of predictions based on the distortion of the predictions (currently based on a simple average). You will need to implement Eq. 4 described here.
  2. Measure the variation of the entropy in the output of the transform (compare it to the current implementation, where the prediction is a simple average of the forward and backward predictions). In this case, it is preferable to have both predictors implemented (the simple average and the weighted average) and to choose the desired one through a (command-line) parameter.
  3. Write a report (use preferably Markdown) describing your experiments and your conclusions.
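Since the referenced Eq. 4 is not reproduced here, the following is only a plausible guess at a distortion-weighted predictor, where each prediction is weighted inversely to its error (the function name, weighting formula and parameters are assumptions, not the actual equation):

```python
import numpy as np

def weighted_prediction(fwd, bwd, err_fwd, err_bwd, eps=1e-3):
    """Blend forward and backward predictions, weighting each one
    inversely to its (local) prediction error.  Illustrative guess
    at the idea, not necessarily Eq. 4 of the referenced text."""
    w_fwd = 1.0 / (err_fwd + eps)
    w_bwd = 1.0 / (err_bwd + eps)
    return (w_fwd * fwd + w_bwd * bwd) / (w_fwd + w_bwd)
```

When both errors are equal this degenerates to the simple average currently implemented in MCDWT.py, which makes an A/B comparison via a command-line switch straightforward.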

(TFM) Subband prediction using TensorFlow

TensorFlow makes it possible to implement efficient artificial neural networks that find approximate solutions to complex computational problems, such as those related to machine learning. Recently, as can be found here, TensorFlow has been used to compute the Dense Optical Flow.

This issue proposes implementing a motion estimator for MCDWT based on the developments described above. The main goal is to minimize the running time of the estimation by leveraging the availability of GPU resources.

Evaluation of the JPEG image compressor

Objectives:

  1. Understand the operation of the JPEG image compressor.
  2. Implement a rate-control algorithm considering the contribution of the MCDWT subbands.
  3. Analyze the performance of the MCDWT+JPEG video compressor.

Procedure:

Create a Jupyter notebook to document the implemented rate-control algorithm for MCDWT+JPEG, and to describe the experiments performed to evaluate it. As in the previous issue, span your study over different:

  1. GOP sizes, and
  2. number of generated spatial resolutions,

starting both at 1. The notebook should be self-contained (all the information needed to repeat the experiments should be in it) and should run without any (human) interaction, except for the initial launch. Select the testing videos from Xiph.org Test Media with two different criteria:

  1. The amount of motion in the video.
  2. The spatial resolution of the video.

Considering "intra" pixels

When the prediction using the neighbor images is bad (something that is already known), a prediction using the low-resolution version of the predicted image should be taken into consideration.

(Week 1) Use of the Fork and Branch Git Workflow at GitHub

Objectives

  1. Learn how to use GitHub and the fork and branch Workflow.

Methodology

Head over to the links provided below.

Contents

  1. The Fork and Branch Git Workflow.
  2. git.
  3. git - the simple guide.

Activities

  1. Create a personal GitHub account.
  2. Fork the repository https://github.com/vicente-gonzalez-ruiz/fork_and_branch_git_workflow.
  3. Fork the repository https://github.com/Sistemas-Multimedia/MCDWT.
  4. Fork the repository https://github.com/Sistemas-Multimedia/Sistemas-Multimedia.github.io.
  5. Improve some aspects of at least one of the previous repositories (maybe in the available documentation) and perform a Pull Request (PR).

(Week 5) Visual evaluation of the effects of quantization

Objectives

  1. Understand the importance of quantization in image coding.
  2. Measure the visual impact of the quantization error in MCDWT.

Methodology

In groups, research the links provided below (and, in general, use all the information that you can find on the Internet) to see how quantization affects both the quality of the reconstructions and the compression ratio. Next, implement a subband quantizer that inputs an image and outputs a quantized image, given the quantization step as a parameter. Use this quantizer to degrade the quality of the output of MCDWT applied to a sequence of images, in two cases: (1) when only the H subbands are quantized, and (2) when all the subbands (L included) are quantized. Visually determine the impact of quantization in both cases.
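The "quantize only the H subbands vs. all subbands" experiment can be skeletonized as follows (an illustrative sketch; the subband names and the dict layout are assumptions, not the project's actual data structures):

```python
import numpy as np

def quantize(band, step):
    """Uniform scalar quantizer followed by dequantization (degradation)."""
    return np.round(band / step) * step

def degrade(subbands, step, include_L=False):
    """Quantize the H subbands of a decomposition; optionally the L one too.
    `subbands` is a dict like {"L": ..., "LH": ..., "HL": ..., "HH": ...}."""
    out = {}
    for name, band in subbands.items():
        if include_L or name != "L":
            out[name] = quantize(band, step)
        else:
            out[name] = band      # leave the L subband untouched
    return out
```

Running the inverse MCDWT on both degraded decompositions shows visually how much more sensitive the reconstruction is to quantization of the L subband.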

Contents

  1. MCDWT white paper.
  2. MCDWT.
  3. Introduction to Data Compression.
  4. Video compression.
  5. Quantization.

Activities

  1. Research on the Internet looking for quantization in image compression.
  2. Implement a script to quantize (PNG) images (implement a uniform scalar quantizer). The name of the file with the image and the quantization step should be provided as input parameters. The name of the quantized image is the only output.
  3. Compute an MCDWT of T temporal levels and K spatial levels.
  4. Quantize the H subbands.
  5. Compute the inverse MCDWT and visualize the reconstruction.
  6. Compare (visually) the reconstruction and the original sequence.
  7. Repeat this procedure for several T and K values.
  8. Repeat the procedure when all the subbands (L and H) are quantized.
  9. Write a report (use preferably Markdown) describing your experiments and your conclusions.

Evaluation of each coding path in MCDWT

The different alternatives at each stage of MCDWT (ME, entropy coding) should be tested, in any combination. Implement a set of experiments to evaluate the different combinations.

Evaluation of the PNG image compressor

Objectives:

  1. Understand the operation of the MCDWT video codec.
  2. Understand the operation of the PNG image compressor.
  3. Analyze the performance of the MCDWT+PNG video compressor.

Procedure:

Create a Jupyter notebook, documenting the compression ratio obtained when the subbands generated by MCDWT are compressed with the PNG image compression format (notice that this is the default image compressor already used by MCDWT), for different:

  1. GOP sizes, and
  2. number of generated spatial resolutions,

starting both at 1. The notebook should be self-contained (all the information needed to repeat the experiments should be in it) and should run without any (human) interaction, except for the initial launch. Select the testing videos from Xiph.org Test Media with two different criteria:

  1. The amount of motion in the video.
  2. The spatial resolution of the video.
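The per-frame compression ratio can be approximated without writing files by running PNG's entropy coder (zlib) over the raw samples (an approximation only; the figures reported in the notebook should come from the actual .png files):

```python
import zlib
import numpy as np

def compression_ratio(frame):
    """Approximate the PNG compression ratio of a frame by compressing
    its raw bytes with zlib (the DEFLATE coder PNG uses internally)."""
    raw = frame.tobytes()
    compressed = zlib.compress(raw, level=9)
    return len(raw) / len(compressed)
```

Real PNG files add row filtering and chunk overhead, so this slightly over-estimates the ratio for small or highly structured frames.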

Bit Allocation in MCDWT

The procedure for determining the optimal quantization steps for a single decomposition can be extrapolated to the quantization of a sequence of images transformed by the MDWT and the MCDWT. In this case (always supposing that only 2 spatial resolution levels are available, which is the same as saying that the MDWT has been applied only once), instead of considering 4 subbands in the optimization, you must consider all the subbands of the optimized GOP, which are (except for the first GOP) 4*2^T, where T is the number of iterations of the MCDWT.

For example, in the figure:

 GOP 0       GOP 1
------- ---------------
 img 0           img 2
+--+--+         +--+--+
|LL|LH|         |LL|LH|
+--+--+         +--+--+
|HL|HH|         |HL|HH|
+--+--+         +--+--+
   a               c
         img 1
        +--+--+
        |LL|LH|
        +--+--+
        |HL|HH|
        +--+--+
           b

there are 3 images, to which the MDWT and the one-iteration (T=1) MCDWT have been applied (a, b and c are the aliases of the images in the forward MCDWT butterfly). As can be seen, there are 4*2^1 = 8 subbands in GOP 1 (the first GOP always has only one image). So, the quantization step optimizer should consider 8 RD curves (instead of only 4) in GOP 1.
