
video-bgm-generation's Introduction

  • I’m currently a second-year Ph.D. student at Shanghai Jiao Tong University and an intern at Shanghai AI Laboratory.
  • My research interests include computer vision, music generation, and deep learning.
  • You can contact me via wangzhaokai [at] sjtu [dot] edu [dot] cn.

An Introduction to Some of My Repositories

  • Published papers

    • CNMT: Confidence-aware Non-repetitive Multimodal Transformers for TextCaps (AAAI 2021)
    • CMT: Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)
    • SymMV: Video Background Music Generation: Dataset, Method and Evaluation (ICCV 2023)
    • PIIP: Parameter-Inverted Image Pyramid Networks
  • Research notes

  • Fun games and tools

    • Sanguosha: a text-based version of the card game Sanguosha (Legends of the Three Kingdoms)

    • GPT-turtlesoup: AI "turtle soup" (lateral-thinking puzzles) with ChatGPT, where GPT writes the puzzles, plays, and referees

    • Pokemon-Types-PageRank: ranking Pokémon types with the PageRank algorithm

    • wordle-solver: solving Wordle with a variety of strategies

    • HRM-architecture: CPU, compiler, and related architecture designs based on the game Human Resource Machine

    • wzk-Game-Collection: a collection of small Python games: Ludo, Minesweeper, Texas Hold'em, 2048, Gomoku, and more

    • Arxiv-Assistant: automatically fetches the daily list of new arXiv papers, filters it with GPT, and sends email reminders

    • Scraper: crawlers for Xiaohongshu, WeChat Official Accounts, and Mafengwo

    • luna: a simple version control system

    • hahaha: automatic meme generation

    • wzk-pypi-package: my own Python package, a collection of recreational code such as small games and crawlers

  • University coursework

video-bgm-generation's People

Contributors

becomebright, littlenyima, wzk1015, zexinhe


video-bgm-generation's Issues

LPD 5 midi2numpy issue

Hi, I'm very interested in your work and successfully trained the model on your provided 'lpd_5_prcem_mix_v8_10000.npz' with 'loss_8_params.pt'.
I'd now like to try new datasets, so I first ran the midi2numpy_mix.py script on the original LPD-5 cleansed MIDI files, but I ran into the following problem:

python midi2numpy_mix.py --midi_dir /home/video-bgm-generation-main/dataset/clean_midi/Zero --out_name data.npz
0%| | 0/39 [00:00<?, ?it/s]
Traceback (most recent call last):
File "midi2numpy_mix.py", line 243, in
midi2numpy(id_list)
File "midi2numpy_mix.py", line 196, in midi2numpy
midi = MIDI(id)
File "midi2numpy_mix.py", line 149, in init
self.bars = self._get_bars()
File "midi2numpy_mix.py", line 156, in _get_bars
bars[note.bar].append(note)
AttributeError: 'Note' object has no attribute 'bar'

So I am a little confused about how exactly to produce the .npz training data you provided.
Looking forward to your help, thank you!
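
For reference, the trace says _get_bars indexes notes by a bar attribute that was never attached. Below is a minimal sketch of that grouping step, assuming each note exposes an integer start tick and a fixed 4/4 time signature; the repo's own Note class may compute this differently.

from collections import defaultdict

def group_notes_into_bars(notes, ticks_per_beat, beats_per_bar=4):
    # Attach the bar index the traceback complains about, then bucket
    # notes by bar. A fixed time signature is assumed here.
    ticks_per_bar = ticks_per_beat * beats_per_bar
    bars = defaultdict(list)
    for note in notes:
        note.bar = note.start // ticks_per_bar
        bars[note.bar].append(note)
    return bars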

Unavailable Dataset Link

Hi, the Google Drive link for the dataset file lpd_5_prcem_mix_v8_10000.npz is currently unavailable. Hope this can be fixed, thanks!

Pair dataset

Hi author, I wonder if there is any paired dataset for video background music generation?

Objective Evaluation

Hello, I used https://github.com/slSeanWU/MusDr to evaluate the generated MIDI.
After processing the MIDI file into a pickle file, the MusDr repository also needs a corresponding CSV file to compute the objective scores. How do you convert the processed pickle file to CSV?
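
For reference, a minimal conversion sketch. It assumes the pickle holds a flat list of event tokens and writes one event per row; the exact column layout MusDr expects should be checked against its README, so treat the header below as a placeholder.

import csv
import pickle

def pickle_to_csv(pkl_path, csv_path):
    # Assumed: the pickle contains a flat list of event tokens.
    with open(pkl_path, "rb") as f:
        events = pickle.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["idx", "event"])  # placeholder header
        for i, ev in enumerate(events):
            writer.writerow([i, ev])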

CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

Hello, building on this code I added control over two more music attributes. I rebuilt the dataset so that the training set's shape changed from (3000, 9999, 9) to (3000, 9999, 11). In the model I also extended the corresponding loss terms and emb_size dimensions, but training fails with this error. I checked that the CUDA version and GPU memory are fine; it looks like a tokenizer problem. How can I resolve this? Thanks!
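
For reference, a CUBLAS_STATUS_NOT_INITIALIZED failure on an otherwise healthy GPU frequently masks an embedding lookup with an out-of-range token id, e.g. a new attribute column whose values exceed the vocabulary size given to the model. A quick sanity check, using the train_x / n_class names from the training logs as assumptions:

import numpy as np

def check_token_ranges(train_x, n_class):
    # train_x: (N, L, D) integer token array; n_class: per-column vocab sizes.
    # Any token >= its vocab size crashes inside the GPU embedding lookup,
    # often surfacing as an unrelated CUBLAS error.
    for col, vocab in enumerate(n_class):
        hi = int(train_x[..., col].max())
        if hi >= vocab:
            print(f"column {col}: max token {hi} >= vocab size {vocab}")

Running one batch on CPU also tends to replace the CUBLAS message with the real indexing error.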

Missing file

Dear author, where is the file 'metadata.json'? It is needed in the inference step.

Google colab error

[screenshot of the Colab error]
Hello, I tried to run the Colab you shared, but when I reached this part I hit the error shown in the screenshot above. What is wrong? I just followed the steps and ran the cells.
Sincerely, William.

Different length of input and output

Hello, I have successfully run your code on my own video and obtained output.
I followed the tutorial exactly, yet the generated MIDI contains 313 s of music, which does not match the 8 s length of my input.

Is something wrong here?

In the music-generation part of the model, cur_vlog seems to denote the number of generated frames and len_vlog the total number of frames, yet the generated audio ends up far longer than len_vlog. Could this be caused by something I did?
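
For reference, a crude workaround while the root cause is investigated: trim the generated MIDI to the video length. This is not the repo's own logic, just a pretty_midi sketch.

import pretty_midi

def trim_midi(in_path, out_path, max_seconds):
    # Drop notes starting after the video ends; clip the rest.
    pm = pretty_midi.PrettyMIDI(in_path)
    for inst in pm.instruments:
        inst.notes = [n for n in inst.notes if n.start < max_seconds]
        for n in inst.notes:
            n.end = min(n.end, max_seconds)
    pm.write(out_path)

trim_midi("output.mid", "output_trimmed.mid", 8.0)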

Unavailable dataset link

The lpd-5-cleansed dataset link
#Download the lpd-5-cleansed dataset from HERE
is unavailable. Hope that can be fixed, thanks!

Error(s) in loading state_dict for DataParallel

Hi! I get an error when running the model to generate a .mid file (using the mm21_py3 environment). I did not change the training code except for setting epochs to 1 and batch size to 1. Do I have to set the batch size to 8? Could you help me solve this problem?

Traceback (most recent call last):
  File "gen_midi_conditional.py", line 102, in <module>
    generate()
  File "gen_midi_conditional.py", line 58, in generate
    net.load_state_dict(torch.load(path_saved_ckpt))
  File "/root/miniconda3/envs/mm21_py3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
        size mismatch for module.init_emb_genre.lut.weight: copying a param with shape torch.Size([1, 64]) from checkpoint, the shape in current model is torch.Size([7, 64]).
        size mismatch for module.init_emb_instrument.lut.weight: copying a param with shape torch.Size([1, 64]) from checkpoint, the shape in current model is torch.Size([6, 64]).
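
For reference, the two mismatched tables are the genre/instrument conditioning embeddings, which suggests the checkpoint was trained with different init-token vocabulary sizes than the generation script builds. A quick way to inspect what the checkpoint actually contains (the filename is a placeholder):

import torch

ckpt = torch.load("path_to_checkpoint.pt", map_location="cpu")
for k, v in ckpt.items():
    if "init_emb" in k:
        print(k, tuple(v.shape))
# If these shapes disagree with the current model, rebuild the model with
# the same genre/instrument class counts used at training time rather
# than loading with strict=False.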

TypeError: 'NoneType' object is not callable

The 'Quick-start' code from https://github.com/idiap/fast-transformers runs fine,
but I hit this error when running train.py with no modifications:

name: train_default
args Namespace(batch_size='1', epochs=200, gpus=None, lr=0.0001, name='train_default', path=None, train_data='../dataset/lpd_5_prcem_mix_v8_10000.npz')
num of encoder classes: [  18    3   18  129   18    6   20  102 4865] [7, 1, 6]
D_MODEL 512  N_LAYER 12  N_HEAD 8 DECODER ATTN causal-linear
>>>>>: [  18    3   18  129   18    6   20  102 4865]
DEVICE COUNT: 1
VISIBLE: 0
n_parameters: 39,006,324
    train_data: dataset
    batch_size: 1
    num_batch: 3039
    train_x: (3039, 9999, 9)
    train_y: (3039, 9999, 9)
    train_mask: (3039, 9999)
    lr_init: 0.0001
    DECAY_EPOCH: []
    DECAY_RATIO: 0.1
Traceback (most recent call last):
  File "train.py", line 226, in <module>
    train_dp()
  File "train.py", line 169, in train_dp
    losses = net(is_train=True, x=batch_x, target=batch_y, loss_mask=batch_mask, init_token=batch_init)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jovyan/work/test-mm21/video-bgm-generation/src/model.py", line 482, in forward
    return self.train_forward(**kwargs)
  File "/home/jovyan/work/test-mm21/video-bgm-generation/src/model.py", line 450, in train_forward
    h, y_type = self.forward_hidden(x, memory=None, is_training=True, init_token=init_token)
  File "/home/jovyan/work/test-mm21/video-bgm-generation/src/model.py", line 221, in forward_hidden
    encoder_hidden = self.transformer_encoder(encoder_pos_emb, attn_mask)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/fast_transformers/transformers.py", line 138, in forward
    x = layer(x, attn_mask=attn_mask, length_mask=length_mask)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/fast_transformers/transformers.py", line 81, in forward
    key_lengths=length_mask
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/fast_transformers/attention/attention_layer.py", line 109, in forward
    key_lengths
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/fast_transformers/attention/causal_linear_attention.py", line 101, in forward
    values
  File "/opt/conda/lib/python3.6/site-packages/fast_transformers/attention/causal_linear_attention.py", line 23, in causal_linear
    V_new = causal_dot_product(Q, K, V)
  File "/opt/conda/lib/python3.6/site-packages/fast_transformers/causal_product/__init__.py", line 48, in forward
    product
TypeError: 'NoneType' object is not callable
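
For reference, this failure mode typically means the fast-transformers CUDA extension never compiled, so causal_dot_product falls back to None inside fast_transformers/causal_product/__init__.py (the last frame of the trace). A quick diagnostic, assuming a CUDA build of PyTorch:

import torch
print(torch.version.cuda, torch.cuda.is_available())
# If this import raises, the compiled extension is missing and
# causal_dot_product will be None at call time.
from fast_transformers.causal_product import causal_product_cuda  # noqa: F401
print("fast-transformers CUDA extension is available")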

Can't download the right .sf2 file from the way mentioned in CMT.ipynb

I failed to download the file the way CMT.ipynb describes:

!gsutil -m cp gs://cmt/loss_8_params.pt /content/video-bgm-generation/exp/
!gsutil -m cp gs://magentadata/soundfonts/SGM-v2.01-Sal-Guit-Bass-V1.3.sf2 /content/video-bgm-generation/

I found loss_8_params.pt via the Google Drive link in README.md, but I could not find SGM-v2.01-Sal-Guit-Bass-V1.3.sf2 anywhere on Google.
Maybe I used gsutil the wrong way. Would you mind sharing a Google link to SGM-v2.01-Sal-Guit-Bass-V1.3.sf2? I would appreciate it very much.

Dense and repeated tokens

Hello, thanks for your comprehensive docs; I have successfully generated a BGM for the same video as demo1.
However, the notes are very dense during inference: the model repeatedly generates notes at the same tick (beat), and almost every tick carries more than one note.
I would like to ask why this happens and how I can avoid it; maybe something is wrong with my operation.
Some inference results follow (the final music is very fast and intense); see the sampling sketch after the dump.
[17 1 9 0 0 0 0 3 16]
[ 1 1 8 0 0 1 2 3 16]
[ 0 2 0 36 1 1 0 3 16]
[ 0 2 0 43 1 1 0 3 16]
[ 1 1 8 0 0 2 4 3 16]
[ 0 2 0 42 16 2 0 3 16]
[ 0 2 0 57 7 2 0 3 16]
[ 0 2 0 61 8 2 0 3 16]
[ 0 2 0 66 9 2 0 3 16]
[ 1 1 8 0 0 4 1 3 16]
[ 0 2 0 30 7 4 0 3 16]
[ 3 1 7 0 0 1 1 3 18]
[ 0 2 0 43 1 1 0 3 18]
[ 5 1 6 0 0 1 2 3 20]
[ 0 2 0 39 1 1 0 3 20]
[ 0 2 0 43 1 1 0 3 20]
[ 5 1 6 0 0 2 2 3 20]
[ 0 2 0 61 5 2 0 3 20]
[ 0 2 0 66 3 2 0 3 20]
[ 7 1 5 0 0 1 1 3 22]
[ 0 2 0 43 1 1 0 3 22]
[ 7 1 5 0 0 2 1 3 22]
[ 0 2 0 69 3 2 0 3 22]
[ 7 1 5 0 0 4 1 3 22]
[ 0 2 0 30 3 4 0 3 22]
[ 9 1 4 0 0 1 2 4 24]
[ 0 2 0 36 1 1 0 4 24]
[ 0 2 0 43 1 1 0 4 24]
[ 9 1 4 0 0 2 4 4 24]
[ 0 2 0 45 9 2 0 4 24]
[ 0 2 0 57 4 2 0 4 24]
[ 0 2 0 61 9 2 0 4 24]
[ 0 2 0 64 9 2 0 4 24]
[ 9 1 4 0 0 4 1 4 24]
[ 0 2 0 33 7 4 0 4 24]
[ 9 1 3 0 0 1 2 4 24]
[ 0 2 0 36 1 1 0 4 24]
[ 0 2 0 43 1 1 0 4 24]
[ 9 1 3 0 0 2 1 4 24]
[ 0 2 0 45 9 2 0 4 24]
[ 9 1 3 0 0 4 1 4 24]
[ 0 2 0 33 7 4 0 4 24]
[13 1 2 0 0 1 2 4 28]
[ 0 2 0 39 1 1 0 4 28]
[ 0 2 0 43 1 1 0 4 28]
[13 1 2 0 0 2 3 4 28]
[ 0 2 0 57 5 2 0 4 28]
[ 0 2 0 61 5 2 0 4 28]
[ 0 2 0 64 5 2 0 4 28]
[15 1 1 0 0 1 1 4 30]
[ 0 2 0 43 1 1 0 4 30]
[15 1 1 0 0 2 1 4 30]
[ 0 2 0 69 2 2 0 4 30]
[15 1 1 0 0 4 1 4 30]
[ 0 2 0 33 3 4 0 4 30]
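
For reference, repetition like this is often tamed by adjusting the sampling distribution at inference time. The repo's inference already exposes sampling knobs (e.g. the C parameter passed to the generator elsewhere in this tracker), so the sketch below is only a generic illustration of temperature plus top-p (nucleus) sampling, not the repo's own code.

import numpy as np

def nucleus_sample(logits, temperature=1.2, p=0.9):
    # Flatten the distribution with temperature, then sample only from
    # the smallest set of tokens whose cumulative mass reaches p.
    z = logits / temperature
    probs = np.exp(z - np.max(z))
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cum, p)) + 1]
    sub = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=sub))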

What are those parameters in matching score calculation?

Hi, I'm studying your paper, and congratulations on the excellent work!

I got some detailed questions:

  • Specifically, what are the input parameters when calculating the matching score (Equation 16)?
  • Why do you need the indicator function 1(·) on the strength?
  • If the video's strength comes from the visual beats, how do you handle the different value ranges of the simu-note strength and the saliency?
  • Has the code corresponding to this part been released, and if so, where is it?

Thank you!

How to control the genre of the generated music

I would like to ask how I can control the genre of the generated music.
The paper mentions that one can control the instrument and genre by prepending conditions to the input sequence, and the training and inference code does include initial tokens.
However, the genre in the initial tokens is always 5, which maps to "pop". I tried modifying the genre and instrument tokens during training, but it did not work well.
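
For reference, a loudly hypothetical sketch of swapping the conditioning prefix. The table and the init-token layout below are illustrative guesses (only "5 maps to pop" comes from the issue itself); the real mapping lives in the repo's preprocessing code.

import numpy as np

GENRES = {"pop": 5}  # hypothetical table; only this entry is from the issue

def make_init_token(genre_id, instrument_ids):
    # Assumed layout: one genre token followed by instrument tokens,
    # mirroring the genre/instrument embeddings in the model.
    return np.array([[genre_id] + list(instrument_ids)], dtype=np.int64)

If the training data is overwhelmingly genre 5, the conditioning signal for other genres may simply be too weak, which would match the observation that modifying the tokens does not work well.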

Bugs encountered while using the inference code "gen_midi_conditional.py" in "src/" folder

Hi, I encountered some bugs while using gen_midi_conditional.py to generate MIDI files for a given video. I installed the Python 2 environment from py2_requirements.txt and then used video2npz.sh to produce an .npz file for the video, but I hit problems running gen_midi_conditional.py. The program output and error report are pasted below:

Command I used:
python3 gen_midi_conditional.py -f ../inference/LGpwmBqJF1Q_HarryPotter2ChamberOfSecrets.npz -c ../exp/train_exp/loss_70_params.pt

Code standard print:
inference
D_MODEL 512 N_LAYER 12 N_HEAD 8 DECODER ATTN causal-linear
[18, 3, 18, 129, 18, 6, 27, 102, 5025]
[*] load model from: ../exp/train_exp/loss_70_params.pt
new song
[vlog_npz matrix print here]
------ initiate ------
tensor([[[17, 1, 10, 0, 0, 0, 0, 1, 0]]])

Error print:
Traceback (most recent call last):
File "gen_midi_conditional.py", line 104, in
generate()
File "gen_midi_conditional.py", line 85, in generate
res, err_note_number_list, err_beat_number_list = net(is_train=False, vlog=vlog_npz, C=0.7)
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs, **kwargs)
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in call_impl
return forward_call(*input, **kwargs)
File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 483, in forward
return self.inference_from_scratch(**kwargs)
File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 341, in inference_from_scratch
h, y_type = self.forward_hidden(input_, is_training=False, init_token=pre_init)
File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 216, in forward_hidden
init_emb_linear = self.forward_init_token(init_token)
File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 162, in forward_init_token
emb_genre = self.init_emb_genre(x[..., 0])
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/utils.py", line 80, in forward
return self.lut(x) * math.sqrt(self.d_model)
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/functional.py", line 2183, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

The inference code, trained model, and data (including the original video and the processed .npz file) are attached on Google Drive. Here is the link:
https://drive.google.com/drive/folders/1Ch3jjxZrztKAtEvuEhGjxPk2-G0NSYe0?usp=sharing

Could you help me check this? Really appreciate it.

Best regards,
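
For reference, the IndexError fires in init_emb_genre, i.e. a conditioning token indexes past the end of its embedding table. Note also that the class list printed here ([18, 3, 18, 129, 18, 6, 27, 102, 5025]) differs from the training logs elsewhere in this tracker ([18, 3, 18, 129, 18, 6, 20, 102, 4865]), so the .npz and the checkpoint may come from different vocabularies. A quick runtime check, using attribute names from the traceback (they may differ in other copies of model.py):

def check_init_tokens(net, init_token):
    # Compare each conditioning token against the size of the embedding
    # table it will index. `lut` follows utils.py in the trace above.
    embs = [net.init_emb_genre, net.init_emb_instrument]  # assumed order
    for col, emb in enumerate(embs):
        idx = int(init_token[..., col].max())
        n = emb.lut.num_embeddings
        if idx >= n:
            print(f"init column {col}: token {idx} >= table size {n}")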

How to address TypeError: 'NoneType' object is not callable?

I used the code below to print the error message:

except ImportError as e:
    print(f"ImportError: {e}")
    causal_dot_product_cuda = causal_dot_backward_cuda = None

It tells me ImportError: No module named 'fast_transformers.causal_product.causal_product_cuda'.
So I think the issue is that Python cannot load the compiled causal_product_cuda extension in the causal_product directory of the fast_transformers package.
How can I resolve this?
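
For reference, the usual fix is to rebuild the package from source so the CUDA extension compiles against the installed PyTorch. This assumes a C++ toolchain and a CUDA toolkit matching torch.version.cuda are available:

# Shell commands (run outside Python):
#   pip uninstall pytorch-fast-transformers
#   pip install pytorch-fast-transformers --no-binary :all:
# Then verify the extension actually loads:
from fast_transformers.causal_product import causal_product_cuda  # noqa: F401
print("extension compiled and importable")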

TypeError: 'NoneType' object is not callable

Hello, following py3_requirements.txt I set up a PyTorch 1.9.1 environment, but when I ran train.py it raised a TypeError. The details are below. I would be very grateful for any suggestions.
name:
args Namespace(batch_size=2, epochs=200, gpus=None, lr=0.0001, name='', path=None, train_data='../dataset/lpd_5_prcem_mix_v8_10000.npz')
num of encoder classes: [ 18 3 18 129 18 6 20 102 4865] [1 1 1]
D_MODEL 512 N_LAYER 12 N_HEAD 8 DECODER ATTN causal-linear
>>>>>: [ 18 3 18 129 18 6 20 102 4865]
DEVICE COUNT: 2
VISIBLE: 0,1
n_parameters: 39,005,620
train_data: dataset
batch_size: 2
num_batch: 1519
train_x: (3039, 9999, 9)
train_y: (3039, 9999, 9)
train_mask: (3039, 9999)
lr_init: 0.0001
DECAY_EPOCH: []
DECAY_RATIO: 0.1
Traceback (most recent call last):
File "train.py", line 219, in
train_dp()
File "train.py", line 162, in train_dp
losses = net(is_train=True, x=batch_x, target=batch_y, loss_mask=batch_mask, init_token=batch_init)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bing/CODE/video-bgm-generation/src/model.py", line 482, in forward
return self.train_forward(**kwargs)
File "/home/bing/CODE/video-bgm-generation/src/model.py", line 450, in train_forward
h, y_type = self.forward_hidden(x, memory=None, is_training=True, init_token=init_token)
File "/home/bing/CODE/video-bgm-generation/src/model.py", line 221, in forward_hidden
encoder_hidden = self.transformer_encoder(encoder_pos_emb, attn_mask)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/transformers.py", line 138, in forward
x = layer(x, attn_mask=attn_mask, length_mask=length_mask)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/transformers.py", line 81, in forward
key_lengths=length_mask
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/attention/attention_layer.py", line 109, in forward
key_lengths
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/attention/causal_linear_attention.py", line 101, in forward
values
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/attention/causal_linear_attention.py", line 23, in causal_linear
V_new = causal_dot_product(Q, K, V)
File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/causal_product/init.py", line 48, in forward
product
TypeError: 'NoneType' object is not callable

Looking forward to your reply!

About the music tracks

Hello! Thank you for sharing this excellent work! I have a question: is the music produced in the paper single-track? I ask because the paper requires specifying instrument labels.

User defined features

Dear author, I've successfully run the colab, but is there any cell that defines the genre and instruments mentioned in the paper?
Or does the colab just select them randomly?

The role of 'processed_mid'

Hi,
The function midi_to_mp3 in src/midi2mp3.py includes a variable named processed_mid, which, as I understand it, converts the raw MIDI to the tempo specified in video2npz/metadata.json. However, the subsequent fs.midi_to_audio call still takes the original midi_path rather than processed_mid to render the mp3, and I cannot find any later use of processed_mid in the code. So I wonder what the role of processed_mid is, and whether the argument to fs.midi_to_audio should be processed_mid rather than midi_path.
Thanks
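
For reference, a sketch of the suspected one-line fix, assuming the midi2audio-style FluidSynth wrapper the function already uses; whether this matches the author's intent needs confirmation.

from midi2audio import FluidSynth

def midi_to_mp3_fixed(processed_mid, sf2_path, out_path):
    # Render the tempo-adjusted file rather than the raw input.
    fs = FluidSynth(sf2_path)
    fs.midi_to_audio(processed_mid, out_path)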

the use of fast_transformers?

Hi there. I am having some problems trying to reproduce your work.

At first, I had trouble installing the fast-transformers package because of the C++ compiler version, so I rewrote the transformer_encoder with torch.nn.TransformerEncoder, but the loss stayed above 2.0 after 200 epochs. So I wonder what role the fast_transformers encoder plays.

Later, I fixed the C++ compiler version and installed the package, but CUDA runs out of memory during training. I tried converting the code to DistributedDataParallel with a ZeroRedundancyOptimizer, but memory is still not enough (I trained the model on 4 RTX 2080Ti GPUs).

Did you run into this memory problem, and if so, how did you solve it?

Thanks.
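
For reference, one generic memory workaround (not from the repo) is gradient accumulation: run several small batches per optimizer step. A sketch against the training-loop names visible in the tracebacks above; net, optimizer, and a loader yielding x/y/mask/init batches are assumed to exist.

accum_steps = 4  # effective batch = accum_steps * per-step batch size
optimizer.zero_grad()
for step, (batch_x, batch_y, batch_mask, batch_init) in enumerate(loader):
    losses = net(is_train=True, x=batch_x, target=batch_y,
                 loss_mask=batch_mask, init_token=batch_init)
    loss = sum(losses) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Shortening the 9999-token training sequences would also cut memory roughly linearly for the linear-attention encoder.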

Objective Evaluation code

Hello, how should we evaluate the generated music? Is there any open-source implementation of the three objective evaluation metrics mentioned in the paper (a sketch of the first one is given after this list)?

  1. Pitch Class Histogram Entropy
  2. Grooving Pattern Similarity
  3. Structureness Indicator
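
For reference, these three metrics come from Wu & Yang's MusDr repository (linked in an earlier issue here). An independent sketch of the first one, Pitch Class Histogram Entropy, i.e. the Shannon entropy of the 12-bin pitch-class histogram:

import numpy as np

def pitch_class_entropy(pitches):
    # pitches: iterable of MIDI pitch numbers from the generated piece.
    hist = np.bincount(np.asarray(pitches) % 12, minlength=12).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# e.g. with pretty_midi:
# pm = pretty_midi.PrettyMIDI("generated.mid")
# print(pitch_class_entropy([n.pitch for i in pm.instruments for n in i.notes]))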
