
midi-shark's People

Contributors

joehattori, jonah-chen, khanatatac, qilinxue


midi-shark's Issues

`constants.py` in Housekeeping Instructions

Will constants.py be used primarily for recording where the datasets are located and related information about them?

If so, we should probably keep it out of version control, since each of us has likely put the datasets in different folders. It should not be pushed to GitHub.

If not, then we should probably create two files: one private and one public.
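
One possible pattern (a sketch only; it assumes the python-dotenv setup and the pathname/dataname variable names already used by the checker script later on this page): keep machine-specific paths in a git-ignored .env file and have a public constants.py read them.

# constants.py -- sketch; assumes a git-ignored .env that defines
# pathname (raw dataset root) and dataname (processed data root)
import os
from dotenv import load_dotenv

load_dotenv()
DATASET_ROOT = os.environ.get('pathname')
OUTPUT_ROOT = os.environ.get('dataname')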

Inefficient use of disk space for `notes_generated`

I have noticed that the notes_generated folder contains .csv files of note information (the code that generated these files is in preprocessing/process_midi::save_midi). Are these intended to be read directly by a human? If so, please clarify why. If not, it would be more efficient to store them in a binary format like .npy.

After this is fixed or clarified, I think I can run the preprocessing over the full dataset, which should take between 2 and 4 hours.
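
For a sense of the difference, a minimal sketch (the notes array here is hypothetical; np.save/np.savetxt/np.load are the only APIs assumed): .npy stores the raw binary values plus a small header, while CSV stores every number as text and must be re-parsed on load.

import numpy as np

notes = np.random.rand(10000, 3)               # hypothetical (onset, offset, pitch) rows
np.save('notes.npy', notes)                    # binary: 8 bytes per float64 value
np.savetxt('notes.csv', notes, delimiter=',')  # text: roughly 3x larger, slower to parse
loaded = np.load('notes.npy')                  # round-trips with no parsing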

Short Term Goals

(1) Execute pre-processing code (by Friday)
(2) Sub-problems (Update by Monday)
(2a) De-noising (Real spectrogram to generated spectrogram) <-- relatively easy
(2b) Generated Spectrogram to Note Graph

Integrate Transformer into Onsets Model

I'm having a hard time figuring out how to use the transformer in place of the LSTM in the Baseline Model.

self.rnn = LSTM(model_size, model_size//2, batch_first=True, bidirectional=True)

Currently, both the input and output shapes of the LSTM are torch.Size([8, 862, 768]).
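
One hedged sketch of a drop-in replacement (this is not the project's model; nhead, num_layers, and dim_feedforward are placeholder choices): PyTorch's nn.TransformerEncoder with batch_first=True keeps the same (batch, seq, features) shape as the bidirectional LSTM.

import torch
from torch import nn

model_size = 768
encoder_layer = nn.TransformerEncoderLayer(
    d_model=model_size, nhead=8, dim_feedforward=2 * model_size,
    batch_first=True)
rnn_replacement = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(8, 862, model_size)  # same shape as the LSTM activations
y = rnn_replacement(x)
print(y.shape)                       # torch.Size([8, 862, 768])

Note that unlike the LSTM, the transformer has no inherent sense of order, so a positional encoding would have to be added to the input first.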

If you need LiveShare to edit code on my system (the data is already processed there), please @ me on Discord.

process_midi.py::midi2labels AssertionError

I got an assertion error when executing this line:

assert(note_states[msg_note] is None)

Weirdly enough, I hadn't caught this until now; ProcessPoolExecutor seems to suppress these errors.

Here are some of the Tracebacks:
Variables:

  • midi_file_path: ../2017/MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4.midi
  • cur_time: 433469.7916666725
  • cur_tempo: 500
  • state: NoteState(start=433340.6250000058, velocity=88)
  • msg: <message note_on channel=0 note=96 velocity=98 time=2>

Error message:

Exception has occurred: AssertionError       (note: full exception trace is shown but execution is paused at: preprocess_file)
exception: no description
  File ".../midi-shark/processing/process_midi.py", line 34, in midi2labels
    assert(note_states[msg_note] is None)
  File ".../midi-shark/processing/process_midi.py", line 62, in save_midi
    a = midi2labels(filename)
  File ".../midi-shark/processing/preprocess_batch.py", line 112, in preprocess_file (Current frame)
    save_midi(filename, NOTES_GENERATED_PATH, file)
  File ".../midi-shark/processing/preprocess_batch.py", line 150, in <module>
    executor.submit(preprocess_file,

I fixed many other bugs before, but this one is beyond my ability, as it seems to be related to the specific structure of the MIDI files and how they are read. If you can help out on this, that'd be sweet.

Currently, this bug is preventing the code in #20 from being executed, since some generated spectrograms and notes are missing.
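
On the suppression question: concurrent.futures stores a worker's exception on the returned Future instead of raising it, so it stays silent unless .result() is called. A self-contained sketch (worker here is a stand-in for preprocess_file):

from concurrent.futures import ProcessPoolExecutor

def worker(x):
    assert x is None  # stand-in for the failing assert in midi2labels
    return x

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(worker, x) for x in (None, 1)]
        for future in futures:
            future.result()  # re-raises the worker's AssertionError here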

Running the checker (after preprocessing completes)

from dotenv import load_dotenv
import os

# Access environment variables
load_dotenv(verbose=True)
input_path = os.environ.get('pathname')
output_path = os.environ.get('dataname')

YEARS = [2004, 2006, 2008, 2009, 2011, 2013, 2014, 2015, 2017, 2018]
passed = True
for year in YEARS:
    try:
        year = str(year)
        files = os.listdir(os.path.join(input_path, year))
        NOTE_GRAPHS_PATH = os.path.join(output_path, 'note_graphs', year) + "/"
        NOTES_GENERATED_PATH = os.path.join(
            output_path, 'notes_generated', year) + "/"
        SPECTROGRAM_GENERATED_PATH = os.path.join(
            output_path, 'spectrograms_generated', year) + "/"
        SPECTROGRAM_REAL_PATH = os.path.join(
            output_path, 'spectrograms_real', year) + "/"
        GENERATED_AUDIO_PATH = os.path.join(
            output_path, 'generated_audio', year) + "/"
        assert(len(files) % 2 == 0)
        print(f"Year:{year} Expected:{len(files) // 2}")
        print(f"generated_audio:{len(os.listdir(GENERATED_AUDIO_PATH))} " +
              f"note_graphs:{len(os.listdir(NOTE_GRAPHS_PATH))} " +
              f"notes_generated:{len(os.listdir(NOTES_GENERATED_PATH))} " +
              f"spectrograms_generated:{len(os.listdir(SPECTROGRAM_GENERATED_PATH))} " +
              f"spectrograms_real:{len(os.listdir(SPECTROGRAM_REAL_PATH))} ")
        passed = passed and len(files) // 2 == \
            len(os.listdir(GENERATED_AUDIO_PATH)) == \
            len(os.listdir(NOTE_GRAPHS_PATH)) == \
            len(os.listdir(NOTES_GENERATED_PATH)) == \
            len(os.listdir(SPECTROGRAM_GENERATED_PATH)) == \
            len(os.listdir(SPECTROGRAM_REAL_PATH))
        # Check that every real spectrogram has a generated counterpart
        print("--------------------------")
        print("Checking generated spectrograms...")
        for spec in os.listdir(SPECTROGRAM_REAL_PATH):
            if spec not in os.listdir(SPECTROGRAM_GENERATED_PATH):
                print(f"Spectrogram {spec} not found in generated spectrograms")
                passed = False
        # Check that every real spectrogram has a note graph
        print("--------------------------")
        print("Checking note graphs...")
        for spec in os.listdir(SPECTROGRAM_REAL_PATH):
            if spec not in os.listdir(NOTE_GRAPHS_PATH):
                print(f"Spectrogram {spec} not found in note graphs")
                passed = False
    except FileNotFoundError:
        print(f"{year} is missing")
        passed = False
print("--------------------------")
print("Passed!" if passed else "Failed!")
This results in the following output:

Year:2004 Expected:132
generated_audio:132 note_graphs:132 notes_generated:132 spectrograms_generated:132 spectrograms_real:132 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2006 Expected:115
generated_audio:115 note_graphs:115 notes_generated:115 spectrograms_generated:115 spectrograms_real:115 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2008 Expected:147
generated_audio:147 note_graphs:147 notes_generated:147 spectrograms_generated:147 spectrograms_real:147 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2009 Expected:125
generated_audio:124 note_graphs:124 notes_generated:124 spectrograms_generated:124 spectrograms_real:125 
--------------------------
Checking generated spectrograms...
Spectrogram MIDI-Unprocessed_03_R1_2009_03-08_ORIG_MID--AUDIO_03_R1_2009_03_R1_2009_08_WAV not found in generated spectrograms
--------------------------
Checking note graphs...
Spectrogram MIDI-Unprocessed_03_R1_2009_03-08_ORIG_MID--AUDIO_03_R1_2009_03_R1_2009_08_WAV not found in note graphs
Year:2011 Expected:163
generated_audio:163 note_graphs:163 notes_generated:163 spectrograms_generated:163 spectrograms_real:163 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2013 Expected:127
generated_audio:127 note_graphs:127 notes_generated:127 spectrograms_generated:127 spectrograms_real:127 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2014 Expected:105
generated_audio:105 note_graphs:105 notes_generated:105 spectrograms_generated:105 spectrograms_real:105 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2015 Expected:129
generated_audio:129 note_graphs:129 notes_generated:129 spectrograms_generated:129 spectrograms_real:129 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2017 Expected:140
generated_audio:136 note_graphs:136 notes_generated:136 spectrograms_generated:136 spectrograms_real:140 
--------------------------
Checking generated spectrograms...
Spectrogram MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_050_PIANO050_MID--AUDIO-split_07-06-17_Piano-e_3-01_wav--3 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_041_PIANO041_MID--AUDIO-split_07-06-17_Piano-e_1-01_wav--3 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_081_PIANO081_MID--AUDIO-split_07-09-17_Piano-e_2_-02_wav--4 not found in generated spectrograms
--------------------------
Checking note graphs...
Spectrogram MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4 not found in note graphs
Spectrogram MIDI-Unprocessed_050_PIANO050_MID--AUDIO-split_07-06-17_Piano-e_3-01_wav--3 not found in note graphs
Spectrogram MIDI-Unprocessed_041_PIANO041_MID--AUDIO-split_07-06-17_Piano-e_1-01_wav--3 not found in note graphs
Spectrogram MIDI-Unprocessed_081_PIANO081_MID--AUDIO-split_07-09-17_Piano-e_2_-02_wav--4 not found in note graphs
Year:2018 Expected:93
generated_audio:93 note_graphs:93 notes_generated:93 spectrograms_generated:93 spectrograms_real:93 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
--------------------------
Failed!

Training loop is way too inefficient; unable to process a batch size greater than 1

Here is the code:

from deeplab import DeepLabv3Encoder, ImageDecoder
import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision import transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
import os
from random import randint
import numpy as np

_ROOT = "/media/hina/LinuxStorage/Datasets/"
_DATASETS = {
    'cityscapes': 'cityscapes'
}

if torch.cuda.is_available():  # Use GPU if and only if available
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    torch.set_default_dtype(torch.float32)


class Cityscapes(datasets.Cityscapes):
    t = transforms.Compose([
        transforms.RandomRotation(15),
        transforms.RandomResizedCrop((512,512), scale=(0.9,1.0))       
    ])

    def __getitem__(self, i):
        inp, target = super(Cityscapes, self).__getitem__(i)
        seed = randint(0, 0xffffffffffffffff)
        
        torch.manual_seed(seed)
        inp = Cityscapes.t(inp)
        torch.manual_seed(seed)
        target = Cityscapes.t(target)

        return inp, torch.Tensor(np.asarray(target))


data = Cityscapes(os.path.join(_ROOT, _DATASETS['cityscapes']), 
                     target_type='semantic', 
                     transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=2, shuffle=True)

EPOCHS = 5
BATCH_SIZE = 4

encoder = DeepLabv3Encoder()
decoder = ImageDecoder(29)

encoder.cuda()
decoder.cuda()

criterion = CrossEntropyLoss()
optimizer = SGD(encoder.parameters(), 
                lr=0.045, 
                momentum=0.9)

for _ in range(EPOCHS):
    for _, (img, label) in enumerate(loader, 0):
        img = img.cuda()
        out = decoder(encoder(img))        

This runs out of memory. The model has around 50M parameters, so using more than 12 GB of memory is unexpected. I'm not sure what's wrong; please help.
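
As a sanity check on that claim: 50M float32 parameters are only about 200 MB, so if 12 GB is in use, the bulk must be activations kept alive for backprop rather than the weights themselves. A small diagnostic sketch (it reuses the encoder and decoder objects from the script above):

import torch

def param_mb(module):
    # float32 parameters take 4 bytes each
    return sum(p.numel() for p in module.parameters()) * 4 / 2**20

print(f"parameters: {param_mb(encoder) + param_mb(decoder):.0f} MB")
print(f"allocated:  {torch.cuda.memory_allocated() / 2**30:.2f} GB")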

Develop Model Prototype

Tips:

  • If training/testing/debugging on the GPU, use the line
torch.set_default_tensor_type('torch.cuda.FloatTensor')

to create tensors on the GPU by default. DO NOT USE the .cuda() function unless there is a very good reason. If strictly necessary, justify it with comments.
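
A quick sketch of the effect (assumes a CUDA device is available):

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')

x = torch.zeros(3)  # allocated on the GPU with no .cuda() call
print(x.device)     # cuda:0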

My Proposed Approach

If you have any suggestions, please let us know in a comment.

In my opinion, we should start with developing the most theoretical parts of the model because this would help solidify understanding of the various methods we are implementing. After we complete this, we can integrate everything together into the model. I have broken it down into several components.

Transformer (Decoder for onset and frame heads):

Single components:

  • Positional Encoding
  • Generating Query/Key/Value (s)
  • Simple Attention
  • Multi-head Attention
  • Add & Norm
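
For reference, the "Simple Attention" component above is the scaled dot-product form; a minimal sketch (shapes are illustrative):

import torch

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 10, 64)  # (batch, seq, features)
print(attention(q, k, v).shape)     # torch.Size([2, 10, 64])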

More complex networks:

  • Input and output embedding
  • N layer encoder structure
  • M layer decoder structure
  • Prediction head(s)

Conv Stack/Image Segmentation (Encoder for image segmentation model)

  • Conv Stack in 2016 paper
  • Adapted DeepLab v3 (I have implemented vanilla DeepLabv3 in TensorFlow before, but we need to adapt the ideas from the paper to suit our needs and rewrite it in PyTorch)
  • Try other segmentation models

Full Model

  • Answer how the data will flow throughout these components (and simpler components like sigmoids, fully-connected layers, softmax, etc).
  • Experiment with velocity and other useful information we can extract??

P.S. I've never used GitHub properly before, so I hope this is what I'm supposed to do.

Add padding to the end of the clips.

Songs are divided into 20 s intervals, so the last clip of a song can be shorter than 20 s. There are two ways to deal with this:

  • Ignore it (what we currently do, but this may make the model perform poorly on ending clips).
  • Add padding (see the sketch below).

I can edit the code to add padding; what do you think? Is this important enough?
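
A minimal sketch of the padding option (SAMPLE_RATE is an assumption; substitute whatever rate the preprocessing actually uses):

import numpy as np

CLIP_SECONDS = 20
SAMPLE_RATE = 16000  # assumption, not the project's confirmed rate

def pad_clip(clip):
    # Zero-pad a final clip shorter than the 20 s window to full length.
    target = CLIP_SECONDS * SAMPLE_RATE
    return np.pad(clip, (0, max(0, target - len(clip))))

print(pad_clip(np.ones(5 * SAMPLE_RATE)).shape)  # (320000,) -- a full 20 s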

Requesting Loaders and Parsers

After the data is processed, the file tree for the data directory is

.
├── generated_audio
│   ├── ...
├── notes_generated
│   ├── ...
├── note_graphs
│   ├── ...
├── spectrograms_real
│   ├── ...
├── spectrograms_generated
│   ├── ...

However, the only thing we can do with the preprocessed data right now is listen to the generated audio. I think we need to implement the following features to start training models.

Features Requested:

  1. There should be a means to conveniently use the data for training (probably via PyTorch DataLoaders), preferably with one object instantiation or function call. A hypothetical sketch follows this list.
  2. There should also be a means to visualize the data (when possible) in a human-readable form. I think @QiLinXue has Jupyter notebooks that already do some of this. If you can implement them in a more organized and flexible way, a lot of the work is already done.
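
For feature 1, a hypothetical sketch of what the one-call interface could look like (MidiSharkDataset is illustrative, not existing code, and it assumes the spectrograms and note graphs are stored as .npy files with matching names under the tree above):

import os
import numpy as np
import torch
from torch.utils.data import Dataset

class MidiSharkDataset(Dataset):
    # Pairs each real spectrogram with its note graph by filename.
    def __init__(self, root, year):
        self.spec_dir = os.path.join(root, 'spectrograms_real', str(year))
        self.graph_dir = os.path.join(root, 'note_graphs', str(year))
        self.names = sorted(os.listdir(self.spec_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        spec = np.load(os.path.join(self.spec_dir, self.names[i]))
        graph = np.load(os.path.join(self.graph_dir, self.names[i]))
        return torch.from_numpy(spec), torch.from_numpy(graph)

From there, torch.utils.data.DataLoader with num_workers > 0 and pin_memory=True would be the usual route to keeping the GPU fed.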

Requirements

  • The features should be implemented in a way that does not require the user to specify anything else, assuming the file structure above and a correct .env file.
  • The visualization features should be flexible and seamless.
  • Maximize the GPU utilization (measured in watts) allowed by the data loader during training. Ideally, full utilization (400 W) can be achieved.
  • Minimize the number and size of files (measured in megabytes) written to the disk. Ideally, no temporary files need to be stored on the disk.
  • Must not use more than about 50 000 MB of host (CPU) memory.
