
midi-shark's People

Contributors

joehattori, jonah-chen, khanatatac, qilinxue


midi-shark's Issues

`constants.py` in Housekeeping Instructions

Will constants.py be used primarily for recording where the datasets are located and related information about them?

If so, we should probably keep it out of version control, since each of us has likely put the datasets in different folders. It should not be pushed to GitHub.

If not, then we should probably create two files: one private and one public.
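
One possible pattern (a sketch only; it assumes the python-dotenv setup and the pathname/dataname variable names already used by the checker script later on this page): keep machine-specific paths in a git-ignored .env file and have a public constants.py read them.

# constants.py -- sketch; assumes a git-ignored .env that defines
# pathname (raw dataset root) and dataname (processed data root)
import os
from dotenv import load_dotenv

load_dotenv()
DATASET_ROOT = os.environ.get('pathname')
OUTPUT_ROOT = os.environ.get('dataname')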

Inefficient use of disk space for `notes_generated`

I have noticed that the notes_generated folder contains .csv files of note information (the code that generated these files is in preprocessing/process_midi::save_midi). Are these intended to be read directly by a human? If so, please clarify why. If not, it would be more efficient to store them in a binary format like .npy.

After this is fixed or clarified, I think I can run the preprocessing over the full dataset, which should take between 2 and 4 hours.
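
For a sense of the difference, a minimal sketch (the notes array here is hypothetical; np.save/np.savetxt/np.load are the only APIs assumed): .npy stores the raw binary values plus a small header, while CSV stores every number as text and must be re-parsed on load.

import numpy as np

notes = np.random.rand(10000, 3)               # hypothetical (onset, offset, pitch) rows
np.save('notes.npy', notes)                    # binary: 8 bytes per float64 value
np.savetxt('notes.csv', notes, delimiter=',')  # text: roughly 3x larger, slower to parse
loaded = np.load('notes.npy')                  # round-trips with no parsing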

Short Term Goals

(1) Execute pre-processing code (by Friday)
(2) Sub-problems (Update by Monday)
(2a) De-noising (Real spectrogram to generated spectrogram) <-- relatively easy
(2b) Generated Spectrogram to Note Graph

Integrate Transformer into Onsets Model

I'm having a hard time figuring out how to use the transformer in place of the LSTM in the Baseline Model.

self.rnn = LSTM(model_size, model_size//2, batch_first=True, bidirectional=True)

Currently, both the input and output shapes of the LSTM are torch.Size([8, 862, 768]).
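
One hedged sketch of a drop-in replacement (this is not the project's model; nhead, num_layers, and dim_feedforward are placeholder choices): PyTorch's nn.TransformerEncoder with batch_first=True keeps the same (batch, seq, features) shape as the bidirectional LSTM.

import torch
from torch import nn

model_size = 768
encoder_layer = nn.TransformerEncoderLayer(
    d_model=model_size, nhead=8, dim_feedforward=2 * model_size,
    batch_first=True)
rnn_replacement = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(8, 862, model_size)  # same shape as the LSTM activations
y = rnn_replacement(x)
print(y.shape)                       # torch.Size([8, 862, 768])

Note that unlike the LSTM, the transformer has no inherent sense of order, so a positional encoding would have to be added to the input first.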

If you need LiveShare to edit code on my system (the data is already processed there), please @ me on Discord.

process_midi.py::midi2labels AssertionError

I got an assertion error when executing this line:

assert(note_states[msg_note] is None)

Weirdly enough, I hadn't caught this until now; ProcessPoolExecutor seems to suppress these errors.

Here are some of the Tracebacks:
Variables:

  • midi_file_path: ../2017/MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4.midi
  • cur_time: 433469.7916666725
  • cur_tempo: 500
  • state: NoteState(start=433340.6250000058, velocity=88)
  • msg: <message note_on channel=0 note=96 velocity=98 time=2>

Error message:

Exception has occurred: AssertionError       (note: full exception trace is shown but execution is paused at: preprocess_file)
exception: no description
  File ".../midi-shark/processing/process_midi.py", line 34, in midi2labels
    assert(note_states[msg_note] is None)
  File ".../midi-shark/processing/process_midi.py", line 62, in save_midi
    a = midi2labels(filename)
  File ".../midi-shark/processing/preprocess_batch.py", line 112, in preprocess_file (Current frame)
    save_midi(filename, NOTES_GENERATED_PATH, file)
  File ".../midi-shark/processing/preprocess_batch.py", line 150, in <module>
    executor.submit(preprocess_file,

I fixed many other bugs before, but this one is beyond my ability, as it seems to be related to the specific structure of the MIDI files and how they are read. If you can help out on this, that'd be sweet.

Currently, this bug is preventing the code in #20 from being executed, since some generated spectrograms and notes are missing.
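
On the suppression question: concurrent.futures stores a worker's exception on the returned Future instead of raising it, so it stays silent unless .result() is called. A self-contained sketch (worker here is a stand-in for preprocess_file):

from concurrent.futures import ProcessPoolExecutor

def worker(x):
    assert x is None  # stand-in for the failing assert in midi2labels
    return x

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(worker, x) for x in (None, 1)]
        for future in futures:
            future.result()  # re-raises the worker's AssertionError here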

Running the checker (after preprocessing completes)

from dotenv import load_dotenv
import os

# Access environment variables
load_dotenv(verbose=True)
input_path = os.environ.get('pathname')
output_path = os.environ.get('dataname')

YEARS = [2004, 2006, 2008, 2009, 2011, 2013, 2014, 2015, 2017, 2018]
passed = True
for year in YEARS:
    try:
        year = str(year)
        files = os.listdir(os.path.join(input_path, year))
        NOTE_GRAPHS_PATH = os.path.join(output_path, 'note_graphs', year) + "/"
        NOTES_GENERATED_PATH = os.path.join(
            output_path, 'notes_generated', year) + "/"
        SPECTROGRAM_GENERATED_PATH = os.path.join(
            output_path, 'spectrograms_generated', year) + "/"
        SPECTROGRAM_REAL_PATH = os.path.join(
            output_path, 'spectrograms_real', year) + "/"
        GENERATED_AUDIO_PATH = os.path.join(
            output_path, 'generated_audio', year) + "/"
        assert(len(files) % 2 == 0)
        print(f"Year:{year} Expected:{len(files) // 2}")
        print(f"generated_audio:{len(os.listdir(GENERATED_AUDIO_PATH))} " +
              f"note_graphs:{len(os.listdir(NOTE_GRAPHS_PATH))} " +
              f"notes_generated:{len(os.listdir(NOTES_GENERATED_PATH))} " +
              f"spectrograms_generated:{len(os.listdir(SPECTROGRAM_GENERATED_PATH))} " +
              f"spectrograms_real:{len(os.listdir(SPECTROGRAM_REAL_PATH))} ")
        passed = passed and len(files) // 2 == \
            len(os.listdir(GENERATED_AUDIO_PATH)) == \
            len(os.listdir(NOTE_GRAPHS_PATH)) == \
            len(os.listdir(NOTES_GENERATED_PATH)) == \
            len(os.listdir(SPECTROGRAM_GENERATED_PATH)) == \
            len(os.listdir(SPECTROGRAM_REAL_PATH))
        # Check that every real spectrogram has a generated counterpart
        print("--------------------------")
        print("Checking generated spectrograms...")
        for spec in os.listdir(SPECTROGRAM_REAL_PATH):
            if spec not in os.listdir(SPECTROGRAM_GENERATED_PATH):
                print(f"Spectrogram {spec} not found in generated spectrograms")
                passed = False
        # Check that every real spectrogram has a note graph
        print("--------------------------")
        print("Checking note graphs...")
        for spec in os.listdir(SPECTROGRAM_REAL_PATH):
            if spec not in os.listdir(NOTE_GRAPHS_PATH):
                print(f"Spectrogram {spec} not found in note graphs")
                passed = False
    except FileNotFoundError:
        print(f"{year} is missing")
        passed = False
print("--------------------------")
print("Passed!" if passed else "Failed!")
This results in the following output:

Year:2004 Expected:132
generated_audio:132 note_graphs:132 notes_generated:132 spectrograms_generated:132 spectrograms_real:132 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2006 Expected:115
generated_audio:115 note_graphs:115 notes_generated:115 spectrograms_generated:115 spectrograms_real:115 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2008 Expected:147
generated_audio:147 note_graphs:147 notes_generated:147 spectrograms_generated:147 spectrograms_real:147 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2009 Expected:125
generated_audio:124 note_graphs:124 notes_generated:124 spectrograms_generated:124 spectrograms_real:125 
--------------------------
Checking generated spectrograms...
Spectrogram MIDI-Unprocessed_03_R1_2009_03-08_ORIG_MID--AUDIO_03_R1_2009_03_R1_2009_08_WAV not found in generated spectrograms
--------------------------
Checking note graphs...
Spectrogram MIDI-Unprocessed_03_R1_2009_03-08_ORIG_MID--AUDIO_03_R1_2009_03_R1_2009_08_WAV not found in note graphs
Year:2011 Expected:163
generated_audio:163 note_graphs:163 notes_generated:163 spectrograms_generated:163 spectrograms_real:163 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2013 Expected:127
generated_audio:127 note_graphs:127 notes_generated:127 spectrograms_generated:127 spectrograms_real:127 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2014 Expected:105
generated_audio:105 note_graphs:105 notes_generated:105 spectrograms_generated:105 spectrograms_real:105 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2015 Expected:129
generated_audio:129 note_graphs:129 notes_generated:129 spectrograms_generated:129 spectrograms_real:129 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2017 Expected:140
generated_audio:136 note_graphs:136 notes_generated:136 spectrograms_generated:136 spectrograms_real:140 
--------------------------
Checking generated spectrograms...
Spectrogram MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_050_PIANO050_MID--AUDIO-split_07-06-17_Piano-e_3-01_wav--3 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_041_PIANO041_MID--AUDIO-split_07-06-17_Piano-e_1-01_wav--3 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_081_PIANO081_MID--AUDIO-split_07-09-17_Piano-e_2_-02_wav--4 not found in generated spectrograms
--------------------------
Checking note graphs...
Spectrogram MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4 not found in note graphs
Spectrogram MIDI-Unprocessed_050_PIANO050_MID--AUDIO-split_07-06-17_Piano-e_3-01_wav--3 not found in note graphs
Spectrogram MIDI-Unprocessed_041_PIANO041_MID--AUDIO-split_07-06-17_Piano-e_1-01_wav--3 not found in note graphs
Spectrogram MIDI-Unprocessed_081_PIANO081_MID--AUDIO-split_07-09-17_Piano-e_2_-02_wav--4 not found in note graphs
Year:2018 Expected:93
generated_audio:93 note_graphs:93 notes_generated:93 spectrograms_generated:93 spectrograms_real:93 
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
--------------------------
Failed!

Training loop is way too inefficient; unable to process a batch size greater than 1

Here is the code:

from deeplab import DeepLabv3Encoder, ImageDecoder
import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision import transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
import os
from random import randint
import numpy as np

_ROOT = "/media/hina/LinuxStorage/Datasets/"
_DATASETS = {
    'cityscapes': 'cityscapes'
}

if torch.cuda.is_available():  # Use GPU if and only if available
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    torch.set_default_dtype(torch.float32)


class Cityscapes(datasets.Cityscapes):
    t = transforms.Compose([
        transforms.RandomRotation(15),
        transforms.RandomResizedCrop((512,512), scale=(0.9,1.0))       
    ])

    def __getitem__(self, i):
        inp, target = super(Cityscapes, self).__getitem__(i)
        seed = randint(0, 0xffffffffffffffff)
        
        torch.manual_seed(seed)
        inp = Cityscapes.t(inp)
        torch.manual_seed(seed)
        target = Cityscapes.t(target)

        return inp, torch.Tensor(np.asarray(target))


data = Cityscapes(os.path.join(_ROOT, _DATASETS['cityscapes']), 
                     target_type='semantic', 
                     transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=2, shuffle=True)

EPOCHS = 5
BATCH_SIZE = 4

encoder = DeepLabv3Encoder()
decoder = ImageDecoder(29)

encoder.cuda()
decoder.cuda()

criterion = CrossEntropyLoss()
optimizer = SGD(encoder.parameters(), 
                lr=0.045, 
                momentum=0.9)

for _ in range(EPOCHS):
    for _, (img, label) in enumerate(loader, 0):
        img = img.cuda()
        out = decoder(encoder(img))        

This runs out of memory. The model has around 50M parameters, so using more than 12 GB of memory is unexpected. I'm not sure what's wrong; please help.
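
As a sanity check on that claim: 50M float32 parameters are only about 200 MB, so if 12 GB is in use, the bulk must be activations kept alive for backprop rather than the weights themselves. A small diagnostic sketch (it reuses the encoder and decoder objects from the script above):

import torch

def param_mb(module):
    # float32 parameters take 4 bytes each
    return sum(p.numel() for p in module.parameters()) * 4 / 2**20

print(f"parameters: {param_mb(encoder) + param_mb(decoder):.0f} MB")
print(f"allocated:  {torch.cuda.memory_allocated() / 2**30:.2f} GB")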

Develop Model Prototype

Tips:

  • If training/testing/debugging on the GPU, use the line
torch.set_default_tensor_type('torch.cuda.FloatTensor')

to create tensors on the GPU by default. DO NOT USE the .cuda() function unless there is a very good reason. If strictly necessary, justify it with comments.
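
A quick sketch of the effect (assumes a CUDA device is available):

import torch

torch.set_default_tensor_type('torch.cuda.FloatTensor')

x = torch.zeros(3)  # allocated on the GPU with no .cuda() call
print(x.device)     # cuda:0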

My Proposed Approach

If you have any suggestions, please let us know in a comment.

In my opinion, we should start with developing the most theoretical parts of the model because this would help solidify understanding of the various methods we are implementing. After we complete this, we can integrate everything together into the model. I have broken it down into several components.

Transformer (Decoder for onset and frame heads):

Single components:

  • Positional Encoding
  • Generating Query/Key/Value (s)
  • Simple Attention
  • Multi-head Attention
  • Add & Norm
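
For reference, the "Simple Attention" component above is the scaled dot-product form; a minimal sketch (shapes are illustrative):

import torch

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 10, 64)  # (batch, seq, features)
print(attention(q, k, v).shape)     # torch.Size([2, 10, 64])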

More complex networks:

  • Input and output embedding
  • N layer encoder structure
  • M layer decoder structure
  • Prediction head(s)

Conv Stack/Image Segmentation (Encoder for image segmentation model)

  • Conv Stack in 2016 paper
  • Adapted DeepLab v3 (I have implemented vanilla DeepLabv3 in TensorFlow before, but we need to adapt the ideas from the paper to suit our needs and rewrite it in PyTorch)
  • Try other segmentation models

Full Model

  • Answer how the data will flow throughout these components (and simpler components like sigmoids, fully-connected layers, softmax, etc).
  • Experiment with velocity and other useful information we can extract??

P.S. I've never used GitHub properly before, so I hope this is what I'm supposed to do.

Add padding to the end of the clips.

Songs are divided into 20 s intervals, so the last clip of a song can be shorter than 20 s. There are two ways to deal with this:

  • Ignore it (what we currently do, but this may make the model perform poorly on ending clips).
  • Add padding (see the sketch below).

I can edit the code to add padding; what do you think? Is this important enough?
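
A minimal sketch of the padding option (SAMPLE_RATE is an assumption; substitute whatever rate the preprocessing actually uses):

import numpy as np

CLIP_SECONDS = 20
SAMPLE_RATE = 16000  # assumption, not the project's confirmed rate

def pad_clip(clip):
    # Zero-pad a final clip shorter than the 20 s window to full length.
    target = CLIP_SECONDS * SAMPLE_RATE
    return np.pad(clip, (0, max(0, target - len(clip))))

print(pad_clip(np.ones(5 * SAMPLE_RATE)).shape)  # (320000,) -- a full 20 s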

Requesting Loaders and Parsers

After the data is processed, the file tree for the data directory is

.
├── generated_audio
│   ├── ...
├── notes_generated
│   ├── ...
├── note_graphs
│   ├── ...
├── spectrograms_real
│   ├── ...
├── spectrograms_generated
│   ├── ...

However, the only thing we can do with the preprocessed data right now is listen to the generated audio. I think we need to implement the following features to start training models.

Features Requested:

  1. There should be a means to conveniently use the data for training (probably via PyTorch DataLoaders), preferably with one object instantiation or function call. A hypothetical sketch follows this list.
  2. There should also be a means to visualize the data (when possible) in a human-readable form. I think @QiLinXue has Jupyter notebooks that already do some of this. If you can implement them in a more organized and flexible way, a lot of the work is already done.
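
For feature 1, a hypothetical sketch of what the one-call interface could look like (MidiSharkDataset is illustrative, not existing code, and it assumes the spectrograms and note graphs are stored as .npy files with matching names under the tree above):

import os
import numpy as np
import torch
from torch.utils.data import Dataset

class MidiSharkDataset(Dataset):
    # Pairs each real spectrogram with its note graph by filename.
    def __init__(self, root, year):
        self.spec_dir = os.path.join(root, 'spectrograms_real', str(year))
        self.graph_dir = os.path.join(root, 'note_graphs', str(year))
        self.names = sorted(os.listdir(self.spec_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        spec = np.load(os.path.join(self.spec_dir, self.names[i]))
        graph = np.load(os.path.join(self.graph_dir, self.names[i]))
        return torch.from_numpy(spec), torch.from_numpy(graph)

From there, torch.utils.data.DataLoader with num_workers > 0 and pin_memory=True would be the usual route to keeping the GPU fed.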

Requirements

  • The features should be implemented in a way that does not require the user to specify anything else, assuming the file structure above and a correct .env file.
  • The visualization features should be flexible and seamless.
  • Maximize the GPU utilization (measured in watts) allowed by the data loader during training. Ideally, full utilization (400 W) can be achieved.
  • Minimize the number and size of files (measured in megabytes) written to the disk. Ideally, no temporary files need to be stored on the disk.
  • Must not use more than about 50 000 MB of host (CPU) memory.
