jonah-chen / midi-shark
Automatic piano transcription model based on transformers and the onsets/frames architecture. Class project for APS360: Applied Fundamentals of Machine Learning.
Will constants.py be used primarily for denoting the dataset locations and information about the dataset?
If so, then we should probably hide it, as we have each likely put the dataset in different folders, and it should not be pushed to GitHub.
If not, then we should probably create two files: one private and one public.
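One common way to keep machine-specific paths out of the repo is to read them from environment variables (or a git-ignored .env file each of us keeps locally) with a checked-in fallback. A minimal sketch; DATASET_ROOT and the fallback path are illustrative names, not our actual config:

```python
import os

# Machine-specific paths live in the environment, with a checked-in
# default so the code still runs out of the box.
# (DATASET_ROOT is an illustrative variable name.)
DATASET_ROOT = os.environ.get("DATASET_ROOT", "./data")
NOTES_GENERATED_PATH = os.path.join(DATASET_ROOT, "notes_generated")
```

Each of us can then export DATASET_ROOT locally without committing our own paths.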
I have noticed that the notes_generated folder contains .csv files of note information (the code that generated these files is in preprocessing/process_midi::save_midi). Is this intended to be read directly by a human? If so, please clarify why. If not, it would be more efficient to store these in a binary format like .npy.
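For comparison, a small sketch of the .npy round trip; the note matrix below is made up for illustration:

```python
import numpy as np
import os
import tempfile

# A made-up note matrix: one row per note event,
# columns (onset_ms, offset_ms, pitch, velocity).
notes = np.array([[433340.6, 433469.8, 96.0, 88.0],
                  [433500.0, 433720.5, 60.0, 74.0]], dtype=np.float32)

# np.save writes a compact binary .npy file; np.load reads it back
# far faster than parsing CSV text, with no precision loss.
path = os.path.join(tempfile.mkdtemp(), "notes.npy")
np.save(path, notes)
loaded = np.load(path)
```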
After this is fixed or clarified, I think I can execute the code to preprocess the full dataset, which should take between 2 and 4 hours.
(1) Execute pre-processing code (by Friday)
(2) Sub-problems (Update by Monday)
(2a) De-noising (Real spectrogram to generated spectrogram) <-- relatively easy
(2b) Generated Spectrogram to Note Graph
I'm having a hard time figuring out how to use the transformer in place of the LSTM in the Baseline Model.
Line 69 in 77a930e: torch.Size([8, 862, 768]).
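If it helps, one way to drop a Transformer encoder into the LSTM's slot for a (batch, time, features) tensor of that shape is nn.TransformerEncoder with batch_first=True. A minimal sketch with illustrative layer counts, not our model's actual configuration:

```python
import torch
import torch.nn as nn

# Illustrative sizes matching the printed shape:
# batch 8, 862 time frames, 768 features per frame.
x = torch.randn(8, 862, 768)

# With batch_first=True the encoder consumes (batch, seq, features)
# directly, the same layout a batch_first LSTM would take, and it
# returns a tensor of the same shape.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)
y = encoder(x)  # torch.Size([8, 862, 768])
```

Because the output shape matches the input shape, it can feed the same onset/frame heads the LSTM fed.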
If you need LiveShare to edit code on my system (which already has the data processed), please @ me on Discord.
I got an assertion error when executing this line:
midi-shark/processing/process_midi.py, Line 30 in 332c7f9
Could it be that ProcessPoolExecutor suppresses these errors?
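For reference, ProcessPoolExecutor does not raise worker exceptions at submit time; they are stored on the returned Future and only re-raised when .result() is called. A small sketch (the work function is a stand-in, not our code):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def work(n):
    # Stand-in for preprocess_file that fails on one input.
    if n == 2:
        raise AssertionError(f"bad input {n}")
    return n * n

results, errors = [], []
with ProcessPoolExecutor() as executor:
    futures = [executor.submit(work, n) for n in range(4)]
    for f in as_completed(futures):
        try:
            results.append(f.result())  # re-raises the worker's exception
        except AssertionError as e:
            errors.append(str(e))
```

So if preprocess_batch.py only calls executor.submit and never checks the futures, any AssertionError in a worker would be silently swallowed.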
Here are some of the Tracebacks:
Variables:
../2017/MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4.midi
433469.7916666725
500
NoteState(start=433340.6250000058, velocity=88)
<message note_on channel=0 note=96 velocity=98 time=2>
Error message:
Exception has occurred: AssertionError (note: full exception trace is shown but execution is paused at: preprocess_file)
exception: no description
File ".../midi-shark/processing/process_midi.py", line 34, in midi2labels
assert(note_states[msg_note] is None)
File ".../midi-shark/processing/process_midi.py", line 62, in save_midi
a = midi2labels(filename)
File ".../midi-shark/processing/preprocess_batch.py", line 112, in preprocess_file (Current frame)
save_midi(filename, NOTES_GENERATED_PATH, file)
File ".../midi-shark/processing/preprocess_batch.py", line 150, in <module>
executor.submit(preprocess_file,
I fixed many other bugs before, but this one is beyond my ability, as it seems to be related to the specific structure of the MIDI files and how they are read. If you can help out on this, that'd be sweet.
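If the assert fires because a note_on arrives for a pitch that is already active (some MIDI files re-trigger a note before its note_off), one tolerant option is to close the old note instead of asserting. A hedged sketch; the NoteState fields mirror the traceback, but on_note_on is a hypothetical helper, not our actual code:

```python
from dataclasses import dataclass

@dataclass
class NoteState:
    start: float
    velocity: int

def on_note_on(note_states, finished, note, time, velocity):
    # If this pitch is already sounding, treat the re-trigger as an
    # implicit note_off + note_on instead of asserting.
    prev = note_states.get(note)
    if prev is not None:
        finished.append((note, prev.start, time, prev.velocity))
    note_states[note] = NoteState(start=time, velocity=velocity)

states, done = {}, []
on_note_on(states, done, 96, 433340.6, 88)
on_note_on(states, done, 96, 433469.8, 98)  # re-trigger, no assertion
```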
Running the checker (after preprocessing completes)
midi-shark/processing/checker.py
Lines 1 to 62 in b357bbd
Year:2004 Expected:132
generated_audio:132 note_graphs:132 notes_generated:132 spectrograms_generated:132 spectrograms_real:132
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2006 Expected:115
generated_audio:115 note_graphs:115 notes_generated:115 spectrograms_generated:115 spectrograms_real:115
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2008 Expected:147
generated_audio:147 note_graphs:147 notes_generated:147 spectrograms_generated:147 spectrograms_real:147
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2009 Expected:125
generated_audio:124 note_graphs:124 notes_generated:124 spectrograms_generated:124 spectrograms_real:125
--------------------------
Checking generated spectrograms...
Spectrogram MIDI-Unprocessed_03_R1_2009_03-08_ORIG_MID--AUDIO_03_R1_2009_03_R1_2009_08_WAV not found in generated spectrograms
--------------------------
Checking note graphs...
Spectrogram MIDI-Unprocessed_03_R1_2009_03-08_ORIG_MID--AUDIO_03_R1_2009_03_R1_2009_08_WAV not found in note graphs
Year:2011 Expected:163
generated_audio:163 note_graphs:163 notes_generated:163 spectrograms_generated:163 spectrograms_real:163
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2013 Expected:127
generated_audio:127 note_graphs:127 notes_generated:127 spectrograms_generated:127 spectrograms_real:127
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2014 Expected:105
generated_audio:105 note_graphs:105 notes_generated:105 spectrograms_generated:105 spectrograms_real:105
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2015 Expected:129
generated_audio:129 note_graphs:129 notes_generated:129 spectrograms_generated:129 spectrograms_real:129
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
Year:2017 Expected:140
generated_audio:136 note_graphs:136 notes_generated:136 spectrograms_generated:136 spectrograms_real:140
--------------------------
Checking generated spectrograms...
Spectrogram MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_050_PIANO050_MID--AUDIO-split_07-06-17_Piano-e_3-01_wav--3 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_041_PIANO041_MID--AUDIO-split_07-06-17_Piano-e_1-01_wav--3 not found in generated spectrograms
Spectrogram MIDI-Unprocessed_081_PIANO081_MID--AUDIO-split_07-09-17_Piano-e_2_-02_wav--4 not found in generated spectrograms
--------------------------
Checking note graphs...
Spectrogram MIDI-Unprocessed_043_PIANO043_MID--AUDIO-split_07-06-17_Piano-e_1-03_wav--4 not found in note graphs
Spectrogram MIDI-Unprocessed_050_PIANO050_MID--AUDIO-split_07-06-17_Piano-e_3-01_wav--3 not found in note graphs
Spectrogram MIDI-Unprocessed_041_PIANO041_MID--AUDIO-split_07-06-17_Piano-e_1-01_wav--3 not found in note graphs
Spectrogram MIDI-Unprocessed_081_PIANO081_MID--AUDIO-split_07-09-17_Piano-e_2_-02_wav--4 not found in note graphs
Year:2018 Expected:93
generated_audio:93 note_graphs:93 notes_generated:93 spectrograms_generated:93 spectrograms_real:93
--------------------------
Checking generated spectrograms...
--------------------------
Checking note graphs...
--------------------------
Failed!
Here is the code:
from deeplab import DeepLabv3Encoder, ImageDecoder
import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torchvision import transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
import os
from random import randint
import numpy as np

_ROOT = "/media/hina/LinuxStorage/Datasets/"
_DATASETS = {
    'cityscapes': 'cityscapes'
}

if torch.cuda.is_available():  # Use GPU if and only if available
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    torch.set_default_dtype(torch.float32)

class Cityscapes(datasets.Cityscapes):
    # Apply the same random rotation/crop to the image and its mask
    # by reseeding the RNG before each transform call.
    t = transforms.Compose([
        transforms.RandomRotation(15),
        transforms.RandomResizedCrop((512, 512), scale=(0.9, 1.0))
    ])

    def __getitem__(self, i):
        inp, target = super(Cityscapes, self).__getitem__(i)
        seed = randint(0, 0xffffffffffffffff)
        torch.manual_seed(seed)
        inp = Cityscapes.t(inp)
        torch.manual_seed(seed)
        target = Cityscapes.t(target)
        return inp, torch.Tensor(np.asarray(target))

data = Cityscapes(os.path.join(_ROOT, _DATASETS['cityscapes']),
                  target_type='semantic',
                  transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=2, shuffle=True)

EPOCHS = 5
BATCH_SIZE = 4

encoder = DeepLabv3Encoder()
decoder = ImageDecoder(29)
encoder.cuda()
decoder.cuda()

criterion = CrossEntropyLoss()
optimizer = SGD(encoder.parameters(),
                lr=0.045,
                momentum=0.9)

for _ in range(EPOCHS):
    for _, (img, label) in enumerate(loader, 0):
        img = img.cuda()
        out = decoder(encoder(img))
This runs out of memory. The model has around 50M parameters so taking up over 12GB of memory is not expected. Not sure what's wrong. Please help.
Use torch.set_default_tensor_type('torch.cuda.FloatTensor') to create tensors on the GPU by default. DO NOT USE the .cuda() function unless there is a very good reason. If strictly necessary, justify with comments.
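A minimal sketch of that pattern, guarded so it also runs on CPU-only machines:

```python
import torch

# Switch the default tensor type only when a GPU is present; after
# this, plain torch.zeros/torch.randn calls allocate on the GPU with
# no .cuda() calls sprinkled through the code.
if torch.cuda.is_available():
    torch.set_default_tensor_type('torch.cuda.FloatTensor')

x = torch.zeros(4, 4)  # GPU tensor when CUDA is available, CPU otherwise
```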
If you have any suggestions, please let us know in a comment.
In my opinion, we should start with developing the most theoretical parts of the model because this would help solidify understanding of the various methods we are implementing. After we complete this, we can integrate everything together into the model. I have broken it down into several components.
Single components:
More complex networks:
P.S. I have never used GitHub properly before, so I hope this is what I'm supposed to do.
Songs are divided into 20s intervals. The last clip of a song can be below 20s. There are two ways to deal with it.
I can edit the code to add padding; what do you think? Is this important enough?
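If we go with padding, a small numpy sketch of zero-padding the last clip along the time axis; the frame and bin counts below are illustrative, not our actual spectrogram sizes:

```python
import numpy as np

CLIP_FRAMES = 862  # illustrative: frames in a full 20 s clip

def pad_clip(spec):
    """Zero-pad a (time, freq) spectrogram along time to CLIP_FRAMES."""
    missing = CLIP_FRAMES - spec.shape[0]
    if missing <= 0:
        return spec[:CLIP_FRAMES]
    # np.pad with the default 'constant' mode appends zero frames.
    return np.pad(spec, ((0, missing), (0, 0)))

short = np.ones((500, 229), dtype=np.float32)
padded = pad_clip(short)  # (862, 229), trailing frames are zeros
```

The label matrix for the clip would need the same padding so inputs and targets stay aligned frame-for-frame.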
After the data is processed, the file tree for the data directory is:
.
├── generated_audio
│   └── ...
├── notes_generated
│   └── ...
├── notes_graphs
│   └── ...
├── spectrograms_real
│   └── ...
└── spectrograms_generated
    └── ...
However, the only thing we can do with the preprocessed data right now is listen to the generated audio. I think we need to implement the following features to start training models.
.env file.
I'm trying to change the implementation of the transformer model so that it does not receive the ground truth as an input.
https://github.com/jonah-chen/midi-shark/blob/transformer-pe/model/transformer.py#L177
In the decoder part, the current implementation receives the ground truth (tgt) and returns the appropriate shape, but how should I produce the correct shape without knowing the shape of the output?
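The usual answer at inference time is autoregressive decoding: start from a single start embedding and feed the decoder its own output back in, one step at a time, until a fixed length or an end token. A toy sketch with nn.Transformer, with illustrative sizes rather than our model's:

```python
import torch
import torch.nn as nn

# Illustrative sizes, not our model's: d_model 32, 10 input frames.
d_model, steps = 32, 5
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1,
                       batch_first=True)
memory = model.encoder(torch.randn(1, 10, d_model))  # encoded input

# Start from a single (e.g. learned) start embedding and grow tgt by
# appending the decoder's last output, one step per iteration.
tgt = torch.zeros(1, 1, d_model)
for _ in range(steps):
    out = model.decoder(tgt, memory)        # (1, tgt_len, d_model)
    tgt = torch.cat([tgt, out[:, -1:]], dim=1)
```

So at inference the shape of tgt is not taken from the ground truth at all; it grows from length 1 up to whatever stopping rule we pick (fixed frame count, or a predicted end token).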