
midi_degradation_toolkit's People

Contributors

apmcleod, jamesowers


midi_degradation_toolkit's Issues

Convert degradations to integers

In particular, split_range_sample was originally written to sample uniformly-distributed floats. We have since decided that rounding to ints is better.

I have converted the split_range_sample method itself, but not yet all of the degradations. I will do this as I go through writing tests for them.

So far, only time_shift has been updated to reflect this.
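As a rough sketch of what the integer version could look like (a hypothetical helper, not the actual mdtk signature): sample uniformly over the union of half-open integer ranges, weighting each range by its size.

    import numpy as np

    def split_range_sample_int(ranges, rng=np.random):
        """Sample an int uniformly from a union of half-open [lo, hi) ranges.

        Each range is weighted by its size, so the draw is uniform over the
        whole union.
        """
        sizes = np.array([hi - lo for lo, hi in ranges])
        # Pick which range to sample from, proportional to its size
        idx = rng.choice(len(ranges), p=sizes / sizes.sum())
        lo, hi = ranges[idx]
        return rng.randint(lo, hi)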

Fix the pip install

Hopefully this won't be too hard. Essentially I'd like the default install to be the bare minimum, and an -all type flag to install the full optional dependencies. I don't know the standard way to do that, but we need to sort it out for ease of use.
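For reference, the standard way to do this with setuptools is an extras_require entry; a minimal sketch (the extra name and the optional packages listed here are placeholders, not the project's actual dependency list):

    # setup.py (sketch)
    from setuptools import setup, find_packages

    setup(
        name="mdtk",
        packages=find_packages(),
        install_requires=["numpy", "pandas"],        # bare minimum
        extras_require={
            "all": ["torch", "tqdm", "matplotlib"],  # full optional extras
        },
    )

Then pip install . installs the minimum, and pip install ".[all]" pulls in the optional dependencies.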

Decide how to ignore tracks in degradations

We will ignore them in our model input and output, but how should we handle them in the degradations?

For example, can join_notes join across tracks (this would be "ignoring")?

Baseline model improvements and comparisons from literature

This is just a placeholder to think about improvements to make.

Also, I recently performed a lit review on a paper with a similar format to ours ("here's some new data and new tasks"). My main criticism was that there were no comparisons to models from the literature. Are we sure there is nothing we can implement from the literature? We should anticipate this criticism and think about which models from the literature we could try to implement for comparison.

df_to_csv and csv_to_df should probably be in the same place

I'd go for having them both in that "midi" package and renaming it perhaps? (Currently csv_to_df is data_structures.read_note_csv)

Essentially, mdtk.midi (renamed) would be for file I/O and conversion, while mdtk.data_structures would be about doing things with dataframes.

Fix overlaps on input

Related to #20 (and #46, in a way)

As discussed on Skype, here are a few examples of how we want it to work (o = onset, . = nothing, - = sustain):

Example 1: Don't cut on offsets.
Input:

....o--------
....o----....

Output:

....o--------

Example 2: Cut on onsets.
Input:

....o--------
......o------

Output:

....o-o------
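A rough sketch of that policy applied to one (track, pitch) group, assuming columns named onset, dur, track, and pitch (this illustrates the rule above, not the existing fix_overlapping_notes code):

    def fix_overlaps(group):
        """Within one (track, pitch) group:
        - notes sharing an onset collapse into one note with the longest
          duration (example 1: don't cut on offsets);
        - a sustained note is cut at the next distinct onset (example 2).
        """
        group = group.sort_values("onset").copy()
        # Example 1: merge notes with identical onsets, keeping the max dur
        group = (group.groupby("onset", as_index=False)
                      .agg({"dur": "max", "track": "first", "pitch": "first"}))
        # Example 2: truncate each note at the following onset if it overlaps
        next_onset = group["onset"].shift(-1)
        overlap = group["onset"] + group["dur"] > next_onset
        group.loc[overlap, "dur"] = (next_onset - group["onset"])[overlap]
        return group

    # note_df = note_df.groupby(["track", "pitch"], group_keys=False).apply(fix_overlaps)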

The overlap check in read_note_csv is dog slow...

The line df = df.groupby(['track', 'pitch']).apply(fix_overlapping_notes) increases computation time by at least 100x. This is mad. For now, I'm just going to bypass this by adding a flag to skip the check (we decided not to enforce this), but I think we will learn something if we try to profile the code and understand why it's so slow.
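If we do profile it, something like cProfile would be a reasonable start (a sketch, assuming df and fix_overlapping_notes are in scope):

    import cProfile
    import pstats

    profiler = cProfile.Profile()
    profiler.enable()
    df = df.groupby(["track", "pitch"]).apply(fix_overlapping_notes)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)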

Degradation input format

Decide what to use.

The Composition object potentially provides lots of useful functionality for examining the excerpts, but also adds potential overhead and complication if we don't use any of that functionality.

A lower-level solution like directly using a dataframe, array, or dict, might be a better option for some use cases.

Enable join_notes to join more than 2 notes together, if possible.

Add a parameter, max_notes, to do so.

There are 2 options:

  1. Join greedily as many notes as possible up to max_notes.
  2. Choose a random number of notes up to max_notes and join that many.

This will allow join_notes to always reverse split_notes. Currently, split_notes is the only degradation that cannot be reversed with another (or the same) degradation.
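A loose sketch of the two options for choosing how many notes to join (nexts here stands for the list of candidate notes; the names are illustrative):

    import numpy as np

    # Option 1: greedy -- join every candidate, capped at max_notes
    n_to_join = min(len(nexts), max_notes)

    # Option 2: random -- draw how many to join, between 2 and the cap
    n_to_join = np.random.randint(2, min(len(nexts), max_notes) + 1)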

New degradation parameters

  • Similar to the distribution parameter for pitch shifting, we might want a parameter specifying how often to lengthen or shorten notes in the onset_shift and offset_shift methods.
  • We may want the ability to split a note into notes of non-equal duration.

Speed up overlaps check

Not a priority before release 1, but (as noted in #60), data_structures.fix_overlaps is quite a slow function at the moment.

Investigate use of `inplace` operations for efficiency in degradations

Whilst it's not possible to operate inplace when adding new data to dataframes (a copy must occur), there are various degradations that could work inplace - this could be much more efficient if the user doesn't want to retain the original dataframe.

Investigate speedups obtained and whether this is worth it.
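A sketch of what the opt-in could look like (the function name and the in_place flag are hypothetical):

    def shift_all_pitches(note_df, shift, in_place=False):
        """Shift every pitch by `shift`. With in_place=True the caller's
        dataframe is modified directly, avoiding a full copy."""
        if not in_place:
            note_df = note_df.copy()
        note_df["pitch"] = note_df["pitch"] + shift
        return note_df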

Degradation input validation is not explicitly performed.

For example, min_duration > max_duration.

Currently, examples like the above will warn with "No valid notes found." (or similar) and return None, because no note can be shifted to give a duration in that range. That's probably fine, but we could also explicitly check the parameter settings and give a more explicit warning about what is happening.
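A sketch of what that explicit check might look like at the top of a degradation (the function name and warning text are illustrative):

    import warnings

    def some_degradation(excerpt, min_duration=0, max_duration=float("inf")):
        if min_duration > max_duration:
            warnings.warn("min_duration > max_duration: no note can satisfy "
                          "the requested duration range.", UserWarning)
            return None
        ...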

Quantized offset (note_off) times may be incorrect

We currently quantize onset time and duration. It would be good to have an option (at least) to instead quantize onset and offset (note_on and note_off) directly. Rounding issues can currently cause the duration to be off by 1.
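A sketch of the alternative, quantising onset and offset directly and deriving dur afterwards (assuming columns named onset and dur):

    def quantize_df(note_df, quantum):
        """Round note_on and note_off times to the nearest multiple of
        `quantum`, then recompute dur, so the offset never drifts by a
        rounding error."""
        df = note_df.copy()
        onset = (df["onset"] / quantum).round() * quantum
        offset = ((df["onset"] + df["dur"]) / quantum).round() * quantum
        df["onset"] = onset
        df["dur"] = offset - onset
        return df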

Improvements for training [low priority]

  • Allow continuing training from a checkpoint
  • Properly seed models such that they can be reproduced (this is actually quite difficult but I have code for it)
  • Get formatters out of Trainers
  • ...think about how eval should be done (in iteration()?)

Support evaluation for people not using our trainers

Possibilities include:

An eval function for each task which takes as input a data point (or a set of data points), and a label (or a set of labels), and returns the metric.

A --file flag to read labels from a file in the given eval script.

Ideally, these would be independent of our formats where possible. For example, the helpfulness script takes in 3 dataframes and outputs a score. Others should be similarly easy to use, independent of format.
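For the binary tasks, a format-independent eval function could be as small as this sketch (plain arrays in, a number out; the metric shown is just an example):

    import numpy as np

    def f_measure(labels, predictions):
        """F-measure over two equal-length binary arrays, with no dependence
        on our Trainer or data formats."""
        labels = np.asarray(labels).astype(bool)
        predictions = np.asarray(predictions).astype(bool)
        tp = np.sum(labels & predictions)
        fp = np.sum(~labels & predictions)
        fn = np.sum(labels & ~predictions)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)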

Implement Evaluation

Enable the testing of our models, as well as evaluation metrics (F-measure, etc.) for the various tasks:

  1. F-measure
  2. Accuracy
  3. F-measure
  4. Helpfulness (based on note- and frame-based F-measures from mir_eval)

"Rule-based" baselines

Currently we don't know how good our baselines are against super dumb baselines:

  • Task 1: [1/9, 8/9] for everything
  • Task 2: [1/9, ..., 1/9] for everything
  • Task 3:
    • get average 'changed region' length and just pick the section in the middle
    • predict avg nr of 1s everywhere
  • Task 4: do nothing

We should probably evaluate these for ourselves at least before releasing our baselines (which will look a bit silly if they don't win!). Any other dumb ones to propose?

Iterating by index over pandas dataframes is not efficient - edit if possible

In degradations, there's a common pattern like:

for note_index in range(excerpt.note_df.shape[0]):
   ...

i.e. using a for loop over a list of integers to edit a dataframe.

This is bad for two reasons:

  1. it's slow
  2. there's no guarantee the index will be a range - it probably will be, but you never know. E.g. if the index is [1, 3, 2], .loc[1], .loc[2], then .loc[3] will return ilocs 0, 2, then 1.

There's likely a better way to do this. If possible, use a vectorised solution. For example, take time_shift(), which has this code:

    for note_index in range(excerpt.note_df.shape[0]):
        onset = excerpt.note_df.loc[note_index, 'onset']
        offset = onset + excerpt.note_df.loc[note_index, 'dur']
        
        # Early-shift bounds (decrease onset)
        earliest_earlier_onset = max(onset - max_shift + 1, 0)
        latest_earlier_onset = max(onset - min_shift + 1,
                                   earliest_earlier_onset)
        latest_earlier_onset = min(latest_earlier_onset, onset)
        ...

you could instead do this process in a vectorised fashion - you're just making a boolean array ultimately. Remove the loop entirely, and set onset = excerpt.note_df.onset, offset = onset + excerpt.note_df.dur, and use pandas series .apply() methods to apply max() and min() to every element.
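A sketch of those bounds computed for every note at once (using Series.clip here, which is even simpler than .apply for elementwise max/min):

    onset = excerpt.note_df["onset"]
    offset = onset + excerpt.note_df["dur"]

    # Early-shift bounds (decrease onset), for all notes at once
    earliest_earlier_onset = (onset - max_shift + 1).clip(lower=0)
    latest_earlier_onset = ((onset - min_shift + 1)
                            .clip(lower=earliest_earlier_onset)
                            .clip(upper=onset))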

Degradation error - just noting in case you know...

Running ./make_dataset.py, got this error:

Making target data:   0%|                                                                                                                        | 5/22522 [00:00<1:17:36,  4.84it/s]
Traceback (most recent call last):
  File "./make_dataset.py", line 457, in <module>
    degraded = deg_fun(excerpt, **deg_fun_kwargs)
  File "/Users/kungfujam/git/midi_degradation_toolkit/mdtk/degradations.py", line 37, in seeded_func
    return func(*args, **kwargs)
  File "/Users/kungfujam/git/midi_degradation_toolkit/mdtk/degradations.py", line 980, in join_notes
    degraded.loc[nexts[-1]]['dur'] -
IndexError: list index out of range

By the sounds of things, nexts is empty prematurely or something.

Generate docs

Generate/check docs for mdtk use (including the readme).

Before release: DQA of the released data!

We should do some data quality analysis of the data we are going to release. I'm thinking of a notebook (which also doubles as an intro to what data are available for use) that reviews the data by:

  • Playing a selection of degraded and clean excerpts
    • Any issues with data? Choppy? Did flattening tracks work well?
    • Are degradations obvious? Are there better parameters for degradations to use?
  • Providing stats about the number of notes in the excerpts, note lengths, and the total amount of time the notes span, etc.
    • This will inform the correct seq_len to use for models (it may be worth excluding excessively long excerpts)
  • Giving some background as to where these data are from and, if possible, some summary stats about genre, tempo, or whatever we can glean
  • Summarising performance broken down by dataset (info available in metadata)

Essentially I want to check that the data are not rubbish, and we can hear where the degradations are!

Have make_dataset not rely on torch

I guess it's OK if the package requires torch, but users who have issues installing pytorch for whatever reason would ideally still be able to run make_dataset.py. I don't think torch is actually required there.
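One low-effort option would be to defer the import so torch is only needed when it's actually used; a sketch (the function name is hypothetical):

    def build_pytorch_dataset(csv_dir):
        # Deferred import: make_dataset.py can run end-to-end without torch
        # unless this optional step is requested.
        import torch
        ...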

Degraded excerpts may not start from time 0.

Shifting the degraded excerpts to start at time 0 would introduce all sorts of complications, including:

  • Much more difficult to write tests (and would need to rewrite them)
  • Piano-rolls no longer align to create the binary frame-based task ground truth.

Still, it would be easy for a model to discover that any excerpt not beginning at 0 is degraded.

One possible solution would be to add some random amount of space (say, between 0 and 100 ms) to the start of each excerpt upon its creation, i.e. here: https://github.com/JamesOwers/midi_degradation_toolkit/blob/master/make_dataset.py#L458
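A sketch of that padding step at excerpt-creation time (the 0-100 ms range is the one suggested above; the names are illustrative):

    import numpy as np

    pad = np.random.randint(0, 101)   # random space in ms, 0-100 inclusive
    excerpt.note_df["onset"] += pad   # applied to every excerpt, clean or
                                      # degraded, so a non-zero start time
                                      # carries no signal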

Fix the absolute diatribe of warnings and logs spewed out for make_dataset.py

Best thing to do would be to switch from warnings to the logging library, but this could take a while. At a minimum, make all the tqdm output shorter (long paths in desc are killing it and sometimes creating multi-line output), and probably suppress warnings by default, only switching them on with a flag in the script.
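As a stopgap before a proper switch to logging, the suppress-by-default behaviour could be a sketch like this (the flag name is hypothetical):

    import argparse
    import warnings

    parser = argparse.ArgumentParser()
    parser.add_argument("--warn", action="store_true",
                        help="show warnings (suppressed by default)")
    args = parser.parse_args()

    if not args.warn:
        warnings.simplefilter("ignore")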

This supersedes #3

Default pitch ranges are incorrect.

I use [0-88) at the moment. It seems we use something more like [0-127) (or something). Anyways, we should set sensible defaults for some of these as global vars, since we may use similar params in different places.

Warnings seem to be showing additional lines e.g. ...

  /Users/kungfujam/git/midi_degradation_toolkit/mdtk/downloaders.py:76: UserWarning: WARNING: /Users/kungfujam/.mdtk_cache already exists, writing files within here only if they do not already exist.
    category=UserWarning)

This category=UserWarning) line isn't part of the warning message.
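That trailing line is the source line of the warnings.warn() call, which the default formatter appends. Overriding warnings.formatwarning would drop it; a sketch:

    import warnings

    def _short_formatwarning(message, category, filename, lineno, line=None):
        # Omit the source line that the default formatter appends.
        return f"{filename}:{lineno}: {category.__name__}: {message}\n"

    warnings.formatwarning = _short_formatwarning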

Enforce DataFrame formatting

Or, add tests in the degradations with other formats.

For example, there are currently no tests with dataframes without consecutive indices starting from 0, or unsorted dataframes, or ones with columns out of order.

Some of these we might want to allow, some we might be okay with errors.

We could also add a function which, given some data frame, enforces formatting, like:

  1. Drops all columns not pitch, onset, offset, track
  2. Rounds those columns to ints
  3. Puts those columns in the correct order
  4. Sorts the dataframe
  5. Resets the index in that order and drops the old index column

Currently, the degradations are closed under 1, 2, 3, and 5 (if given correctly, will return correctly), but not necessarily 4 (the resulting df may not be sorted, even if the input one was).

Giving one of them a df with the wrong columns would likely raise an error.
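A sketch of such a function, following the five steps above (the degradation code elsewhere uses dur rather than offset, so the column names here are illustrative):

    def enforce_format(df, cols=("onset", "track", "pitch", "dur")):
        """Coerce a note dataframe into the canonical layout (steps 1-5)."""
        df = df[list(cols)]               # 1. drop any other columns
        df = df.round().astype(int)       # 2. round those columns to ints
        # 3. columns are already in the canonical order after step 1
        df = df.sort_values(list(cols))   # 4. sort the dataframe
        return df.reset_index(drop=True)  # 5. reset (and drop) the index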

Code to measure types of errors present in a transcription

For people hoping to use this for AMT, it would be useful to have some code which, given a transcription and a ground truth, will output the proportion of errors which could be assigned to each degradation.

This would allow users to create a dataset for their specific use case.

We could also have it output recommended parameter values.

Or, ideally, output a json file directly which would be readable by the make_dataset script.

Some difficulties:

  • It won't be possible to get the distribution exactly correct, as some errors might be ambiguous (is it a pitch shift and a time shift? Or an add note and a remove note?)
  • We may want to allow the user to specify a window size for this. Then, we can also measure how many windows have no degradation. (#31)
  • We may want to allow multiple degradations per excerpt in some cases, eventually. (#32)

Ensure randomness and reproducibility

Some types of randomness aren't guaranteed to be OS independent.

Also, we may want to use a numpy RandomState rather than seeds, but probably not.
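For reference, the RandomState alternative would look roughly like this sketch (passing the state around instead of seeding the global generator; min_shift and max_shift are placeholders):

    import numpy as np

    rng = np.random.RandomState(42)   # one explicit source of randomness

    # Degradations would take rng as an argument rather than using np.random
    shift = rng.randint(min_shift, max_shift)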
