
when-in-rome's Introduction


When in Rome

'When in Rome' brings together the world's functional harmonic analyses in encoded formats into a single, consistent repository. This enables musicians and developers to interact with that great body of work at scale, with minimal overheads.

In total, there are now approximately 2,000 analyses of 1,500 distinct works.

Additionally, 'When in Rome' provides code for working with these corpora, building on the music21 library for music analysis.

Is it for me?

This is best thought of as primarily a corpus of analyses, which secondarily provides code for working with them and includes the score where possible. I.e., the focus is on the analyses. There is a great deal we can do with those analyses alone. Clearly, there are also certain questions that require analysis-source alignment. We do our best to cater for that by including the score wherever possible, aligned as reliably as possible (as anyone in the field knows, this is a significant challenge).

Maybe yes ...

'When in Rome' data is also used in external research projects and apps including the:

Are you using 'When in Rome' in a public-facing project? Let us know!

Maybe no ...

We're proud of how useful this is. All the same, it might not serve your needs. Might we suggest that if you're looking for:

  • scores only in a permissive licence, then try the OpenScore Lieder Corpus (1,300 songs, CC0 licence).
  • a small corpus of perfectly aligned scores + analyses (i.e., your priority is alignment, not overall size or diversity of content) then a single (not meta-) corpus like one of the DCML corpora listed below.

Corpus Directory Structure

Overall

<genre>/<composer>/<set>/<movement>/<files>
  • <genre>: A top level classification of the works by approximate genre or repertoire. As most corpora are prepared in relation to this categorisation, this top level division also reflects something of the corpora's origins. (For the avoidance of doubt, every analysis includes an attribution.)

  • <composer>: composer's name in the form Last,_First.

  • <set>: extended work (e.g. a song cycle or piano sonata) where applicable. Stand-alone scores are placed in a set called _ (i.e. a single underscore) for the sake of consistency.

  • <movement>: name and/or number of the movement. In the case of a piano sonata, folder names are generally number-only: e.g. 1. Most songs include both the name of the song and its position in the set (e.g. 1_Nach_Süden)

  • <files>: See the following sub-sections.

The Key Modulations and Tonicizations corpus is a slight exception: we preserve the organisation of that corpus by author, title, example number, e.g., Corpus/Textbooks/Aldwell,_Edward/Harmony_and_Voice_Leading/2a/. So the <genre> is Textbooks, the <composer> is the author, the <set> is the title, and the <movement> is the example number. We find this more logical than re-organisation by composer.
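As an illustration of the layout described above, here is a minimal sketch that splits a corpus path into its four levels. The names `CorpusEntry` and `parse_corpus_path` are hypothetical (they are not part of this repo's code), and the sketch assumes that an optional leading `Corpus` component should be ignored.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class CorpusEntry(NamedTuple):
    """The four directory levels described above (hypothetical helper type)."""
    genre: str
    composer: str
    set_name: str
    movement: str


def parse_corpus_path(path: str) -> CorpusEntry:
    """Split '<genre>/<composer>/<set>/<movement>' into named parts."""
    parts = list(PurePosixPath(path).parts)
    # Assumption: a leading 'Corpus' component is optional and not a level.
    if parts and parts[0] == "Corpus":
        parts = parts[1:]
    if len(parts) < 4:
        raise ValueError(f"Expected at least 4 path components, got {parts}")
    genre, composer, set_name, movement = parts[:4]
    return CorpusEntry(genre, composer, set_name, movement)


entry = parse_corpus_path(
    "Corpus/Textbooks/Aldwell,_Edward/Harmony_and_Voice_Leading/2a/"
)
print(entry)
```

For the textbook example, the fields map to author, title, and example number exactly as described in the paragraph above.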

All folders include:

  • score.mxl or a remote.json file including links to external score files

    • What: score.mxl is a copy of the score in the compressed musicXML format. This is provided for all new scores, as well as for all scores originating elsewhere where the original is in a format which music21 cannot parse.
    • How to use: Open in any software for music notation (e.g., MuseScore).
    • Where there is no local score.mxl, there is a remote.json instead. Please note:
      • This file points to an externally hosted score in a format which music21 can parse.
      • This is designed to prevent duplication and automatically include source updates.
      • Note that MuseScore files are included in a local conversion (.mxl) rather than remote.
        • This is because music21 cannot parse them and conversion requires the mscore package (see Code.updates_and_checks.convert_musescore_score_corpus).
      • For downloading a local copy of remote files, see Code.updates_and_checks.remote_scores and the argument convert_and_write_local. Read those docs for details and warnings.
      • Please check and observe the licence of all scores, especially those hosted externally.
  • analysis.txt

    • What: A human analysis in plain text.
    • How to use: Open in any text editor. You can also use these analyses as a kind of template for your own, by creating a copy and editing only the moments you disagree with.
  • analysis_automatic.rntxt

    • What: An automatic analysis made by AugmentedNet - a machine learning architecture which, in turn, is built on this meta-corpus' data.
    • How to use: In exactly the same way as a human analysis, e.g., as a template (same format, same parsing routines).
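The "local score.mxl, else remote.json" rule above can be sketched as follows. This is a hedged illustration only: `find_score_source` is a hypothetical name, and the sketch makes no assumption about the keys inside remote.json; it simply returns the parsed JSON.

```python
import json
from pathlib import Path


def find_score_source(folder):
    """Return ('local', path) if score.mxl exists, else ('remote', parsed remote.json)."""
    folder = Path(folder)
    local = folder / "score.mxl"
    if local.exists():
        return "local", local
    remote = folder / "remote.json"
    if remote.exists():
        # The schema of remote.json is not assumed here; callers inspect the dict.
        with remote.open(encoding="utf-8") as handle:
            return "remote", json.load(handle)
    raise FileNotFoundError(f"No score.mxl or remote.json in {folder}")
```

A caller would pass a movement folder and branch on the first element of the returned pair.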

Some folders include:

  • remote.json files

    • What: this provides additional information about remote content including paths to external scores as discussed above.
    • Additionally, we take the opportunity to provide metadata including composer name and one or more sets of catalogue information (Opus and/or equivalent).
  • analysis_<analyst>.txt

    • What: An alternative analysis. This takes one of two forms:
      • A copy of an original analysis exactly as converted for cases where significant changes have been made to that analysis. See, for example, this edit of this "original"
      • A second analysis of the same work. The 'TAVERN' dataset includes pairs of analyses of the same work. In order to ensure there is exactly one analysis.txt throughout, we name the pair analysis.txt (note not analysis_A.txt) and analysis_B.txt.
      • We likewise organise cases of two separate corpora of analyses of the same music this way.
        • The set which is complete takes precedence for the analysis.txt name.
    • How to use: All such text files can be opened in the normal way. "Original conversions" serve as a point of reference for full disclosure on the conversion process.

Optional extra files (not included but easy to generate):

This repo. includes code and clear instructions for creating any or all of the following additional files for the whole meta-corpus, or for a specific sub-corpus.

The example folder contains all of these files for one example score: Clara Schumann's Lieder, Op.12, No.4, 'Liebst du um Schönheit'. Most of the variants derive from the options for pitch class profile generations, creating files in the form: profiles_<and_features_>by_<segmentation_type>.<format>

  • <and_features_> (optional) includes harmonic feature information. See notes at Code/Pitch_profiles/chord_features.py
  • <segmentation_type> options group by moments of change to the chord, key, or measure.
  • <format> options are .arff, .csv, .json, and .tsv.
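To make the naming pattern concrete, this small sketch enumerates the file names it implies. The exact spellings of the segmentation types (`chord`, `key`, `measure`) are an assumption based on the list above, and `profile_file_names` is a hypothetical helper.

```python
from itertools import product

SEGMENTATIONS = ("chord", "key", "measure")  # assumed spellings
FORMATS = ("arff", "csv", "json", "tsv")


def profile_file_names():
    """List every name of the form profiles_<and_features_>by_<segmentation>.<format>."""
    names = []
    for with_features, segmentation, fmt in product((False, True), SEGMENTATIONS, FORMATS):
        features = "and_features_" if with_features else ""
        names.append(f"profiles_{features}by_{segmentation}.{fmt}")
    return names


names = profile_file_names()
print(len(names))  # 2 feature options x 3 segmentations x 4 formats = 24
print(names[0])
```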

Apart from these, the example folder also contains the files which are included in all folders by default (see above) as well as others that can likewise be generated across the meta-corpus:

  • analysis_on_score.mxl: the analysis rendered in musical notation alongside the score (as an additional 'part').
  • feedback_on_analysis.txt: automatically generated feedback on any analysis complete with an overall rating. Useful for proofreading. See Code/romanUmpire.py for more details on what it can and can't do.
  • <Keys_or_chords>_and_distributions.tsv: pitch class distributions for each range delimited by a single key or chord. See notes at Code/Pitch_profiles/get_distributions.py
  • slices.tsv and/or slices_with_analysis.tsv: a tabular representation of the score in 'slices' - vertical cross-sections of the score, with one entry for each change of pitch. This is useful for various tasks, both human (at-a-glance checks) and automatic (much quicker to load and process than parsing musicXML). The columns from left to right set out the:
    • Offset from the start (a time stamp measured in terms of quarter notes),
    • Measure number,
    • Beat,
    • Beat 'Strength' (from relative metrical position),
    • Length (also measured in quarter notes),
    • Pitches,
    • and where the analysis is included, also Key, Chord
  • template.txt: a proto-analysis text file with only the metadata, time signatures, measures, and measure equality ranges as a template - i.e. all the information you need from the score with space to enter your own analysis from scratch.

This is clearly too much to include for every entry. Use the example folder to see the options and 'try before you' commit to a corpus-wide generation.
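As a sketch of how the slices files might be read without parsing musicXML, the following uses only the standard library. It assumes the column order listed above, tab separation, and no header row; the column labels, the sample rows, and the way pitches are written in the sample are all invented for illustration.

```python
import csv
import io

# Column order as listed above; these short labels are hypothetical.
COLUMNS = ["offset", "measure", "beat", "beat_strength", "length", "pitches", "key", "chord"]


def read_slices(handle):
    """Read tab-separated slice rows into dicts keyed by the column labels."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        # zip stops at the shorter sequence, so files without Key/Chord also work.
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Invented sample in the shape of slices.tsv (no analysis columns).
sample = io.StringIO(
    "0.0\t1\t1.0\t1.0\t2.0\t['C4', 'E4']\n"
    "2.0\t1\t3.0\t0.5\t1.0\t['D4']\n"
)
rows = read_slices(sample)
print(rows[0]["measure"], rows[0]["pitches"])
```

All values are kept as strings here; converting offsets and lengths to numbers is left to the caller, since the exact number format in the files is not assumed.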

Corpus Overview

This corpus involves the combination of new analyses with conversions of those originating elsewhere.

Corpora originating elsewhere

Converted from other formats:

Analyses originally in the 'RomanText' format (no conversion needed), analysed by Dmitri Tymoczko and colleagues, and forming part of the supplementary material to Tymoczko's forthcoming "TAOM", include:

  • Monteverdi madrigals: Complete scores and analyses for books 3–5 of the Monteverdi madrigals (48 works), also to be seen in the music21 corpus (but updated since that version).
  • Bach Chorales: 371 chorales, of which a subset of 20 was first released on music21.
  • Several further collections, including a second set of analyses for most of the Chopin Mazurkas.

Mixed sources

Several corpora have full or partial coverage from more than one source. The most complex case is the Beethoven Piano Sonata collection, for which there are 3 external corpora, all of them incomplete:

  1. 64 movements from DCML's 'romantic_piano_corpus'.
  2. 36 movements from Dmitri Tymoczko's TAOM collection
  3. 32 movements (complete first movements) as converted from the
    'BPS-FH' dataset, ISMIR 2018.

There is not yet a single source for this collection. Are you tempted to attempt that? Do get in touch!

New corpora by MG and colleagues

  • Bach Preludes: Complete preludes from the first book of Bach's Well Tempered Clavier (24 analyses)
  • Ground bass works by Bach and Purcell.
  • Nineteenth-century songs: A sample of songs from the OpenScore / 'Scores of Scores' lieder corpus (mirroring the public-facing score collection hosted at https://musescore.com/openscore-lieder-corpus/sets), including analyses for the complete Winterreise and Schwanengesang cycles (Schubert), Dichterliebe (Schumann), and many of the songs by women composers that constitute a key part of and motivation for that collection.

Code and Lists

For developers, please see the individual code files for details of what they do and how.

Run code scripts from the repo's base directory (When-in-Rome) using the format:

>>> python3 -m Code.<name_of_file>

For example, this is the syntax for processing one score (feedback, slices, etc.):

>>> python3 -m Code.updates_and_checks --process_one_score OpenScore-LiederCorpus/Bonis,_Mel/_/Allons_prier!
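If you prefer to launch such a command from Python, one option is to build an argument list, which avoids having to think about shell quoting for paths with unusual characters. This is only a sketch: `build_command` is a hypothetical helper, and the command is printed rather than executed.

```python
import shlex


def build_command(module, *args):
    """Return an argv list for `python3 -m Code.<module> ...`."""
    return ["python3", "-m", f"Code.{module}", *args]


cmd = build_command(
    "updates_and_checks",
    "--process_one_score",
    "OpenScore-LiederCorpus/Bonis,_Mel/_/Allons_prier!",
)
# A shell-safe rendering of the same command, for copying into a terminal.
print(shlex.join(cmd))
# To run it from the repo's base directory (not executed here):
# subprocess.run(cmd, cwd="path/to/When-in-Rome", check=True)
```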

Briefly, this repo. includes:

  • The Roman Umpire for providing automatic 'feedback' files. It takes in a harmonic analysis and the corresponding score to assess how well they match. Working in Harmony is an initial attempt at an interactive app for making use of this code online (no downloads, coding, or dependencies).
  • Anthology for retrieving instances of specific chords and progressions from the analyses.
  • Pitch_profile for producing the profile and feature information discussed above.

Here are a couple of examples of what all that can lead to:

[Figure: a histogram of augmented chord usage in the lieder corpus]

[Figure: a histogram of fifth progression types across corpora]

Licence, Citation, Contribution

Licence

New content in this repository, including the new analyses, code, and the conversion (specifically) of existing analyses is available under the CC BY-SA licence (a free culture licence) except by arrangement. Please get in touch with requests for special permission.

For analyses that originated elsewhere and have been converted into the format used here, please refer to the original source for licence. Links are provided to those original sources throughout the repository including the itemised list above and within every analysis.txt file.

These external licences vary. As far as we can tell, all the content here is either original to this repo, or properly credited and fair to use in this way. If you think you see an issue, please let us know. Again, if you are simply looking for scores in a maximally permissive licence, then head to the OpenScore collections, which are notable for using CC0.

For research and other public-facing projects making use of this work, please cite or otherwise acknowledge one or more of the papers listed below as appropriate to your project.

Citation

Here's the best way to cite the code and/or corpus:

@article{gotham_when_2023,
	title = {When in {Rome}: a meta-corpus of functional harmony},
	shorttitle = {When in {Rome}},
	journal = {Transactions of the International Society for Music Information Retrieval},
	author = {Gotham, Mark and Micchi, Gianluca and Nápoles-López, Néstor and Sailor, Malcolm},
	year = {2023},
}

Alternatively, depending on the specific context, it may be appropriate to cite one of the papers using this data and functionality:

Syntax and Contributing

As the papers attest, harmonic analysis is fundamentally, necessarily, and intentionally a reductive act that includes a good degree of subjective reading. As such, these analyses are not in any sense 'definitive', to the exclusion of other possibilities. Quite the opposite: part of the point of having a representation format like this is to enable the recording of variant readings. Please feel free to re-analyse these works by using the existing analysis as a template and changing the parts you disagree with.

  • For minor changes, consider integrating your edits into the existing file using the variant (var) option that rntxt provides. E.g. m1 I b2 IV followed by a new line with m1var1 I b2 ii6
  • For more thoroughly divergent analyses, a new file may be warranted. In that case, perhaps credit the original analyst too in the format - Analyst: [Your name] after [their name]
  • For any cases of clear errors, please submit a pull request with the correction.
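To show how a main reading and a variant line relate, here is a deliberately small tokenizer for single measure lines. It is a sketch, not the music21 RomanText parser: `parse_measure_line` is a hypothetical name, key changes (tokens ending in a colon) are simply skipped, and features such as measure ranges, repeats, and notes are not handled.

```python
import re

# 'm1', 'm12a', or 'm1var1': measure number, optional letter suffix, optional variant.
MEASURE_RE = re.compile(r"m(?P<number>\d+)(?P<suffix>[a-z]?)(?:var(?P<variant>\d+))?")
BEAT_RE = re.compile(r"b\d+(?:\.\d+)?")


def parse_measure_line(line):
    """Return the measure number, variant number (or None), and (beat, chord) pairs."""
    tokens = line.split()
    match = MEASURE_RE.fullmatch(tokens[0]) if tokens else None
    if match is None:
        raise ValueError(f"Not a measure line: {line!r}")
    beat = 1.0  # chords before any beat marker fall on beat 1
    chords = []
    for token in tokens[1:]:
        if BEAT_RE.fullmatch(token):
            beat = float(token[1:])
        elif token.endswith(":"):
            continue  # key change such as 'c:'; ignored in this sketch
        else:
            chords.append((beat, token))
    variant = match.group("variant")
    return {
        "measure": int(match.group("number")),
        "variant": int(variant) if variant else None,
        "chords": chords,
    }


main = parse_measure_line("m1 I b2 IV")
alternative = parse_measure_line("m1var1 I b2 ii6")
print(main["chords"], alternative["chords"])
```

Comparing the two results makes the disagreement explicit: both readings share beat 1 and differ only on beat 2.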

For more details of the RomanText format used to encode analyses here, see:

when-in-rome's People

Contributors

giamic, magiraud, malcolmsailor, markgotham, napulen, shoogle, vpavlenko


when-in-rome's Issues

Introduce github actions and improve tests

When I started working with the repo I had some failing tests. Those were probably due to path issues but I could not verify that the tests were actually supposed to pass because github is not running them.

I propose to do the following:

DCML>RNTXT (enhancement). Software version in metadata

One to consider:

For DCML>RNTXT and any other new conversions, possibly add software version in the metadata preamble.

music21 version (exists)

  • music21._version.__version_info__

Conversion version (TODO?):

  • DCML v2 DCML-romantext parser v. 2023.07 or similar.

ms3 version notes from @johentsch

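import ms3
from git import Repo  # GitPython

# Placeholder: path to a local clone of the data repository.
corpus_path = "."

data_repository = Repo(corpus_path)
# Short hash of the commit currently checked out.
print(f"data_repository @ {data_repository.commit().hexsha[:7]}")
print(f"ms3 version {ms3.__version__}")  # should work with most libraries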

Reger ex 20

Corpus/Textbooks/Reger,_Max/Modulation/20/score.mxl seems to be an empty musescore file.

[Screenshot]

Ditto for Corpus/Textbooks/Reger,_Max/Modulation/41/score.mxl

@napulen, is there a repository specifically for the Textbook examples I should post this issue to?

URL column in tsv files

It seems that the OpenScore list includes a URL column, but none of the other tsv files has this column.

Would it be worth having an empty URL column in all the other tsv files, just for consistency? No strong preference here. It could be that this one simply has that unique column.

Esoteric issue: Indicating within-measure positions using metrical beats when the meter changes

m. 221 is divided into the last half bar in 2/2 meter of variation XI and the first quarter-length upbeat measure of variation XII in 3/4 meter. Indicating the position of V6 as b. 2 is therefore correct if we consider m. 221 to be in 2/2 and wrong if we consider it to be in 3/4. In the encoding here, the measure is still in 2/2 so maybe everything is in order...

"Slices" tsv, when should it exist?

It seems in some files with scores (e.g., Boulton,_Harold from the 19th century songs) there is no slices.tsv, whereas in others there is. I thought the rationale for slices.tsv would be that it can be an alternative to working with the score.

I also remember discussions about removing slices.tsv entirely. Maybe this is a good time to decide what exactly should happen regarding slices and commit to that decision?

"Exception: Too many notes" on parsing some analyses

Some analyses fail to parse giving the following error:

music21.romanText.translate.RomanTextTranslateException: too many notes in this measure: m24 V V b1.5 VI b1.5 VI b2 iio6 b2 iio6 b2.5 V b2.5 V b3 i b3 i b4 Eb: V6/ii b4.5 c: V/iv

This happens for example in Mozart sonatas: K570/2, K283/2, K331/1.

Fix testRomanUmpire

The test on the roman umpire fails. There are two issues:

  • analysisOnScore.mxl can't be parsed
  • the output of the umpire is not the same for all alternatives (separate score and analysis, slices and analysis, and analysis on score — which anyways can't be parsed yet)

Can't parse Beethoven quartets

The score of the Beethoven quartets can't be parsed. I receive the following error:

rs/micchig/PycharmProjects/When-in-Rome/Corpus/Quartets/Beethoven,_Ludwig_van/Op059_No2/4/
Error with: /Users/micchig/PycharmProjects/When-in-Rome/Corpus/Quartets/Beethoven,_Ludwig_van/Op059_No2/4/. not a valid pitch specification: e.VI

Different scores give similar errors, with the pattern being that the pitch specification key.degree is not valid

No repetitions; agree to encode "seconda volta" only?

Looking into BPS: There is the following convention in several sonata-allegro movements

mXa V
mX+1a V

mXb I
mX+1b I

This is a problem when the repetition does not take you to the beginning (e.g., there is an introduction before the exposition, as in Les Adieux, Pathetique, etc.). We have discussed all that in the past.

Furthermore, in TAVERN, all the analyses encode the seconda volta and omit the prima volta measures from the annotation and the score.

So here is the dilemma:

  • Repetitions could be encoded, but then
    • There needs to be consistency, and I know that at least all of TAVERN should be revised regarding repetitions. And there are lots of repetitions.
    • There needs to be some fix in place to allow repetition starts, as pointed out previously in our discussions and music21 issue
  • Repetitions could be omitted in general, but then
    • They should be removed from BPS (and maybe other places)

If the second, the encoding of the beginning would turn into something like

mX I
mX+1 I

and that could be a convention to adopt in general.

Personally, I prefer to not have repetitions or have them as Notes (i.e., comments).

Of course, if having them is desired, then that should be the convention to follow throughout.
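For the second option (seconda volta only), a conversion could look roughly like the sketch below. It assumes the convention shown above, where first-ending lines carry an `a` suffix and second-ending lines a `b` suffix directly after the measure number. `keep_second_endings` is a hypothetical name, and measure ranges or copy statements are not handled.

```python
import re

# Matches a line start such as 'm12a' or 'm12b' (measure number plus ending letter).
ENDING_RE = re.compile(r"^m(\d+)([ab])\b")


def keep_second_endings(lines):
    """Drop 'a' (prima volta) lines and strip the 'b' suffix from seconda volta lines."""
    kept = []
    for line in lines:
        match = ENDING_RE.match(line)
        if match is None:
            kept.append(line)  # ordinary measure or metadata line
        elif match.group(2) == "b":
            kept.append(f"m{match.group(1)}" + line[match.end():])
        # lines with an 'a' suffix are omitted
    return kept


print(keep_second_endings(["m11 IV", "m12a V", "m13a V", "m12b I", "m13b I"]))
```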

Strange chord in Purcell Z807

I finally traced down the chord that behaves strangely in my code. It is the V63b3 in m212, nearly at the end of Purcell's Z807:

m207 b1 v6
m208 b1 VI b3 iv6
m209 b1 v b2.5 v42 b3 III7
m210 b1 IV b2 v
m211 b1 #viø65 b2 c: III+6 b3 i64
m212 b1 g: It6 b2 V63b3
m213 b1 iiø43
m214 b1 V b2.5 V642 b3 i6
m215 b1 iv9 b2 v7
m216 b1 iiø42 b2 i
m217 b1 i42 b2 bVII[add2]
m218 b1 iiø43 b2 viio42
m219 b1 V b2.5 viio64 b3 i6
m220 b1 iv7 b2 V
m221 b1 i

On IMSLP, I could only find a 2-keyboard arrangement, looking like this for mm.211-213:
image

EDIT: Sorry about the lack of clefs: here is the full line:

image

Not easy to make a judgement from that score about the Italian or the dominant in m.212!

Maybe v6 instead of V63b3?

`chord_usage`

  • Use existing test for checking on creation of empty slices_with_analysis.tsv files
  • New test against this.
  • Remove existing test

Typo in analysis

In the analysis of the following piece:

Composer: Franz Schubert
Title: Winterreise, D.911 - 18: Der stürmische Morgen

In the following line,

m13 V42/IV b1.5 IV6 b2 iioø43 b2.5 iii6 b3 I

iioø43 should presumably be iiø43.

[Screenshot]

(Sorry I am not submitting a pull request but I think it will be easier for you to delete the excess character than for me to create a fork for this purpose.)

Cadenza in or out? K398

Facing a dilemma with K398. The "micchi" score doesn't feature the long cadenza, the TAVERN one does. The annotation seems to include annotations for the entire cadenza. Technically, it can be included.

The misalignment on the TAVERN score is much greater than the "micchi" one, so more work for including the cadenza, and more risk because of crazy time signature changes (33/8, 27/8, 16/4, and on and on).

I personally don't think that much is to be gained from the cadenza, and much is to be lost if anything goes wrong in the parsing/conversion of those weird measures. I would leave it out and jump straight into the next section. That means that the time people spent annotating those harmonies within the cadenza is discarded: about 30 annotations.

Thoughts?

image

Inconsistent use of slow and fast 3/8

Among the analyses in 3/8, some are notated in "slow" 3/8 (3 beats to the measure), while others are notated in "fast" 3/8 (1 beat to the measure).

Slow 3/8 causes the music21 RomanText parser to raise an exception for b2 or b3, as it assumes fast 3/8 (see this issue). But in any case, a parser has no way of knowing whether slow or fast 3/8 is intended, so probably this should either be specified explicitly, or else a consistent choice made across files.

I haven't manually inspected all the files below, but I can attest that at least the following two files are inconsistent:

slow 3/8: Corpus/Quartets/Beethoven,_Ludwig_van/Op132/3/analysis.txt
fast 3/8: Corpus/Quartets/Beethoven,_Ludwig_van/Op018_No4/2/analysis.txt

Files that may use slow 3/8 include (output of the command rg -l '3/8' . | rg 'analysis.txt' | xargs rg -l 'b[2-9]\s'):

./Piano_Sonatas/Beethoven,_Ludwig_van/Op026/1/analysis.txt
./Etudes_and_Preludes/Bach,_Johann_Sebastian/The_Well-Tempered_Clavier_I/03/analysis.txt
./OpenScore-LiederCorpus/Paradis,_Maria_Theresia_von/12_Lieder,_1786/05_Die_Tanne_(Sieh_Doris,_wie_vom_Mond_bestrahlt_)/analysis.txt
./OpenScore-LiederCorpus/Chaminade,_Cécile/_/Berceuse/analysis.txt
./OpenScore-LiederCorpus/Schumann,_Robert/Frauenliebe_und_Leben,_Op.42/3_Ich_kann’s_nicht_fassen/analysis.txt
./OpenScore-LiederCorpus/Reichardt,_Louise/Zwölf_Gesänge,_Op.3/04_Wachtelwacht/analysis.txt
./OpenScore-LiederCorpus/Schubert,_Franz/Op.59/3_Du_bist_die_Ruh/analysis.txt
./OpenScore-LiederCorpus/Coleridge-Taylor,_Samuel/_/Oh,_the_Summer/analysis.txt
./OpenScore-LiederCorpus/Schubert,_Franz/Winterreise,_D.911/09_Irrlicht/analysis.txt
./Quartets/Haydn,_Franz_Joseph/Op20_No1/3/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op018_No6/4/analysis.txt
./OpenScore-LiederCorpus/Schumann,_Robert/Dichterliebe,_Op.48/09_Das_ist_ein_Flöten_und_Geigen/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op132/3/analysis.txt

Files that may use fast 3/8 include (output of the command rg -l '3/8' . | rg 'analysis.txt' | xargs rg -l 'b1\.\d+\s'):

./Quartets/Haydn,_Franz_Joseph/Op20_No1/3/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op018_No4/2/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op130/4/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op132/3/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op018_No6/4/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op074/2/analysis.txt
./Quartets/Beethoven,_Ludwig_van/Op059_No1/2/analysis.txt
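For anyone without ripgrep, the two searches above can be approximated in Python. This is a sketch under the same heuristics as the rg commands (a file mentions `3/8` somewhere and has a line matching the beat pattern); `classify_three_eight` is a hypothetical name.

```python
import re
from pathlib import Path

SLOW_RE = re.compile(r"b[2-9]\s")    # beats 2-9 used: suggests 3 beats to the measure
FAST_RE = re.compile(r"b1\.\d+\s")   # fractional positions within beat 1: suggests 1 beat


def classify_three_eight(corpus_root):
    """Return (slow, fast) lists of analysis.txt paths that mention 3/8."""
    slow, fast = [], []
    for path in sorted(Path(corpus_root).rglob("analysis.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not any("3/8" in line for line in lines):
            continue
        if any(SLOW_RE.search(line) for line in lines):
            slow.append(path)
        if any(FAST_RE.search(line) for line in lines):
            fast.append(path)
    return slow, fast
```

Like the rg pipeline, this does not check that the matching beats actually fall inside a 3/8 passage, so a file with several time signatures can land in both lists; `set(slow) & set(fast)` picks those out for manual inspection.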

DCML>RNTXT conversion (bug). Metric distortions due to empty labels

Hi @MarkGotham and @malcolmsailor,

The m21 stream of Tchaikovsky op37a08 has a metric bump in m. 5 whose offset comes an eighth early:

{0.0} <music21.metadata.Metadata object at 0x7f402fad6d40>
{0.0} <music21.stream.Part 0x7f40dccc9d20>
    {0.0} <music21.stream.Measure 0 offset=0.0>
        {0.0} <music21.key.Key of b minor>
        {0.0} <music21.meter.TimeSignature 6/8>
        {0.0} <music21.roman.RomanNumeral i6 in b minor>
    {1.0} <music21.stream.Measure 1 offset=1.0>
        {0.0} <music21.roman.RomanNumeral i in b minor>
        {1.0} <music21.roman.RomanNumeral #viio6 in b minor>
        {2.0} <music21.roman.RomanNumeral VI6 in b minor>
    {4.0} <music21.stream.Measure 2 offset=4.0>
        {0.0} <music21.roman.RomanNumeral VI6 in b minor>
        {1.0} <music21.roman.RomanNumeral V6 in b minor>
        {2.0} <music21.roman.RomanNumeral vo6 in b minor>
    {7.0} <music21.stream.Measure 3 offset=7.0>
        {0.0} <music21.roman.RomanNumeral vo6 in b minor>
        {1.0} <music21.roman.RomanNumeral IV6 in b minor>
        {2.0} <music21.roman.RomanNumeral #viio2 in b minor>
    {10.0} <music21.stream.Measure 4 offset=10.0>
        {0.0} <music21.roman.RomanNumeral V[no5][add6] in b minor>
        {1.0} <music21.roman.RomanNumeral #viio64 in b minor>
        {1.5} <music21.roman.RomanNumeral i in b minor>
    {12.5} <music21.stream.Measure 5 offset=12.5>
        {0.0} <music21.roman.RomanNumeral VI in b minor>
        {1.0} <music21.roman.RomanNumeral iv65 in b minor>
        {1.5} <music21.roman.RomanNumeral Ger6 in b minor>

Inspecting the rntext reveals an empty label at the end of m. 4:

m4 V[no5][add6] b1.67 #viio64 b2 i b2.33
m5 VI b1.67 iv65 b2 Ger6
m6 V[no3][add#9][add4] b1.67 V7 b2 i b2.33

The problem is caused by non-harmony labels such as { and seems to be happening systematically, e.g.:

The problem can easily be avoided by skipping all rows that have an empty value in the column chord while converting, e.g.

https://github.com/DCMLab/tchaikovsky_seasons/blob/b1cd416d972cca31349d348bc8383f2e16d73269/harmonies/op37a07.tsv#L45
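A minimal sketch of that filter, using only the standard library: the `chord` column name comes from the description above, while the other column names and the sample rows are invented for illustration, and `rows_with_chords` is a hypothetical helper rather than part of the converter.

```python
import csv
import io


def rows_with_chords(handle, column="chord"):
    """Yield only the TSV rows whose chord column is non-empty."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if (row.get(column) or "").strip()]


# Invented sample: the middle row is a non-harmony label with an empty chord.
sample = io.StringIO(
    "mn\tmn_onset\tlabel\tchord\n"
    "4\t0\tV\tV\n"
    "4\t1/2\t{\t\n"
    "5\t0\tVI\tVI\n"
)
kept = rows_with_chords(sample)
print([row["label"] for row in kept])
```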

Reimplement pitch profiles (AKA distributions) with np.arrays

Operations on numpy arrays are much faster and also the code is much more compact because it avoids explicit for loops. Swapping is not a trivial task because it would require carefully examining every place where pitch distributions are used and making the conversion. A first step could be to implement functions to input and output lists, but internally use np.arrays.
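The suggested first step (lists at the boundary, arrays inside) might look like the sketch below. The two functions are hypothetical stand-ins, not the repo's existing pitch-profile code, and 12-element pitch class profiles are assumed.

```python
import numpy as np


def normalise_profile(profile):
    """Scale a pitch class profile to sum to 1. List in, list out; NumPy inside."""
    arr = np.asarray(profile, dtype=float)
    total = arr.sum()
    if total == 0:
        return arr.tolist()  # avoid dividing by zero for empty profiles
    return (arr / total).tolist()


def rotate_profile(profile, semitones):
    """Rotate so that the entry at index `semitones` moves to index 0."""
    return np.roll(np.asarray(profile, dtype=float), -semitones).tolist()


c_major_triad = [2, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(normalise_profile(c_major_triad))
```

Keeping the list-based signatures means existing callers would not need to change while the internals are migrated.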

Doctests

There are a few doctests.
Move all to the new tests area, or implement a way to run them where they are (from tests).

Fix all `54`s and discussion wrt inconsistencies e.g., suspensions

There are several options (including in music21-compatible syntaxes) for expressing figures like suspensions.

WiR is currently not entirely consistent. That's hardly surprising as it's grown gradually over a long time, with many contributors, and even music21 has changed a lot during that time. In any case, the analyses would be visually clearer if there was consistency across the meta-corpus, and this should be possible without change / loss of information.

Suggestion:

  • establish recommended defaults.
  • accept any other valid options ... but map to the default at the point of a PR?
  • obviously reject PR if failing.

E.g.,

  • V642, V42, and V2 all stand unambiguously for the same thing.
  • Accept all, but map written form to V2?
  • Leave alterations like #4 unchanged (or implement more complex check on 67 defaults as a separate dev).
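The "accept all, map to V2" idea could be prototyped with a single regular expression. This sketch only covers plain third-inversion seventh figures (`642` and `42`) directly after a Roman numeral and leaves everything else, including altered figures such as `#4`, untouched. `normalise_third_inversion` is a hypothetical name.

```python
import re

# Optional accidentals, a Roman numeral, an optional quality sign, then 642 or 42,
# optionally followed by a secondary ('/ii') or bracketed ('[add2]') part.
FIGURE_RE = re.compile(
    r"^(?P<numeral>[#b]*[ivIV]+[oø+]?)(?:642|42)(?P<rest>(?:[/\[].*)?)$"
)


def normalise_third_inversion(token):
    """Map e.g. 'V642' and 'V42' to 'V2'; return any other token unchanged."""
    match = FIGURE_RE.match(token)
    if match is None:
        return token
    return f"{match.group('numeral')}2{match.group('rest')}"


for token in ["V642", "V42", "V2", "iiø42", "V42/IV", "V7"]:
    print(token, "->", normalise_third_inversion(token))
```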

Suspensions provide a slightly more complex case in point. Suggestion:

  • Valid options (choose among):
    • [add4][no3].
      • Pro: clear, works (always?), implemented in @malcolmsailor's latest work on the DCML converter, used a lot
      • Con: Verbose, not great e.g., when displaying on a score.
    • sus4.
      • Pro: More intuitive for analysts,
      • Con: but not documented on m21 (can fix that), and it is not even clear where it appears in the code. Search sus and you get nothing much.
    • 54
      • Pro: Best for concision. Works including recent fix of inversion handling thanks to @malcolmsailor. Also used extensively in the corpus (I54).
      • Con: Not necessarily as intuitive as sus. Some potential confusion over I vs V.
  • Recommended (map all to):
    • One of the above. Is sus4 reliable? Best?
  • Doesn't work:
    • 5(4). @napulen: AugmentedNet outputs this syntax, so you'll want to change that.

TODO:

  • Anyone, but especially @jacobtylerwalls: I'd be grateful for double check on this please, especially if there's a known reliability differential between the options.
  • Anyone. Once we've decided, check and replace.

Thanks as always for any input / suggestions / PRs.

Weird line in Corpus/BPS-FH/23-i.txt

Line 65 of this file has this content:

m94 b468 I64 b471 V7 b474 e: i b486 V42 b498 i b510 Ab: V42 b522 I b534 vii7/ii b546 Db: V7 b547.5 vii7/vi b549 V7 b550.5 vii7/vi b552 V7 b553.5 vii7/vi b555 V7 b556.5 vii7/vi b558 V7 b1 V42 b3.5 E: ii6

Probably something funky going on with those beat numbers. I haven't looked at the score.
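Lines like this could be caught automatically with a simple plausibility check on beat numbers. The threshold of 16 beats per measure is an arbitrary assumption, and `suspicious_beats` is a hypothetical helper.

```python
import re

# A beat marker: 'b' followed by a number, e.g. 'b2' or 'b547.5'.
BEAT_RE = re.compile(r"\bb(\d+(?:\.\d+)?)\b")


def suspicious_beats(line, max_beat=16.0):
    """Return the beat values in a measure line that exceed max_beat."""
    return [float(value) for value in BEAT_RE.findall(line) if float(value) > max_beat]


print(suspicious_beats("m94 b468 I64 b471 V7 b1 V42 b3.5 E: ii6"))
```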

1st and 2nd endings overlap in DCML conversions

There's already an issue that I created at the music21 repo for this problem, but I think there should be one here since it means that many of the DCML conversions are presently corrupt.

The basic problem is that the parser doesn't handle 1st and 2nd endings, meaning the chord changes from both are put into the same measure, leading to incorrect results that crash the music21 parser. For example, from Corpus/Quartets/Beethoven,_Ludwig_van/Op018_No3/4/analysis.txt

m117 I6 I6
m118 viio6 viio6
m120 I iiio6 b1.33 IV b2.33 ii6

I think the best (and really only tractable) solution for music21 at the moment is to put the 1st endings into metadata and only write out the 2nd endings. (See the discussion between Myke and me at the music21 issue.) I plan to implement this myself sometime soon. This will at least result in valid romantext output. Somebody could write an additional script that compares the measures files from the ABC repo to add the first endings back in, but I think Myke's argument that this should not be a part of music21 itself is persuasive (and in any case what goes into music21 is up to him), and I won't be able to do it myself.
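Until the converter is fixed, affected measures can be located by looking for more than one chord at the same beat position, which is what the merged endings produce. This is a rough sketch: `chords_sharing_a_beat` is a hypothetical name, key changes are skipped, and other causes of repeated positions would be flagged too.

```python
import re

BEAT_RE = re.compile(r"b\d+(?:\.\d+)?")


def chords_sharing_a_beat(line):
    """Map each beat that carries more than one chord to the chords found there."""
    by_beat = {}
    beat = 1.0  # chords before any beat marker fall on beat 1
    for token in line.split()[1:]:  # skip the measure token, e.g. 'm117'
        if BEAT_RE.fullmatch(token):
            beat = float(token[1:])
        elif token.endswith(":"):
            continue  # key change such as 'c:'
        else:
            by_beat.setdefault(beat, []).append(token)
    return {b: chords for b, chords in by_beat.items() if len(chords) > 1}


for line in ["m117 I6 I6", "m120 I iiio6 b1.33 IV b2.33 ii6", "m5 VI b1.67 iv65 b2 Ger6"]:
    print(line, "->", chords_sharing_a_beat(line))
```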

Readme download section seems to be out-of-date or incorrect

The readme says:

For downloading a local copy of remote files, see Code.updates_and_checks.remote_scores and the argument convert_and_write_local.

However, there is no function called remote_scores in Code/updates_and_checks.py. There is such a function in Code/collect_convert.py but it appears to be a different function and doesn't have a convert_and_write_local argument. There is a convert_and_write_local function in this second file as well, but there doesn't seem to be an argument by this name anywhere in the code base.

Weird side effects while parsing Fanny Mendelssohn Hensel score

This is related to "Corpus/OpenScore-LiederCorpus/Hensel,_Fanny_(Mendelssohn)/6_Lieder,_Op.1/2_Wanderlied/score.mxl".

There is a MusicXML (and MuseScore) score included for this annotation, which looks like this when opened on MuseScore
image

When parsing and .show()ing with music21, the resulting score has most duration values corrupted.

image

It's difficult to tell what exactly went wrong, because it is a silent fail. The score doesn't show any warnings or errors when show()ed; one exception is when running .expandRepeats(), in that case, it does crash with the following error:

>>> s.expandRepeats().show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/napulen/dev/AugmentedNet/.env/lib/python3.8/site-packages/music21/stream/__init__.py", line 13042, in expandRepeats
    post.insert(0, p.expandRepeats(copySpanners=False))
  File "/home/napulen/dev/AugmentedNet/.env/lib/python3.8/site-packages/music21/stream/__init__.py", line 8562, in expandRepeats
    post = ex.process()
  File "/home/napulen/dev/AugmentedNet/.env/lib/python3.8/site-packages/music21/repeat.py", line 770, in process
    raise ExpanderException(
music21.repeat.ExpanderException: cannot expand Stream: badly formed repeats or repeat expressions

If I have to guess, it may be that the triplets on the right hand were moved to the lower staff at the beginning (in fact, I didn't know that was possible to encode in MuseScore!) and that's confusing music21.

Feedback on analysis needs update

It seems that the feedback on the analyses has changed quite a lot compared with the version stored in the GitHub repo. I would check these differences and update the version in the repo. Alternatively, we could remove all the feedback files and give users a very simple way to regenerate them.

Monteverdi score correction (where should .mxl corrections go?)

I noticed an error in Monteverdi Book 5, no 4: the highlighted E-flat should be followed by another E-flat (and tied to it, in the case of the upper of the two voices, which has a dangling tie), not by E-natural.

Screen Shot 2022-09-15 at 6 56 01 AM

This inspires a meta-question: where should such corrections go? Should I submit them to this repository or elsewhere and then we will merge them in from there?

Also what is the best way of creating .mxl files? In this case I think the easiest might be to unzip it, edit the .xml as plain text, then rezip it.
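The unzip, edit, rezip route can also be scripted with the standard library. This is a minimal sketch, assuming the archive contains a META-INF/container.xml whose first rootfile entry names the main score and that the score is UTF-8 encoded. The function name rewrite_mxl and the edit callback are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def rewrite_mxl(src_path, dst_path, edit):
    """Copy a compressed MusicXML file, applying `edit` (str -> str) to the main score XML."""
    with zipfile.ZipFile(src_path) as src:
        # The container file names the main score in its first rootfile element.
        container = ET.fromstring(src.read("META-INF/container.xml"))
        rootfile = next(
            element for element in container.iter()
            if element.tag.split("}")[-1] == "rootfile"  # tolerate a namespace prefix
        )
        root_name = rootfile.get("full-path")
        with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                data = src.read(info.filename)
                if info.filename == root_name:
                    data = edit(data.decode("utf-8")).encode("utf-8")
                # Keep each member's original compression setting.
                dst.writestr(info.filename, data, compress_type=info.compress_type)
    return root_name
```

The edit callback receives the score XML as a string, so a correction can be a plain text replacement or a call into a proper XML library. Writing to a new path leaves the original file untouched for comparison.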

Accompanying score files

Food for thought related to files that have no accompanying score:

Several collections have missing accompanying scores in this repository, but have scores elsewhere (with a specified or unspecified license). More specifically:

  • Scores from the TAVERN dataset
  • Bach Chorales
  • Beethoven Piano Sonatas
  • Beethoven String Quartets
  • Haydn String Quartets
  • Some of the Variations and Grounds

How about a standardized "link" file accompanying each analysis file in this repository, which points to wherever those scores can be found or requested?

For example, for the file /Corpus/Variations_and_Grounds/Mozart,_Wolfgang_Amadeus/_/K025, which currently has an analysis_A.txt (and analysis_B.txt) file but no score.mxl, something like this would be included within the folder:

link_to_score.txt

Score: https://github.com/jcdevaney/TAVERN/blob/master/Mozart/K025/Krn/K025.krn
Format: Humdrum
Attribution: Devaney et al.

Plus any other fields that are adequate for that file (e.g., License: or Citation:)
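To show how little code such a file would need on the reading side, here is a minimal sketch of a parser for the proposed format. The function name parse_link_file is hypothetical, and the format itself is only the proposal above, not something the repository currently provides.

```python
def parse_link_file(text):
    """Parse 'Field: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


example = """\
Score: https://github.com/jcdevaney/TAVERN/blob/master/Mozart/K025/Krn/K025.krn
Format: Humdrum
Attribution: Devaney et al.
"""
link = parse_link_file(example)
print(link["Format"])
```

Splitting on the first colon only keeps the URL intact, since the URL itself contains a colon.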

I have verified that TAVERN, Bach Chorales, Beethoven String Quartets, Haydn String Quartets all have digital scores that we can link to, regardless of whether there is permission to redistribute them or not. This practice seems to be used by other meta-corpus projects, such as muspy and mirdata.

Leaving GitHub issues on the original repositories is of course another option. It's possible that the researchers are willing to share these scores freely and simply omitted a permissive license.

Parsing issues in BPS-FH dataset

Found issues in at least these files:

  • BPS-FH_01-i.txt
  • BPS-FH_12-i.txt
  • BPS-FH_17-i.txt
  • BPS-FH_03-i.txt
  • BPS-FH_02-i.txt
  • BPS-FH_23-i.txt
  • BPS-FH_30-i.txt
  • BPS-FH_07-i.txt
  • BPS-FH_19-i.txt

Observations on local keys: annotation decisions or parsing error?

Hi everyone, we are currently evaluating a system built on your corpus, and a potential inconsistency was brought up by the musicologists involved in our study. It concerns the annotation of local keys in the score, and in particular, it was spotted for Schubert's Das Wandern in the OpenScore-LiederCorpus subset.

From the analysis on the score (analysis_on_score.mxl), at measure 15, an f letter is used before the Roman numeral (see the screenshot below).
Screenshot 2022-08-11 at 13 59 35

When reading this with music21, the letter is interpreted as F minor (see the snippet below).

import music21

# Path to the RomanText analysis file
romantext_sample = "OpenScore-LiederCorpus/Schubert,_Franz/Die_schöne_Müllerin,_D.795/01_Das_Wandern/analysis.txt"

m21_score = music21.converter.parse(
    romantext_sample, format='romanText')

# Pair each Roman numeral with the number of the measure that contains it
numerals = [(numeral.getContextByClass('Measure').measureNumber, numeral)
            for numeral in m21_score.recurse().getElementsByClass('RomanNumeral')]

# Show the entries from index 20 onwards
numerals[20:]

Which produces the following output (an excerpt is reported here).

...
(14, <music21.roman.RomanNumeral i in g minor>),
 (15, <music21.roman.RomanNumeral I6[no1] in f minor>),
 (15, <music21.roman.RomanNumeral V65[no3] in f minor>),
 (15, <music21.roman.RomanNumeral I in f minor>),
 (15, <music21.roman.RomanNumeral V6 in f minor>),
...

However, our musicologists observed that the annotation would be incorrect if interpreted in this way. They suspect that the annotator wrote f instead of F to mark the brief moment in which F is tonicized, and that F minor is unlikely because there are no A-flats.

Therefore, should we consider the output of the music21 parser correct (meaning that f really is F minor) and interpret this as an annotation decision? Or could this be a bug (or a "corner case") in the parser? Thanks a lot for your time.
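For context, RomanText writes a minor key with a lowercase letter and a major key with an uppercase one, which matches the parser output reported above. The sketch below lists every key change on measure lines together with the mode implied by its case, so that reviewers can check them against the score. The function name and the sample lines are illustrative; the samples are not copied from the analysis file.

```python
import re

# A key token is a note letter, optional accidentals, then a colon (e.g. "f:" or "Bb:").
KEY_TOKEN = re.compile(r"^([A-Ga-g])([#b-]*):$")


def local_keys(lines):
    """Return (measure label, key, mode) for each key change on a measure line."""
    found = []
    for line in lines:
        tokens = line.split()
        if not tokens or not re.match(r"m\d", tokens[0]):
            continue  # metadata and note lines
        for token in tokens[1:]:
            match = KEY_TOKEN.match(token)
            if match:
                mode = "minor" if match.group(1).islower() else "major"
                found.append((tokens[0], token.rstrip(":"), mode))
    return found


# Illustrative lines only, modelled on the excerpt discussed above.
sample = ["m14 g: i", "m15 f: I6[no1] b2 V65[no3]", "m16 Bb: I"]
keys = local_keys(sample)
print(keys)
```

A list like this makes it quick to spot lowercase keys followed by uppercase tonic numerals such as I, which is the pattern questioned here.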

Checking all scores for music21.converter.parse(filePath, format='romantext'). EDIT: Wrong flat symbol (`-`) in many BPS-FH scores

I ran a basic music21 parsing check (music21.converter.parse(filePath, format='romantext')) on all the analyses in this repository. Most parse without errors; however, the following files throw exceptions:

[
    'BPS-FH_27-i.txt',
    'BPS-FH_10-i.txt',
    'BPS-FH_01-i.txt',
    'BPS-FH_12-i.txt', 
    'BPS-FH_13-i.txt', 
    'BPS-FH_29-i.txt', 
    'BPS-FH_17-i.txt', 
    'BPS-FH_06-i.txt', 
    'BPS-FH_03-i.txt', 
    'BPS-FH_04-i.txt', 
    'BPS-FH_08-i.txt', 
    'BPS-FH_02-i.txt',
    'BPS-FH_23-i.txt', 
    'BPS-FH_16-i.txt', 
    'BPS-FH_22-i.txt', 
    'BPS-FH_25-i.txt', 
    'BPS-FH_11-i.txt', 
    'BPS-FH_21-i.txt', 
    'BPS-FH_26-i.txt', 
    'BPS-FH_30-i.txt', 
    'BPS-FH_18-i.txt', 
    'BPS-FH_07-i.txt', 
    'BPS-FH_32-i.txt', 
    'BPS-FH_31-i.txt', 
    'BPS-FH_05-i.txt', 
    'BPS-FH_19-i.txt'
]

Haven't looked into the root cause yet.

More to come on this.
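A check like this can be written as a small loop that records the exception for each failing file instead of stopping at the first one. The sketch below takes the parser as an argument so it runs without music21; the function name and the stand-in parser are hypothetical.

```python
def collect_parse_failures(paths, parse):
    """Call parse(path) for each path and return {path: 'ExceptionName: message'} for failures."""
    failures = {}
    for path in paths:
        try:
            parse(path)
        except Exception as error:  # broad on purpose: every failure should be recorded
            failures[path] = f"{type(error).__name__}: {error}"
    return failures


def fake_parse(path):
    """Stand-in parser for demonstration: rejects any path containing 'bad'."""
    if "bad" in path:
        raise ValueError("could not parse")
    return path


report = collect_parse_failures(["ok.txt", "bad.txt"], fake_parse)
print(report)
```

With music21 installed, the parse argument could be lambda p: music21.converter.parse(p, format='romantext'), which is the call quoted above. Keeping the exception text per file helps group failures by root cause.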

op28 (bps_15_01)

Assuming that the scores at the functional-harmony repository are the accompanying score to the analyses of Beethoven Op.28, there is something weird happening when aligning the score with the annotation file.

The issue occurs around m. 163a:

image

It seems that the I annotation at m. 167 (highlighted) should appear in m. 168 instead. This misaligns the score and the analysis from this measure onwards.

It is very tricky because of the repeat barline (indicated in the analysis as 163a, with, I assume, 163b on the other side), which spans two measures on the right side of the barline.

I'm not sure whether it is an error in the MusicXML file, or whether the measure numbers in the annotations should change.

(I'll be inspecting the BPS score-annotation alignment in more depth over the coming days.)

Remote paths for scores

Support option for parsing score from remote path.
Reasons:

  • avoid duplicating files (complex and rights issue);
  • flexible wrt changes to those scores (e.g., external corpus update);
  • flexible changing which score to use (e.g., new corpus);
  • shrinks meta-corpus to fraction of the size.

Dez output

Hi @giamic and @magiraud

As you know, we now have dez output on here.

That all looks fine ... but when I tested the output using a score known to exist on Dezrann (Bach C major prelude), I got labels that have no width/duration. Duration tags are clearly present in the dez file, so I'm not sure what's up there ... unless dez uses an end tag rather than a duration?

Please excuse the incomplete error report; I'm sharing now in case this is a known issue. If not, I'll dig deeper to try and see what's up, probably by comparison with another dez analysis.

Have pandas dataframe to store slices data

One of the best-known libraries for statistical analysis is pandas. It provides a convenient interface to tabular data, where rows represent different instances in the dataset and columns represent different properties.

Pandas builds on numpy arrays to provide many convenience functions for statistical analysis. It is easy to take the average and standard deviation of different quantities, group the dataset by values, select only the data that satisfies certain conditions, and even visualise the data in plots. If we want to give users the ability to do meaningful and easy statistical analysis in a programmatic way, I think that pandas is the way to go.

The downside is that pandas is a fairly heavy dependency to add. However, it is easy to install (via conda or pip).

Broken Humdrum score links

Some of the links for remote Humdrum scores seem to be broken. For example, Corpus/Keyboard_Other/Bach,_Johann_Sebastian/The_Well-Tempered_Clavier_I/22_fugue/remote.json has https://kern.humdrum.org/cgi-bin/ksdata?l=musedata/bach/keyboard/wtc-1&file=wtc1f22.krn&f=kern but this URL seems to return an empty page (I tried opening it with the Python requests library, wget, and my browser).
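Because the failure here is an empty page rather than an HTTP error, a link check has to look at the response body as well as at exceptions. A minimal sketch using only the standard library follows; the function names are hypothetical, and the fetcher is passed in so the check can be tested without network access.

```python
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Download the raw response body for a URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def link_is_broken(url, fetch=fetch_bytes):
    """Treat a link as broken if the request fails or the body is empty or whitespace."""
    try:
        body = fetch(url)
    except OSError:  # covers urllib's URLError and HTTPError, and socket timeouts
        return True
    return not body.strip()
```

Running link_is_broken over every URL found in the remote.json files would give a list of links to repair. How the URL is stored inside remote.json is not assumed here.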

Clean-up for consistent dir contents

Dear @napulen, @malcolmsailor, all,

As we gradually move towards a first coherent form and release of this dataset, it's time for a cleanup to make the contents of each directory as consistent as possible.

I'm inclined to think that less is more, so would suggest we include in each folder only the minimum:

  • score.mxl
  • analysis.txt
  • (Optionally) Working/ for clarity on conversion processes.

All of the additional files like ...

  • slices
  • profiles
  • etc.

... are easy to generate anew with code and instructions provided, but not included by default. There are now a lot of different variants (see the Code/Example folder for all supported). As always, I welcome counter suggestions and requests. Speak now!

Omitted annotation in first measures of Monteverdi madrigals

Several Monteverdi madrigals display the following pattern:

image

The first measures have some weird/chromatic/ambiguous/monophonic/etc. content, and thus have been omitted from the annotation.

Composer: Claudio Monteverdi
Madrigal: 4.20
Title: Piagn'e sospira: E quand'i caldi raggi
Time Signature: 4/4
Key Signature:

Note: Very chromatic music; key is hard to determine
Note: m1-4 single voice, m5-6 two voice texture
m5 g: III
m6 viio6
m7 i
m8 V/ii
m9 ii b3 VII
m10 V/v
m11 v6 b2 v b3 III6
m12 VII b3 ii[no3]
m13 I
m14 IV
m15 ii b3 v

The problem is that note offsets will never align because the annotation file will start on m5 with offset 0.0 while the score will have a different offset for m5.

A number of things can be done for this:

  1. Add "dummy" annotations for all measures in the RomanText, to mimic the original structure of the piece, although the annotations will be ambiguous/incorrect
  2. Whenever the first measure in the RomanText is not m0 or m1, insert blank measures with no annotations, just to preserve the proper offset once the annotations start
  3. Remove the first measures from the score, and start right from the place where the annotations start. Add a text note somewhere in the score about measures removed and why they were removed (e.g., as an Ossia image or similar)

I don't like (2) because there are too many things that can go wrong (e.g., if the beginning is a recitativo with a very unusual time signature, you won't be able to know that or infer how many quarterLength values have elapsed before your annotations start). I lean towards (1) or maybe (3).

No idea which one is the best compromise, hence, the issue.
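To make the size of the mismatch concrete, the sketch below computes the offset that option (2) would have to insert. It assumes a constant time signature and full bars from m1, which is exactly the assumption that breaks for recitativo-like openings. Both function names are hypothetical.

```python
import re
from fractions import Fraction


def first_annotated_measure(rntxt_lines):
    """Return the number of the first measure line (e.g. 5 for 'm5 g: III'), or None."""
    for line in rntxt_lines:
        match = re.match(r"m(\d+)", line)
        if match:
            return int(match.group(1))
    return None


def leading_offset(first_measure, time_signature="4/4"):
    """Quarter-length offset of first_measure, assuming full bars from m1 in a constant meter."""
    numerator, denominator = (int(part) for part in time_signature.split("/"))
    bar_length = Fraction(4 * numerator, denominator)  # bar length in quarter notes
    return float((first_measure - 1) * bar_length)


lines = [
    "Time Signature: 4/4",
    "Note: m1-4 single voice, m5-6 two voice texture",
    "m5 g: III",
    "m6 viio6",
]
start = first_annotated_measure(lines)
offset = leading_offset(start, "4/4")
print(start, offset)
```

For the madrigal above, the annotations start four 4/4 bars into the piece, so every annotation offset is 16 quarter lengths behind the score unless the gap is filled.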

Fix all tests

Some tests still fail. I identified two causes:

  • missing files
  • variations in the corpus statistics

DCML>RNTXT conversion (bug). 4x fail to load.

DCML Missing scores

First, the new DCML repos are missing some scores (reported there). Note to self: update when they fill in the gaps.

Conversion from DCML fails

Hi @malcolmsailor. I ran the DCML-rntxt converter on all the new DCML analyses and the results are mostly great. Some exceptions:

Parses and writes fine, but probably warrants a clean up:

  • Some [no5][no5] duplication in DCML to rntxt. E.g., Corpus/Keyboard_Other/Chopin,_Frédéric/Mazurkas/BI145-1/analysis.txt
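As a stopgap until the converter itself is fixed, the duplication could be cleaned up after conversion. This is a minimal sketch, assuming that an immediately repeated bracketed modifier is always redundant; the function name is hypothetical.

```python
import re

# A bracketed modifier such as [no5] followed by one or more exact copies of itself.
DUPLICATED_MODIFIER = re.compile(r"(\[[^\[\]]+\])\1+")


def collapse_duplicate_modifiers(figure):
    """Collapse immediately repeated bracketed modifiers, e.g. [no5][no5] -> [no5]."""
    return DUPLICATED_MODIFIER.sub(r"\1", figure)


print(collapse_duplicate_modifiers("V7[no5][no5]"))
```

Different adjacent modifiers, such as [no5][no3], are left untouched because the pattern only matches exact repeats.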

Actually fails to parse = 4 cases in total:

  • Corpus/Keyboard_Other/Liszt,_Franz/Années_de_pèlerinage/S161_5_Sonetto_104_del_Petrarca/
  • Corpus/Keyboard_Other/Liszt,_Franz/Années_de_pèlerinage/S160_6_Vallee_dObermann/
  • Corpus/Keyboard_Other/Medtner,_Nikolai/Tales/Op34_No3/
  • Corpus/Keyboard_Other/Medtner,_Nikolai/Tales/Op35_No2/

Example error message:

  • Corpus/Keyboard_Other/Liszt,_Franz/Années_de_pèlerinage/S161_5_Sonetto_104_del_Petrarca raises this exception music21.roman.RomanNumeralException: Cannot make a dominant-seventh chord out of 7[addb1o]. Figure should be in ('7', '65', '43', '42', '2')

TAVERN Cadenzas - use fake long measures in rntxt analyses

Hello, I am currently parsing the analysis files in the corpus via the music21 converter. By doing this, I found a couple of annotations with potentially inconsistent timings (see below).

  • Etudes_and_Preludes/Bach,_Johann_Sebastian/The_Well-Tempered_Clavier_I/03/analysis.txt, which raises:
    • music21.romanText.translate.RomanTextTranslateException: too many notes in this measure: m103 I64 b3 V7
  • Variations_and_Grounds/Mozart,_Wolfgang_Amadeus/_/K398/analysis_A.txt, which raises:
    • music21.romanText.translate.RomanTextTranslateException: too many notes in this measure: m156 I64 b7 I64 b13 I64 b19 I64
    • music21.romanText.translate.RomanTextTranslateException: too many notes in this measure: m161 I64 b5 I64 b9 I64 b11 I64 b15 I64

Hope this helps, and thanks a lot for this fantastic corpus!
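Measures like these can be located without music21 by comparing the highest beat marker on each line with the number of beats in the meter. The sketch below assumes that the caller knows the number of beats per measure and that beat markers follow the bN or bN.N pattern; the function name and the sample meter are hypothetical.

```python
import re


def overfull_measures(lines, beats_per_measure):
    """Return (measure label, highest beat) where a beat marker lies beyond the last beat."""
    flagged = []
    for line in lines:
        tokens = line.split()
        if not tokens or not re.match(r"m\d", tokens[0]):
            continue  # metadata and note lines
        beats = [
            float(token[1:]) for token in tokens[1:]
            if re.fullmatch(r"b\d+(\.\d+)?", token)
        ]
        # A marker such as b4.5 is still inside a four-beat bar; b5 and above is not.
        if beats and max(beats) >= beats_per_measure + 1:
            flagged.append((tokens[0], max(beats)))
    return flagged


sample = [
    "m156 I64 b7 I64 b13 I64 b19 I64",
    "m157 I b3 V",  # made-up regular measure for contrast
]
overfull = overfull_measures(sample, beats_per_measure=4)  # assumed meter, for illustration
print(overfull)
```

A cadenza notated as one long unmeasured bar will always show up in such a list, so the result is a set of places to review rather than a list of errors.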

Improved metadata

Composer: Tchaikovsky, Pyotr
Title: op37a07
Analyst: DCMLab (https://github.com/DCMLab/). Licence CC-BY-NC-SA.
Proofreader: See the source repository for details.

Could we please ask you to clearly identify the origin of the DCML analyses? As a quick fix, it would be OK for groups of files to point to their respective corpus repositories (not the DCMLab organization) or, even better because it is more sustainable, to their DOIs. Ideally, the names of annotators and reviewers would appear too, but a small mention of our metadata or a small nod to "the collaborators of the DCML" or something along those lines would already be warmly welcomed.

On the more general side, and this is something where I lag behind my own expectations, I would love to see commit hashes that identify, for each converted file, the version of the code used to produce it as well as the version of the original data. Apart from that, it is currently not straightforward to trace the files back to the originals; consider for example the relative paths Corpus/Keyboard_Other/Tchaikovsky,_Pyotr/Seasons,_Op.37a/7/analysis.txt <-> tchaikovsky_seasons/harmonies/op37a07.tsv. In combination, hashes, version numbers and links to the source (code) files would represent a splendid boon for data traceability and open science.
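One possible shape for such a record is sketched below as a small data class serialised to JSON. All field names are hypothetical, and the commit values are deliberately left as placeholders because the actual hashes are not known here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Provenance:
    """Hypothetical per-file traceability record."""
    converted_path: str    # path of the converted file in this repository
    source_path: str       # path of the original file in the source repository
    source_commit: str     # version of the original data
    converter_commit: str  # version of the conversion code


record = Provenance(
    converted_path="Corpus/Keyboard_Other/Tchaikovsky,_Pyotr/Seasons,_Op.37a/7/analysis.txt",
    source_path="tchaikovsky_seasons/harmonies/op37a07.tsv",
    source_commit="<commit hash of the source data>",
    converter_commit="<commit hash of the conversion code>",
)
print(json.dumps(asdict(record), indent=2))
```

A record like this could sit next to each analysis file or be collected into one index for the whole corpus; either way it makes the path mapping explicit.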
