
Hit Machine

Generating Lyrics and Chords using T5

Ziv Keidar and Eitan Bentora

In this project we created a module that writes original chords and lyrics using HuggingFace's T5.
The models in the project were trained on a dataset we created by web-scraping the website e-chords. Our goal was to take a step toward an AI model that performs all the steps of writing and composing a song. This file explains our process, including failed attempts and reproduction instructions.

Data

Since we did not find any available dataset containing both lyrics and chords, we web-scraped a new one. The resulting dataset can be found in chords-and-lyrics-dataset

Scraping the data:

  1. Used the Spotify artists.csv file, which contains information about 1,104,349 artists, including a popularity metric and each artist's genres (see the sketch after this list)
  2. Used the names in the Spotify dataset to find each artist's songs on the website e-chords
  3. Downloaded all of the artists' songs and chords that were available on e-chords
  4. The songs' chords, lyrics, and tabs were all raw text, for example:
Capo on 3rd fret
\t \t\t
\r
Verse 1:
G G/B C
Do you love the rain, does it make you dance
  5. The code for the web scraping can be found in: /Project/Code/WebScraping
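A sketch of step 1, assuming the standard column layout of the Spotify artists.csv (name, popularity, genres); the actual file may differ:

```python
import pandas as pd

# Load the Spotify artists file and order by popularity so the most
# popular artists are scraped first (column names are assumptions).
artists = pd.read_csv("artists.csv")
artists = artists.sort_values("popularity", ascending=False)
artist_names = artists["name"].dropna().unique().tolist()
```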

Data cleaning

  1. Split each song into three parts:
    • Tabs dict
    • Chords dict
    • Lyrics dict

In each of these dicts, the key was the line number in the original song and the value was a line of tabs, chords, or lyrics.

  2. Predicted the language of every song using langdetect, based on the song's lyrics (see the sketch below)
  3. Kept only songs in English (the original dataset contains many songs in Portuguese, Spanish, and other languages)
  4. The code for the data cleaning can be found in: /Project/Code/DataCleaning
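A minimal sketch of steps 2-3 using langdetect (the helper name and data layout are assumptions; the actual code is in /Project/Code/DataCleaning):

```python
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # langdetect is non-deterministic without a fixed seed

def song_language(lyrics_dict):
    """Detect a song's language from its concatenated lyric lines."""
    try:
        return detect(" ".join(lyrics_dict.values()))
    except Exception:  # raised on empty or undetectable text
        return "unknown"

# Assumed layout: songs maps song name -> {'tabs': ..., 'chords': ..., 'lyrics': ...}
songs = {"The game of chance": {"lyrics": {5: "Do you love the rain, does it make you dance"}}}
english_songs = {name: s for name, s in songs.items() if song_language(s["lyrics"]) == "en"}
```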

Models

Chords and Lyrics Model - version 1 (unsuccessful):

Predicting tabs, chords, and lyrics together

  1. For this model we used the songs' lyrics, tabs, and chords combined.
  2. Replaced \n with <n>, sequences of spaces with <s(n_spaces)>, and tabs with <t>
  3. Used the t5-base model.
  4. Trained the model on the classic T5 fill-in-the-blanks task (a sketch of this example format appears at the end of this section).
  5. Each song was fed to the model in the following format:
    'Fill song <list of song's genres> : \n <song name> <song's chords, tabs, and lyrics combined, with missing parts for fill-in-the-blanks>'
  6. The model's results were extremely messy and did not capture the format of a song with chords.
  7. Result example (the input was 'summarise song [pop]: The game of chance'):

n>s2>Gs3>Ems4>Ds5>Cs6>Ams7>G N>It's a game of chance, oh the game of luck s9>E7s8>D7 Same chords throughout the whole song.s10>A7sus4s11>D9s12>Dsus2s16>D s20>Dm7/As14>D5s13>A Jump in the middle of a symphony of chance -----------| -------3----|s15>G Dm7 Gs19>A6s17>A5 /---3---1---4---5---6---7---8---9---11---10--| x--0-------0--0--0------0---0----|. |---2------2------2--------2-----1-------1----s26>A4 >>|---|----/-----|-------__---- _---

We realized that we had to clean our data further and split lyrics generation and chords generation into two separate models. We also decided to disregard tabs for the rest of the project.
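For reference, a minimal sketch of how a fill-in-the-blanks training example can be built in T5's sentinel-token format (the repo's actual masking code may pick spans differently):

```python
import random

def fill_in_the_blanks(words, n_blanks=2, span=3):
    """Mask up to n_blanks non-overlapping word spans with T5 sentinel tokens.

    Returns (input_text, target_text) in T5's span-corruption format.
    """
    starts, pos = [], 0
    while len(starts) < n_blanks and pos + span <= len(words):
        if random.random() < 0.15:   # start a masked span here
            starts.append(pos)
            pos += span              # skip past the span so spans stay disjoint
        else:
            pos += 1
    inp, tgt, prev = [], [], 0
    for i, s in enumerate(starts):
        inp += words[prev:s] + [f"<extra_id_{i}>"]
        tgt += [f"<extra_id_{i}>"] + words[s:s + span]
        prev = s + span
    inp += words[prev:]
    tgt.append(f"<extra_id_{len(starts)}>")  # closing sentinel
    return " ".join(inp), " ".join(tgt)

body = "The game of chance <n> Do you love the rain, does it make you dance"
masked, target = fill_in_the_blanks(body.split())
model_input = "Fill song [pop]: " + masked
```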

Lyrics Model - version 2:

Predicting the lyrics

  1. For this model we used only the lyrics of the songs
  2. Replaced \n, spaces, and tabs as described above
  3. Used the t5-base model.
  4. Used the same fill-in-the-blanks task described above and the same input format: 'Fill song <list of song's genres> : \n <song name> <song's lyrics with missing parts for fill-in-the-blanks>'
  5. Result example (the input was 'summarise song [pop]: The game of chance'):
  • The game of chance n>s2>It's a game that I'm gonna play if I don't win a prizes3>(2x)s4>(3x)(2X)(3X) (x2)s5>(4x) (3x2) (x3)s6>(x4)s7>(1x) [email protected]s8>(2-x-x--x) s10>Gs9>Ds11>Cs12>D

The result was still very messy. We understood that we needed to clean the data better and represent spaces and tabs in a different way.

Lyrics Model - version 2_2:

Predicting the lyrics, improved

  1. Again we used only the lyrics of the songs, but this time we improved the separation of lyrics, chords, and tabs and removed garbage from our data
  2. Replaced \n with @
  3. Used the t5-base model.
  4. Used the same fill-in-the-blanks task described above and the same input format: 'Fill song <list of song's genres> : \n <song name> <song's lyrics with missing parts for fill-in-the-blanks>'
  5. Result example (the input was 'summarise song [pop]: The game of chance'):

The game of chance

It's a game of luck
I'm gonna play it for you
I don't wanna play it, but I've got a lot of fun
'Cause if you're gonna get it, you'll get it for me
I won't be playing it, I'll be playing the game

Verse 2:
I can't seem to find a way to play it
But I'd rather play it than play it 'cause it's the same
And if I had a chance
I would rather play the game than play
Pre-Chorus ...

This was much better and looked like a song. There are some parts that don't make a lot of sense, but this can be tweaked with the hyperparameters discussed in the Generating section below. At this point we were ready to start training a chords model.

Chords representation

Originally, the chords were written with many tabs and spaces that separate the different chords and align them with the lyric line below them. To represent the data more efficiently, we changed the representation of chord lines in the following way:

  1. Replaced all tabs with 4 spaces
  2. Replaced every sequence of spaces with the number of spaces in the sequence
  3. Separated adjacent chords with $ signs
  4. For example:
'Em  \tG7/D' -> 'Em      G7/D' -> 'Em$8G7/D'
  5. The code for this can be found in /Project/Code/T5/FineTuningT5/chords_data_preparation.py
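A minimal sketch of this encoding (the function name is hypothetical; the real implementation is in chords_data_preparation.py):

```python
import re

def encode_chord_line(line: str) -> str:
    """Tabs become 4 spaces, then every run of spaces becomes '$<run length>'."""
    line = line.replace("\t", " " * 4)  # step 1
    line = line.strip()
    # steps 2-3: the run length doubles as the $-separator between chords
    return re.sub(r" +", lambda m: f"${len(m.group(0))}", line)

print(encode_chord_line("Em    \tG7/D"))  # -> Em$8G7/D
```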

Chords Model - version 1 (unsuccessful):

Predicting the chords given the lyrics

  1. Represented the chords as described above
  2. Used the t5-small model
  3. Each song was split into chords and lyrics.
  4. The input for the model was the lyrics of a song and the expected output was the chords of that song
  5. The code for this can be found in /Project/Code/T5/FineTuningT5/CreateLabeledData/create_labeled_chords_data_v1.py

The results of this model were very poor: it did not manage to maintain even the structure of chords after a few lines. We understood that each of the model's outputs needed to be smaller, and that it was important for the model's input to include the previous chords.
Example output when the input was the lyrics generated by model v2_2:

G29D28G27Dm24G
G
D31Gm
C23G721D7M22G926D9M
G/B17Gsus416D6M/Gb19Gdim13D8Msu4G#m7/Dbm/F18G625Dsum4D/A15

We can see strings that are not chords (Msu4, sum) and no separation between the chords.

Chords Model - version 2:

Predicting the chords given the lyrics

  1. Represented the chords as described above

  2. Used the t5-small model

  3. Each song was split into chords and lyrics.

  4. This time we created a training example for each chord line in the song.
    The training examples were created in the following way (window is a hyperparameter that was set to 16; see the sketch after this list):

    1. The expected output was the chord line.
    2. The input for the nth chord line included:
      1. A prefix constant for the song: 'write chords <[genres]>:'
      2. The previous window chord and lyric lines
      3. The (n+1)th line, i.e. the lyric line directly below the target chord line
    3. For example, for a pop-rock song with window=2 that looks like this:

    < Chords line 1 >
    < Lyrics line 2 >
    < Chords line 3 >
    < Lyrics line 4 >
    < Chords line 5 >
    < Lyrics line 6 >

    The resulting inputs and outputs:

    • The first input would be write chords [pop, rock]: < Lyrics line 2 >

    • The first expected output would be < Chords line 1 >

    • The last input would be 'write chords [pop, rock]: < Chords line 3 > @ < Lyrics line 4 > @ < Lyrics line 6 >'

    • The last expected output would be < Chords line 5 >

    4. This resulted in approximately 2 million training examples
    5. This way the model only needed to generate one chord line at a time, and when generating a chord line it had the context of the previous chords and lyrics.
  5. The code for this can be found in /Project/Code/T5/FineTuningT5/CreateLabeledData/create_labeled_chords_data_v2.py
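A minimal sketch of this construction (the data layout and function name are assumptions; the actual code is in create_labeled_chords_data_v2.py):

```python
def make_chords_examples(song_lines, genres, window=16):
    """One (input, target) pair per chord line.

    song_lines is an ordered list of (kind, text) tuples with kind in
    {'chords', 'lyrics'}; '@' is the newline replacement described above.
    """
    prefix = f"write chords [{', '.join(genres)}]: "
    examples = []
    for i, (kind, text) in enumerate(song_lines):
        if kind != "chords":
            continue
        context = [t for _, t in song_lines[max(0, i - window):i]]  # previous `window` lines
        if i + 1 < len(song_lines) and song_lines[i + 1][0] == "lyrics":
            context.append(song_lines[i + 1][1])                    # the lyric line below
        examples.append((prefix + " @ ".join(context), text))
    return examples

song = [("chords", "< Chords line 1 >"), ("lyrics", "< Lyrics line 2 >"),
        ("chords", "< Chords line 3 >"), ("lyrics", "< Lyrics line 4 >"),
        ("chords", "< Chords line 5 >"), ("lyrics", "< Lyrics line 6 >")]
for inp, target in make_chords_examples(song, ["pop", "rock"], window=2):
    print(inp, "->", target)
```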

This time it really looks like a song! For example, using the same input as before:

The game of chance

Em7
Is a game that's all right
Cmaj7 G
But I don't know if I'll ever win
Em7 Cmaj7
You've got to be the one to win and you're the only to lose
Cmaj7 G
And I wont give you my heart
Em7 D
Cause I know that I can win, but I wanna lose it
Cmaj7
So I play my game
G Gsus
Of chance, the game I love

Generating

To generate a song we wrote a module called SongWriter. This module takes the parameters lyrics_model and chords_model, and supports writing lyrics, writing chords for existing lyrics, and writing full songs with both chords and lyrics.
The functions that generate chords or lyrics accept generation parameters. We tried the following parameters (an example call follows this list):

  1. top_k=k: This means that when sampling the next word, only the k words with the highest probability can be sampled
  2. top_p=p: This means that when sampling the next word, sampling is restricted to the smallest set of words whose cumulative probability exceeds p (nucleus sampling)
  3. temperature: This reshapes the word distribution. Decreasing the temperature below 1 increases the probability of the words that already have high probability
  4. no_repeat_ngram_size=n: The model cannot use the same n-gram more than once in the output
  5. num_beams=b: Instead of always taking the single most probable word, beam search keeps the b most probable partial sequences at every step and finally returns the complete sequence with the highest overall probability
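An illustrative generate() call combining these parameters; the model here is the base checkpoint and the values are placeholders, not our tuned settings:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # stand-in for a fine-tuned model

input_ids = tokenizer("summarise song [pop]: The game of chance",
                      return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    do_sample=True,            # sample instead of decoding greedily
    top_k=50,                  # keep only the 50 most probable next words
    top_p=0.95,                # nucleus sampling: smallest set with cumulative prob >= 0.95
    temperature=0.9,           # <1 sharpens the distribution
    no_repeat_ngram_size=3,    # never repeat the same 3-gram
    max_length=256,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Passing num_beams=b without do_sample switches generate() to beam search instead of sampling.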

Evaluation

To check whether our model's songs are as original as real songs, we did the following:

  1. Generated 45 song lyrics using our best model. We will refer to these as the test lyrics.
  2. Sampled 45 real songs. We will refer to these as the control lyrics.
  3. From all the songs other than the 45 sampled above, we sampled 10k songs. We will refer to these as the lyrics population.
  4. For each ngram_size greater than 4 (see the sketch after this list):
    1. For each song in the test lyrics, we counted how many songs in the lyrics population shared an n-gram of size ngram_size with it.
    2. We did the same for the control lyrics.
    3. We compared the counts from the test lyrics to those from the control lyrics using a one-sided t-test with the following hypotheses:
      • H0 - the generated lyrics are as original as real songs.
      • H1 - the generated lyrics are less original than real songs.
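A minimal sketch of the overlap counting and the test (helper names are assumptions):

```python
from scipy.stats import ttest_ind

def ngram_set(lyrics, n):
    """All word n-grams in a song's lyrics."""
    words = lyrics.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_count(song, population, n):
    """Number of population songs sharing at least one n-gram with `song`."""
    grams = ngram_set(song, n)
    return sum(bool(grams & ngram_set(other, n)) for other in population)

def originality_test(test_lyrics, control_lyrics, population, n):
    test_counts = [overlap_count(s, population, n) for s in test_lyrics]
    control_counts = [overlap_count(s, population, n) for s in control_lyrics]
    # H1: generated lyrics share n-grams with MORE population songs than real ones do
    return ttest_ind(test_counts, control_counts, alternative="greater")
```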

The results:

[Figure: statistical_test_ngram — one-sided t-test results per n-gram size]

Reproduction

  1. Load the data chords_split_lang.pkl (link to be added)
  2. Filter only the desired language and save it as a DataFrame in pickle format
  3. Fine-tuning the lyrics model v2_2:
    1. run /Project/Code/T5/FineTuningT5/PrepareText/prepare_text_v2.py
    2. run /Project/Code/T5/FineTuningT5/CreateLabeledData/create_labeled_data.py
    3. run /Project/Code/T5/FineTuningT5/Training/training.py lyrics v2
  4. Fine-tuning the chords model v2:
    1. run /Project/Code/T5/FineTuningT5/CreateLabeledData/create_labeled_chords_data_v2.py
    2. run /Project/Code/T5/FineTuningT5/Training/training.py chords v2
  5. Generating songs (code in /Project/Code/T5/Generating/generating.ipynb):
    1. Load the two models created above and the t5 tokenizer
    2. Create a SongWriter instance
      song_writer = SongWriter(lyrics_model, chords_model, tokenizer, verbose=True)
    3. Write songs!
      song_writer.write_songs(genres_list, song_inputs, lyrics_generating_params, chords_generating_params)
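Putting step 5 together, a minimal sketch (model paths and parameter values are placeholders):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
# SongWriter is defined in /Project/Code/T5/Generating

tokenizer = T5Tokenizer.from_pretrained("t5-base")
lyrics_model = T5ForConditionalGeneration.from_pretrained("path/to/lyrics_model_v2_2")
chords_model = T5ForConditionalGeneration.from_pretrained("path/to/chords_model_v2")

song_writer = SongWriter(lyrics_model, chords_model, tokenizer, verbose=True)
songs = song_writer.write_songs(
    genres_list=[["pop"]],
    song_inputs=["The game of chance"],
    lyrics_generating_params={"do_sample": True, "top_p": 0.95, "temperature": 0.9},
    chords_generating_params={"num_beams": 4, "no_repeat_ngram_size": 2},
)
```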

Future Work

  1. Modeling the separation between the songs' verses and choruses and feeding it into the model during training
  2. Transposing all the songs' chords to the same key: the progressions (C G Am F) and (D A Bm G) are the same progression in different keys, so moving all songs to a common key could reduce noise for the model (see the sketch after this list)
  3. Training different models for different genres
  4. Trying to further tune the generation parameters
  5. Trying to tune the input for the generation
  6. Generating tabs
  7. Multilingual song-writers
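As a sketch of item 2, a simple transposition helper (slash chords such as G/B would also need their bass note transposed, which is omitted here):

```python
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
SHARP_TO_FLAT = {"C#": "Db", "D#": "Eb", "F#": "Gb", "G#": "Ab", "A#": "Bb"}

def transpose_chord(chord, semitones):
    """Shift a chord's root by `semitones`, keeping its suffix (m, 7, sus4, ...)."""
    root = chord[:2] if len(chord) > 1 and chord[1] in "#b" else chord[:1]
    suffix = chord[len(root):]
    root = SHARP_TO_FLAT.get(root, root)  # normalize sharps to flats for lookup
    return NOTES[(NOTES.index(root) + semitones) % 12] + suffix

print([transpose_chord(c, -2) for c in ["D", "A", "Bm", "G"]])  # ['C', 'G', 'Am', 'F']
```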
