Code Monkey home page Code Monkey logo

lyrics-generator's Introduction

Lyrics Generator

build status codecov

This is a small experiment in generating lyrics with a recurrent neural network, trained with Keras and Tensorflow 2.

It works in the browser with Tensorflow.js! Try it here.

The model can be trained at both word- and character level which each have their own pros and cons.

Pre-trained models

A few pre-trained models can be found here.

Train the model

Install dependencies

Requires Python 3.7+.

pip install -r requirements.txt

The requirement file has been reduced in size so if any of the scripts fail, just install the missing packages :-)

Get the data

  • Create a song dataset. See "Create your own song dataset" below.
    • Save the dataset as songdata.csv file in a data sub-directory.
    • Alternatively, you can name it anything you like and use the --songdata-file parameter when training.
  • Download the Glove embeddings
    • Save the glove.6B.50d.txt file in a data sub-directory.
    • Alternatively, you can create your a word2vec embedding (see below)

Create your own song dataset

The code expects an input dataset to be stored at date/songdata.csv by default (this can be changed in config.py or via CLI parameter --songdata-file).

The file should be in CSV format with the following columns (case sensitive):

  • artist
    • A string, e.g. "The Beatles"
  • text
    • A string with the entire lyrics for one song, including newlines.

You can have any number of other columns, they will just be ignored.

A sample dataset with a simple text is provided in sample.csv. To test things are working, you can train using that file:

python -m lyrics.train --songdata-file sample.csv --early-stopping-patience 50 --artists '*'

Dataset suggestions

  • Download billboardHot100_1999-2019.csv file from the Data on Songs from Billboard 1999-2019
    • Put it into the data/ folder and run python scripts/billboard.py script which will prepare the file for training.
    • (Optional) pip install fasttext to detect language. If it's not installed, language is not detected.

(Optional) Create a word2vec embedding matrix

If you have the songdata.csv file from above, you can simply create the word2vec vectors like this:

python -m lyrics.embedding --name-suffix _myembedding

This will create word2vec_myembedding.model and word2vec_myembedding.txt files in the default data directory data/. Use -h to see other options like artists and custom songdata file.

Run the training

python -m lyrics.train -h

This command by default takes care of all the training. Warning: it takes a very long time on a normal CPU!

Check -h for options. For example, if you want to use a different embedding than the glove embedding:

python -m lyrics.train --embedding-file ./embeddings.txt

The embeddings are still assumed to be 50 dimensional.

The output model and tokenizer is stored in a timestamped folder like export/2020-01-01T010203 by default.

Note: During experimentation, I found that raising the batch size to something like 2048 speeds up processing, but it depends on your hardware resources whether this is feasible of course.

Training on GPU

I have found it easier to train on GPU by using Docker and nvidia-docker, rather than try to install CUDA myself. To do this, first make sure you have nvidia-docker set up correct, and then:

docker build -t lyrics-gpu .
docker run --rm -it --gpus all -v $PWD:/tf/src -u $(id -u):$(id -g) lyrics-gpu bash

Then run the normal commands from there, e.g. python -m lyrics.train.

Tip: You might want to use the parameter --gpu-speedup! Just note that this will disable the Tensorflowjs compatibility, regardless of whether you have set the --tfjs-compatible flag.

Tip: If you get a cryptic Tensorflow error like errors_impl.CancelledError: [_Derived_]RecvAsync is cancelled. while training on GPU, try pre-pending the train command with TF_FORCE_GPU_ALLOW_GROWTH=true, e.g.:

TF_FORCE_GPU_ALLOW_GROWTH=true python -m lyrics.train --transform-words --num-lines-to-include=10 --artists '*' --gpu-speedup

Use transformer network

To use the universal sentence encoder or BERT architecture use the --transformer-network parameter:

python -m lyrics.train --transformer-network [use|bert]

Note: These models are not going to work in Tensorflow JS currently, so it should only be used from the command-line.

Note: I have not been able to get any result with BERT. Only included for illustration purposes.

Character-level predictions

In the default training mode, the model predicts the next word, given a sequence of words. Changing the model to predict the next character can be done using the --char-level flag.

python -m lyrics.train --char-level

Create new lyrics

python -m cli lyrics model.h5 tokenizer.pickle

Try python -m cli lyrics -h to find out more. For example, using --randomness and --text can be recommended.

If you want to add newlines to the seed text via --text, you need to add a space on each side. For example, this works in Bash:

--text $'you are my fire \n the one desire'

Export to Tensorflow JS (used for the app)

Note: Make sure to use the --tfjs-compatible flag during training!

python -m cli export model.h5 tokenizer.pickle

This creates a sub-directory export/js with the relevant files (can be used for the app).

Single-page "app" for creating lyrics

Note: Make sure to use the --tfjs-compatible flag during training!

The lyrics-tfjs sub-directory has a simple web-page that can be used to create lyrics in the browser. The code expects data to be found in a data/ sub-directory. This includes the words.json file, model.json and any extra files generated by the Tensorflow export.

Demo.

Development

Make sure to get all dependencies:

pip install -r requirements_dev.txt

Testing

python -m pytest --cov=lyrics tests/

lyrics-generator's People

Contributors

dependabot[bot] avatar dlebech avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

lyrics-generator's Issues

KeyError: '\n' when running on Windows

Traceback (most recent call last):
File "C:\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Ryan Schatz\Desktop\homework\ai git\lyrics-generator-master\lyrics\train.py", line 483, in
train(
File "C:\Users\Ryan Schatz\Desktop\homework\ai git\lyrics-generator-master\lyrics\train.py", line 269, in train
X, y, seq_length, num_words, tokenizer = prepare_data(
File "C:\Users\Ryan Schatz\Desktop\homework\ai git\lyrics-generator-master\lyrics\train.py", line 77, in prepare_data
newline_int = tokenizer.word_index["\n"]
KeyError: '\n'

I keep getting this error no matter what dataset I run. I even tried the sample command and it returned this error.

Song data file no longer available

Kaggle seems to have removed or relocated the song data file, and I was unable to find a suitable replacement. Perhaps describing the format in order to allow custom song data would help?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.