
thuclGPT

A simple, fast repository for training and finetuning medium-sized GPTs. It is a fork of nanoGPT, developed as a project for the graduate course Computing Language at Tsinghua University.

Because the code is so simple, it is easy to hack to suit your needs, train new models from scratch, or finetune pretrained checkpoints (e.g. the largest model available from OpenAI as a starting point is GPT-2 XL, with about 1.5B parameters).

install

pip install torch numpy transformers datasets tiktoken wandb tqdm sentencepiece gradio

or

pip install -r requirements.txt

quick start

The project contains four models, each either trained from scratch or finetuned, listed below.

model_id            description
out-lyric-char      character-level Chinese lyric GPT
out-lyric-word      word-level (BPE) Chinese lyric GPT
out-lyric-ft        character-level Chinese lyric GPT finetuned from uer/gpt2-xlarge-chinese-cluecorpussmall
out-lyric-ft-word   word-level (BPE), embedding-extended Chinese lyric GPT finetuned from uer/gpt2-xlarge-chinese-cluecorpussmall
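out-lyric-ft-word extends the pretrained embedding matrix so that the new BPE tokens get their own rows. The README does not say how those rows are initialized, so the sketch below assumes one common heuristic: each new row starts as the mean of the existing embeddings. It works on a plain NumPy array standing in for the real weights, and extend_embeddings is a hypothetical helper, not a thuclGPT function.

```python
import numpy as np


def extend_embeddings(weight: np.ndarray, num_new_tokens: int) -> np.ndarray:
    """Return a new (vocab_size + num_new_tokens) x n_embd matrix.

    Hypothetical helper: the original rows are kept unchanged and every new
    row is initialized to the mean of the existing rows.
    """
    if num_new_tokens < 0:
        raise ValueError("num_new_tokens must be non-negative")
    mean_row = weight.mean(axis=0, keepdims=True)           # shape (1, n_embd)
    new_rows = np.repeat(mean_row, num_new_tokens, axis=0)  # shape (num_new_tokens, n_embd)
    return np.concatenate([weight, new_rows], axis=0)


if __name__ == "__main__":
    old = np.random.default_rng(0).normal(size=(100, 8)).astype(np.float32)
    new = extend_embeddings(old, num_new_tokens=20)
    print(old.shape, "->", new.shape)  # (100, 8) -> (120, 8)
```

On a real Hugging Face model, model.resize_token_embeddings(new_vocab_size) performs the resize; how it initializes the added rows depends on the transformers version.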

data preparation

The dataset is built from a Chinese lyric corpus and split as follows.

split   length
train   34M
valid   864K
test    180K

To prepare the data, run the prepare.py script in the corresponding subfolder of data. For example, for out-lyric-char, run

python data/lyric_char/prepare.py

and the binary data files will be generated.
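For reference, a nanoGPT-style character-level prepare.py builds a vocabulary from the distinct characters, maps each character to an integer id, and writes the ids to binary files. The sketch below assumes that layout (uint16 ids, train.bin and val.bin, and a meta.pkl holding the mappings, following nanoGPT's shakespeare_char example). The actual file names and split logic in data/lyric_char/prepare.py may differ, and prepare_char_data is a hypothetical name.

```python
import os
import pickle
import tempfile

import numpy as np


def prepare_char_data(train_text: str, val_text: str, out_dir: str) -> int:
    """Encode text character by character and write train.bin, val.bin, meta.pkl.

    Returns the vocabulary size. File names follow nanoGPT's convention (assumption).
    """
    chars = sorted(set(train_text + val_text))    # vocabulary = every distinct character
    if len(chars) > 2**16:
        raise ValueError("vocabulary too large for uint16 ids")
    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> id
    itos = {i: ch for i, ch in enumerate(chars)}  # id -> character

    for name, text in (("train", train_text), ("val", val_text)):
        ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)
        ids.tofile(os.path.join(out_dir, f"{name}.bin"))  # raw uint16, no header

    with open(os.path.join(out_dir, "meta.pkl"), "wb") as f:
        pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
    return len(chars)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        vocab_size = prepare_char_data("我爱你\n你爱我\n", "我爱\n", d)
        train_ids = np.fromfile(os.path.join(d, "train.bin"), dtype=np.uint16)
        print(vocab_size, train_ids)
```

Because the .bin files are raw uint16 arrays, a training loop can memory-map them with np.memmap instead of loading the whole 34M-length training split at once.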

train/finetune

To train or finetune a model, run train.py with the corresponding Python configuration script from the config folder. For example, for out-lyric-char, run

python train.py config/train_lyric_char.py

and adjust the parameters to suit your task and GPU resources.
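In upstream nanoGPT, train.py reads its settings through configurator.py: a bare argument is treated as a Python file whose top-level assignments override the defaults, and an argument of the form --key=value overrides one existing setting. Assuming this fork keeps that mechanism, the sketch below reproduces the idea with an explicit dictionary instead of module globals. load_config is a hypothetical name, and the keys used in the demo (out_dir, batch_size, learning_rate, compile) are nanoGPT's defaults, assumed rather than confirmed for thuclGPT.

```python
import ast
import os
import tempfile


def load_config(defaults: dict, argv: list) -> dict:
    """Apply config files and --key=value overrides on top of `defaults`.

    Sketch of nanoGPT's configurator behavior (assumed to be kept in this fork).
    Unlike nanoGPT, names in a config file that are not in `defaults` are ignored.
    """
    config = dict(defaults)
    for arg in argv:
        if "=" not in arg:
            # Config file: execute it and keep the known names it assigns.
            namespace: dict = {}
            with open(arg, encoding="utf-8") as f:
                exec(f.read(), namespace)
            config.update({k: v for k, v in namespace.items() if k in config})
        else:
            key, val = arg.split("=", 1)
            key = key.removeprefix("--")
            if key not in config:
                raise ValueError(f"unknown config key: {key}")
            try:
                parsed = ast.literal_eval(val)  # ints, floats, bools, lists, ...
            except (SyntaxError, ValueError):
                parsed = val                    # anything else stays a string
            if type(parsed) is not type(config[key]):
                raise TypeError(f"{key}: expected {type(config[key]).__name__}")
            config[key] = parsed
    return config


if __name__ == "__main__":
    defaults = {"out_dir": "out", "batch_size": 12, "learning_rate": 6e-4, "compile": True}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "train_lyric_char.py")  # hypothetical config contents
        with open(path, "w", encoding="utf-8") as f:
            f.write("out_dir = 'out-lyric-char'\nbatch_size = 64\n")
        print(load_config(defaults, [path, "--batch_size=32", "--compile=False"]))
```

Under that assumption, a single setting could be overridden on the command line, e.g. python train.py config/train_lyric_char.py --batch_size=32, which is a convenient way to fit a smaller GPU without editing the config file.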

completion and perplexity

For completion, run completion.py with the output folder that contains the corresponding checkpoint. For example, for out-lyric-char, run

python completion.py --out_dir=out-lyric-char

and feel free to customize prompt.txt.
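Completion in nanoGPT-derived code usually samples each next token from the model's logits after temperature scaling and top-k filtering (nanoGPT's sample.py defaults to temperature 0.8 and top_k 200). The README does not document completion.py's sampling settings, so the sketch below only illustrates that technique on a raw logits array; sample_next_token is a hypothetical helper and the defaults are nanoGPT's, not necessarily this fork's.

```python
from typing import Optional

import numpy as np


def sample_next_token(
    logits: np.ndarray,
    temperature: float = 0.8,
    top_k: Optional[int] = 200,
    rng: Optional[np.random.Generator] = None,
) -> int:
    """Sample one token id from a 1-D array of logits using temperature and top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng if rng is not None else np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature  # <1 sharpens, >1 flattens
    if top_k is not None and 0 < top_k < scaled.size:
        kth_largest = np.partition(scaled, -top_k)[-top_k]
        scaled = np.where(scaled < kth_largest, -np.inf, scaled)  # drop tokens outside the top k
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(rng.choice(scaled.size, p=probs))
```

Setting top_k=1 turns this into greedy decoding; larger top_k and higher temperature give more varied lyrics at the cost of coherence.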

To calculate perplexity, run calculate_ppl.py with the same --out_dir argument instead of completion.py; the mean and standard deviation of the perplexity on the test set will be displayed.
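Perplexity is the exponential of the mean negative log-likelihood per token. The README does not say over which units calculate_ppl.py averages, so the sketch below assumes one perplexity per test sequence (for example, one lyric) and then reports the mean and population standard deviation across sequences. Both function names are hypothetical.

```python
import math
import statistics


def sequence_perplexity(token_log_probs: list) -> float:
    """exp(mean negative log-likelihood) for one sequence; log-probs in nats."""
    if not token_log_probs:
        raise ValueError("empty sequence")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def perplexity_stats(test_set: list) -> tuple:
    """Mean and population standard deviation of per-sequence perplexity."""
    ppls = [sequence_perplexity(seq) for seq in test_set]
    return statistics.mean(ppls), statistics.pstdev(ppls)
```

As a sanity check, a model that spreads probability uniformly over V tokens has perplexity exactly V on every sequence, so the standard deviation would be 0.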

web demo

To launch the web demo, run python web_demo.py and open the page in a browser. The simple Gradio frontend lets you choose a model and chat with it.
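The internals of web_demo.py are not described here. As a rough sketch of how a Gradio frontend with a model selector can be wired up, the code below uses gr.Interface with a Dropdown of the four model ids and a prompt Textbox. generate is a hypothetical placeholder that only echoes its inputs; a real demo would load the checkpoint from the chosen folder and sample from it. Gradio is imported inside build_demo so that respond can be exercised without it installed.

```python
MODEL_IDS = ["out-lyric-char", "out-lyric-word", "out-lyric-ft", "out-lyric-ft-word"]


def generate(model_id: str, prompt: str) -> str:
    """Hypothetical placeholder for loading `model_id` and completing `prompt`."""
    return f"[{model_id}] {prompt}"


def respond(model_id: str, prompt: str) -> str:
    """Validate the inputs from the UI and forward them to the model."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model: {model_id}")
    if not prompt.strip():
        return ""
    return generate(model_id, prompt)


def build_demo():
    import gradio as gr  # imported here so respond() does not require Gradio

    return gr.Interface(
        fn=respond,
        inputs=[
            gr.Dropdown(choices=MODEL_IDS, value=MODEL_IDS[0], label="model"),
            gr.Textbox(label="prompt"),
        ],
        outputs=gr.Textbox(label="completion"),
        title="thuclGPT",
    )
```

Calling build_demo().launch() starts a local server and prints its URL. This is a single-turn interface; a multi-turn chat would need to keep the conversation history between calls.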

Results

model               train loss   val loss   test ppl   VRAM
out-lyric-char      1.60         1.92       7.71       17 GB
out-lyric-word      3.08         4.10       82.29      17 GB
out-lyric-ft        2.04         1.99       9.59       37 GB
out-lyric-ft-word   3.35         3.68       10228.98   37 GB
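If the reported losses are mean per-token cross-entropy in nats, as in nanoGPT's training loop (an assumption for this fork), exp(loss) is the perplexity they imply, which can be set next to the test_ppl column. Note that each perplexity is per token of that model's own tokenization, so character-level and BPE numbers are not directly comparable. The loss values below are copied from the results table; nothing else is measured.

```python
import math

# (train loss, val loss) per model, copied from the results table.
RESULTS = {
    "out-lyric-char": (1.60, 1.92),
    "out-lyric-word": (3.08, 4.10),
    "out-lyric-ft": (2.04, 1.99),
    "out-lyric-ft-word": (3.35, 3.68),
}


def loss_to_ppl(loss: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy measured in nats."""
    return math.exp(loss)


for model_id, (train_loss, val_loss) in RESULTS.items():
    print(f"{model_id:18s} train ppl {loss_to_ppl(train_loss):7.2f}   val ppl {loss_to_ppl(val_loss):7.2f}")
```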

acknowledgements

All thuclGPT experiments are powered by GPUs on AutoDL.

contributors

akashmjn, ankandrew, apivovarov, cchan, christianorr, ctjlewis, danielgross, drisspg, gnobre, ho2103, johnwildauer, jorahn, karpathy, kovkev, laihoe, lantiga, lutzroeder, micropanda123, nat, nynyg, okuvshynov, otaviogood, pwhiddy, python273, ramtingh, ryouze, snehalraj, venusatuluri, yassineyousfi, ymurenko
