polakowo / gpt2bot
Your new Telegram buddy powered by transformers
License: MIT License
This is not an issue as such, it's a question: is there any particular reason for pinning transformers==2.3.0? At the time of writing, the latest transformers version was 3.0.2.
Does a newer version introduce any bugs, incompatibilities, or anything else known to stop the scripts from working?
Is this supported?
2020-01-21 16:45:01,189 - model - INFO - Downloading model files to medium_dstc_ft...
100% 293/293 [00:00<00:00, 242727.84B/s]
100% 1042301/1042301 [00:01<00:00, 778001.76B/s]
100% 456318/456318 [00:00<00:00, 524076.53B/s]
100% 351265269/351265269 [00:47<00:00, 7401148.29B/s]
2020-01-21 16:45:53,837 - model - INFO - Loading model from medium_dstc_ft...
Traceback (most recent call last):
File "interactive_bot.py", line 98, in <module>
main()
File "interactive_bot.py", line 83, in main
model, tokenizer = load_model(target_folder_name, config)
File "/content/model.py", line 146, in load_model
model.load_state_dict(state_dict)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 839, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
Missing key(s) in state_dict: "transformer.h.12.ln_1.weight", "transformer.h.12.ln_1.bias", "transformer.h.12.attn.bias", "transformer.h.12.attn.c_attn.weight", "transformer.h.12.attn.c_attn.bias", "transformer.h.12.attn.c_proj.weight", "transformer.h.12.attn.c_proj.bias", "transformer.h.12.ln_2.weight", "transformer.h.12.ln_2.bias", "transformer.h.12.mlp.c_fc.weight", "transformer.h.12.mlp.c_fc.bias", "transformer.h.12.mlp.c_proj.weight", "transformer.h.12.mlp.c_proj.bias", "transformer.h.13.ln_1.weight", "transformer.h.13.ln_1.bias", "transformer.h.13.attn.bias", "transformer.h.13.attn.c_attn.weight", "transformer.h.13.attn.c_attn.bias", "transformer.h.13.attn.c_proj.weight", "transformer.h.13.attn.c_proj.bias", "transformer.h.13.ln_2.weight", "transformer.h.13.ln_2.bias", "transformer.h.13.mlp.c_fc.weight", "transformer.h.13.mlp.c_fc.bias", "transformer.h.13.mlp.c_proj.weight", "transformer.h.13.mlp.c_proj.bias", "transformer.h.14.ln_1.weight", "transformer.h.14.ln_1.bias", "transformer.h.14.attn.bias", "transformer.h.14.attn.c_attn.weight", "transformer.h.14.attn.c_attn.bias", "transformer.h.14.attn.c_proj.weight", "transformer.h.14.attn.c_proj.bias", "transformer.h.14.ln_2.weight", "transformer.h.14.ln_2.bias", "transformer.h.14.mlp.c_fc.weight", "transformer.h.14.mlp.c_fc.bias", "transformer.h.14.mlp.c_proj.weight", "transformer.h.14.mlp.c_proj.bias", "transformer.h.15.ln_1.weight", "transformer.h.15.ln_1.bias", "transformer.h.15.attn.bias", "transformer.h.15.attn.c_attn.weight", "transformer.h.15.attn.c_attn.bias", "transformer.h.15.attn.c_proj.weight", "transformer.h.15.attn.c_proj.bias", "transformer.h.15.ln_2.weight", "transformer.h.15.ln_2.bias", "transformer.h.15.mlp.c_fc.weight", "transformer.h.15.mlp.c_fc.bias", "transformer.h.15.mlp.c_proj.weight", "transformer.h.15.mlp.c_proj.bias", "transformer.h.16.ln_1.weight", "transformer.h.16.ln_1.bias", "transformer.h.16.attn.bias", "transformer.h.16.attn.c_attn.weight", "transformer.h.16.attn.c_attn.bias", 
"transformer.h.16.attn.c_proj.weight", "transformer.h.16.attn.c_proj.bias", "transformer.h.16.ln_2.weight", "transformer.h.16.ln_2.bias", "transformer.h.16.mlp.c_fc.weight", "transformer.h.16.mlp.c_fc.bias", "transformer.h.16.mlp.c_proj.weight", "transformer.h.16.mlp.c_proj.bias", "transformer.h.17.ln_1.weight", "transformer.h.17.ln_1.bias", "transformer.h.17.attn.bias", "transformer.h.17.attn.c_attn.weight", "transformer.h.17.attn.c_attn.bias", "transformer.h.17.attn.c_proj.weight", "transformer.h.17.attn.c_proj.bias", "transformer.h.17.ln_2.weight", "transformer.h.17.ln_2.bias", "transformer.h.17.mlp.c_fc.weight", "transformer.h.17.mlp.c_fc.bias", "transformer.h.17.mlp.c_proj.weight", "transformer.h.17.mlp.c_proj.bias", "transformer.h.18.ln_1.weight", "transformer.h.18.ln_1.bias", "transformer.h.18.attn.bias", "transformer.h.18.attn.c_attn.weight", "transformer.h.18.attn.c_attn.bias", "transformer.h.18.attn.c_proj.weight", "transformer.h.18.attn.c_proj.bias", "transformer.h.18.ln_2.weight", "transformer.h.18.ln_2.bias", "transformer.h.18.mlp.c_fc.weight", "transformer.h.18.mlp.c_fc.bias", "transformer.h.18.mlp.c_proj.weight", "transformer.h.18.mlp.c_proj.bias", "transformer.h.19.ln_1.weight", "transformer.h.19.ln_1.bias", "transformer.h.19.attn.bias", "transformer.h.19.attn.c_attn.weight", "transformer.h.19.attn.c_attn.bias", "transformer.h.19.attn.c_proj.weight", "transformer.h.19.attn.c_proj.bias", "transformer.h.19.ln_2.weight", "transformer.h.19.ln_2.bias", "transformer.h.19.mlp.c_fc.weight", "transformer.h.19.mlp.c_fc.bias", "transformer.h.19.mlp.c_proj.weight", "transformer.h.19.mlp.c_proj.bias", "transformer.h.20.ln_1.weight", "transformer.h.20.ln_1.bias", "transformer.h.20.attn.bias", "transformer.h.20.attn.c_attn.weight", "transformer.h.20.attn.c_attn.bias", "transformer.h.20.attn.c_proj.weight", "transformer.h.20.attn.c_proj.bias", "transformer.h.20.ln_2.weight", "transformer.h.20.ln_2.bias", "transformer.h.20.mlp.c_fc.weight", 
"transformer.h.20.mlp.c_fc.bias", "transformer.h.20.mlp.c_proj.weight", "transformer.h.20.mlp.c_proj.bias", "transformer.h.21.ln_1.weight", "transformer.h.21.ln_1.bias", "transformer.h.21.attn.bias", "transformer.h.21.attn.c_attn.weight", "transformer.h.21.attn.c_attn.bias", "transformer.h.21.attn.c_proj.weight", "transformer.h.21.attn.c_proj.bias", "transformer.h.21.ln_2.weight", "transformer.h.21.ln_2.bias", "transformer.h.21.mlp.c_fc.weight", "transformer.h.21.mlp.c_fc.bias", "transformer.h.21.mlp.c_proj.weight", "transformer.h.21.mlp.c_proj.bias", "transformer.h.22.ln_1.weight", "transformer.h.22.ln_1.bias", "transformer.h.22.attn.bias", "transformer.h.22.attn.c_attn.weight", "transformer.h.22.attn.c_attn.bias", "transformer.h.22.attn.c_proj.weight", "transformer.h.22.attn.c_proj.bias", "transformer.h.22.ln_2.weight", "transformer.h.22.ln_2.bias", "transformer.h.22.mlp.c_fc.weight", "transformer.h.22.mlp.c_fc.bias", "transformer.h.22.mlp.c_proj.weight", "transformer.h.22.mlp.c_proj.bias", "transformer.h.23.ln_1.weight", "transformer.h.23.ln_1.bias", "transformer.h.23.attn.bias", "transformer.h.23.attn.c_attn.weight", "transformer.h.23.attn.c_attn.bias", "transformer.h.23.attn.c_proj.weight", "transformer.h.23.attn.c_proj.bias", "transformer.h.23.ln_2.weight", "transformer.h.23.ln_2.bias", "transformer.h.23.mlp.c_fc.weight", "transformer.h.23.mlp.c_fc.bias", "transformer.h.23.mlp.c_proj.weight", "transformer.h.23.mlp.c_proj.bias".
size mismatch for transformer.wte.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([50257, 1024]).
size mismatch for transformer.wpe.weight: copying a param with shape torch.Size([1024, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.0.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.0.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.0.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.0.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.0.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.0.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.0.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.0.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.0.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.1.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.1.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.1.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.1.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.1.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.1.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.1.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.1.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.1.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.1.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.1.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.1.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.2.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.2.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.2.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.2.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.2.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.2.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.2.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.2.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.2.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.2.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.2.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.2.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.3.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.3.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.3.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.3.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.3.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.3.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.3.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.3.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.3.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.3.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.3.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.3.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.4.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.4.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.4.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.4.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.4.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.4.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.4.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.4.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.4.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.4.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.4.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.4.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.5.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.5.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.5.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.5.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.5.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.5.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.5.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.5.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.5.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.5.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.5.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.5.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.6.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.6.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.6.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.6.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.6.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.6.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.6.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.6.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.6.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.6.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.6.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.6.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.7.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.7.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.7.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.7.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.7.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.7.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.7.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.7.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.7.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.7.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.7.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.7.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.8.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.8.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.8.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.8.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.8.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.8.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.8.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.8.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.8.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.8.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.8.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.8.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.9.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.9.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.9.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.9.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.9.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.9.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.9.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.9.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.9.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.9.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.9.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.9.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.10.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.10.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.10.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.10.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.10.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.10.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.10.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.10.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.10.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.10.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.10.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.10.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for transformer.h.11.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for transformer.h.11.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.h.11.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.11.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.11.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.11.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.ln_f.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([50257, 1024]).
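For what it's worth, the shapes in this traceback tell a consistent story: the checkpoint on disk has 12 blocks with hidden size 768 (GPT-2 "small" dimensions), while the instantiated GPT2LMHeadModel expects 24 blocks with hidden size 1024 (GPT-2 "medium"), which is why keys h.12–h.23 are missing and every shared tensor mismatches. So the model config and the downloaded weights disagree. A minimal sketch of the dimensional rules behind those shapes (the helper function is hypothetical, not part of gpt2bot):

```python
# Sketch: reproduce the shapes that load_state_dict is comparing, using only
# GPT-2's dimensional rules (Conv1D stores weights as [in_features, out_features]).

def gpt2_block_shapes(n_embd):
    """Per-layer parameter shapes for one GPT-2 transformer block."""
    return {
        "ln_1.weight": (n_embd,),
        "attn.c_attn.weight": (n_embd, 3 * n_embd),   # fused q/k/v projection
        "attn.c_proj.weight": (n_embd, n_embd),
        "mlp.c_fc.weight": (n_embd, 4 * n_embd),
        "mlp.c_proj.weight": (4 * n_embd, n_embd),
    }

small = gpt2_block_shapes(768)    # 12-layer GPT-2 small  -> the checkpoint's shapes
medium = gpt2_block_shapes(1024)  # 24-layer GPT-2 medium -> the "current model" shapes

assert small["attn.c_attn.weight"] == (768, 2304)    # as reported "from checkpoint"
assert medium["attn.c_attn.weight"] == (1024, 3072)  # as reported "in current model"
```

In other words, either the wrong checkpoint file was downloaded into medium_dstc_ft, or the script built the model from a config for a different model size.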
Hello,
I am trying to load the model "microsoft/DialoGPT-large" on an NVIDIA 3090 GPU, but I receive this message every time:
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 24.00 GiB total capacity; 2.07 GiB already allocated; 0 bytes free; 2.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am trying to fix this error but I can't. I really don't understand why, if I have 24 GiB of capacity, only 2.07 GiB is reserved for PyTorch.
Could you help me, please?
Update:
Monitoring resources, it goes out of memory when it loads the "Human_vs_machine_wight" model (even with DialoGPT-small). Is there a way to make this work with an NVIDIA 3090 (24 GB VRAM) and 64 GB of RAM?
BTW, in Colab it works with a 16 GB VRAM graphics card.
Thank you
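The error text itself suggests tuning `max_split_size_mb` via the `PYTORCH_CUDA_ALLOC_CONF` environment variable (the value 128 below is an arbitrary starting point, not a recommendation from this repo). Note also that "0 bytes free" with only 2.07 GiB reserved often means another process is holding the VRAM, which `nvidia-smi` would show. A minimal sketch:

```python
import os

# Must be set before torch initializes CUDA (i.e. before the first CUDA call),
# otherwise the caching allocator ignores it. 128 MB is an arbitrary value;
# smaller splits reduce fragmentation at some performance cost.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# ... only now import torch and load the model.
```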
Hi,
This is one of the best implementations of a GPT-2 bot so far. I tried many, but this is the fastest of all.
I'm currently testing the medium CPU model for "quick response" generation in a chat application. It works, but the responses seem Reddit-inspired: they contain slang like 'gib' instead of 'give', and can get violent at times.
How can I make it "professional", so that it only gives responses suitable for business/professional applications?
Thanks!!
How can I train this model with my own dataset?
When I run
python run_bot.py --type=console
I get
Parsing the config... Loading the pipeline 'microsoft/DialoGPT-medium'... Running the console bot... Bot: Just start texting me. If I'm getting annoying, type "/reset". To quit the chat, press Ctrl-C. User:
I don't know what I should put there... so I typed a random user message,
and then I get
....\gpt2bot\venv\lib\site-packages\transformers\generation\utils.py", line 2484, in sample
raise ValueError("If `eos_token_id` is defined, make sure that `pad_token_id` is defined.")
ValueError: If `eos_token_id` is defined, make sure that `pad_token_id` is defined.
I have put my Telegram and Giphy tokens in the my_chatbot.cfg file.
How can I resolve this problem?
Hi,
Hope you are all well!
I was wondering if it is possible to add a learning mode to gpt2bot too?
I found this example, but the code is not really as neat as yours ^^.
Thanks for any inputs or insights on that question.
Cheers,
X
Hi, cool work.
Does it support a separate session for every user that talks to the bot in Telegram?
This is vital, because we should store the conversation history for every user separately. I inspected the code but did not find this.
Instead of putting this on Telegram, can I run it in bash myself?
Hi, may I ask how to make gpt2bot reply with a message of a specific length? For example, could I set the bot's reply length to between 55 and 60 characters? Thanks.
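Sampling can't guarantee an exact character count, but since the bot already draws several candidate replies (`num_samples`), one workable approach is a post-hoc filter that prefers candidates in the target range. A sketch (the function name and fallback rule are my own assumptions, not part of gpt2bot):

```python
def pick_by_length(candidates, lo=55, hi=60):
    """Prefer a candidate whose length falls in [lo, hi] characters;
    otherwise return the candidate closest to the range."""
    in_range = [c for c in candidates if lo <= len(c) <= hi]
    if in_range:
        return in_range[0]

    def gap(c):
        # Distance from the target range: gap to the nearest bound.
        n = len(c)
        return lo - n if n < lo else n - hi

    return min(candidates, key=gap)
```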
Hi,
Hope you are all well!
I would like to set up a Flask server with a query endpoint for processing a message.
Is it possible to push an example of gpt2bot with a Flask server or gunicorn?
In fact, I am writing a multi-bot project (written in Golang) and I'd like to aggregate the responses of several types of chatbots with an agent. The project reference is here: https://github.com/paper2code/telegram-multibot
As I am much more a gopher than a pythonista, I am requesting your kind help on that. ^^
Can you help me with that?
Cheers,
X
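Not part of this repo, but a minimal Flask sketch of such a query endpoint might look like the following; `generate_reply` is a placeholder standing in for the gpt2bot pipeline call, and the route name is an assumption:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_reply(message):
    # Placeholder: in gpt2bot this would run the transformers pipeline
    # on the conversation history; here we just echo the input.
    return "echo: " + message

@app.route("/query", methods=["POST"])
def query():
    payload = request.get_json(force=True)
    reply = generate_reply(payload.get("message", ""))
    return jsonify({"reply": reply})

# To serve locally: app.run(host="0.0.0.0", port=5000)
# For production, run behind gunicorn instead:
#   gunicorn -w 1 -b 0.0.0.0:5000 server:app
```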
The Jetson Nano is an ARM A57, which worries me, as I've had bad luck with libraries supporting ARM (especially TensorFlow and PyTorch). Is it compatible? Is CUDA supported, and what kind of speeds would be expected with the default config but with turns_memory = -1 and num_samples = 5?
I've modified the source code of interactive_bot.py and stripped it down significantly. The only thing left is a function that takes in a single string (the initial message) and returns a response to the message, without any history to go off of. The class arguments you see passed into generate_response are the same args that would be passed in the original script.
def response(self, initial):
    history = initial + self.tokenizer.eos_token
    possibilities = generate_response(self.model, self.tokenizer, history, self.config)
    return random.choice(possibilities)
I'm running this function on every comment from a comments section on a meme web app, just to see what kind of responses it would make. The function is run maybe 1 to 2 times a second. Usually after around 50 calls, a crash occurs. The traceback is quite long, and usually it's different too.
...
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [926,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [927,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
Traceback (most recent call last):
File "D:/Documents/Python Stuff/NAPB project/scripts/comment_spooker/comment_spooker.py", line 47, in <module>
res = sentience.response(comment)
File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\gen_single_response.py", line 27, in response
possibilities = generate_response(self.model, self.tokenizer, history, self.config)
File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 82, in generate_response
out = sample_sequence(model, context_tokens, config)
File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 60, in sample_sequence
outputs = model(**inputs) # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet/CTRL (cached hidden-states)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 549, in forward
inputs_embeds=inputs_embeds)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 460, in forward
head_mask=head_mask[i])
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 232, in forward
head_mask=head_mask)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 191, in forward
present = torch.stack((key.transpose(-2, -1), value)) # transpose to have same shapes for stacking
RuntimeError: CUDA error: device-side assert triggered
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [800,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [801,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [802,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [803,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [804,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
...
Another traceback I've seen is:
THCudaCheck FAIL file=C:/w/1/s/windows/pytorch/aten/src\ATen/native/cuda/Normalization.cuh line=586 error=59 : device-side assert triggered
Traceback (most recent call last):
File "D:/Documents/Python Stuff/NAPB project/scripts/comment_spooker/comment_spooker.py", line 47, in <module>
res = sentience.response(comment)
File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\gen_single_response.py", line 27, in response
possibilities = generate_response(self.model, self.tokenizer, history, self.config)
File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 82, in generate_response
out = sample_sequence(model, context_tokens, config)
File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 60, in sample_sequence
outputs = model(**inputs) # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet/CTRL (cached hidden-states)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 549, in forward
inputs_embeds=inputs_embeds)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 460, in forward
head_mask=head_mask[i])
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 229, in forward
output_attn = self.attn(self.ln_1(x),
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\normalization.py", line 152, in forward
input, self.normalized_shape, self.weight, self.bias, self.eps)
File "D:\Program Files\Python37\lib\site-packages\torch\nn\functional.py", line 1682, in layer_norm
torch.backends.cudnn.enabled)
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/windows/pytorch/aten/src\ATen/native/cuda/Normalization.cuh:586
Restarting the program brings it right back up immediately. However, if I add a try/except to skip over that comment and go to the next one, the error is still raised; the same applies to trying again, and even to loading the model and tokenizer again. If I try loading them again, it raises an exception immediately. I have an RTX 2080 Super and a Ryzen 9 3900X, if that helps.
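A device-side assert corrupts the whole CUDA context, which is why try/except (and even re-loading the model in the same process) keeps failing: every subsequent CUDA call in that process returns the same error, and only a process restart clears it. Asserts are also reported asynchronously, so the traceback line is often not where the failure actually happened; setting `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous so the traceback points at the real call. A sketch:

```python
import os

# Must be set before CUDA is initialized (i.e. before importing/using torch
# on the GPU). Makes kernel launches synchronous, so a device-side assert
# surfaces at the Python line that triggered it. Much slower; debugging only.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```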
And is there any possible way for it to keep learning from the dialogues given by the user?
Also, is it possible to make the bot have its own identity, like:
User - Hey, what's your name?
Bot - Heyaaa! I am Jessy
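One lightweight approach (not a feature of this repo; just a sketch of prefix conditioning) is to prepend a fixed persona line to the dialogue history before generation, so the model conditions on it every turn. `PERSONA` and the helper name are my own; "<|endoftext|>" is the GPT-2 EOS token text:

```python
# Hypothetical persona prefix: prepend a fixed self-description to the
# history so the model conditions on it each turn.
PERSONA = "My name is Jessy."

def with_persona(history, eos_token="<|endoftext|>"):
    return PERSONA + eos_token + history
```

The model may still ignore or contradict the persona; fine-tuning on persona-tagged data is the more robust option.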
Hi, I see the large GPT-2 model and the pretrained DialoGPT based on it have been released. Would you consider including them in this project? Thanks!