gpt2bot's Issues

Transformers version

This is not an issue as such, it's a question: is there any particular reason for pinning transformers==2.3.0? At the time of writing, the latest transformers release was 3.0.2.
Does a newer version introduce any bugs, incompatibilities, or anything else known to stop the scripts from working?
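
Not part of the original question, but a minimal sketch of how one could guard against an untested transformers release at runtime; the check itself is hypothetical and not something the repo ships:

    # Hypothetical guard, not from gpt2bot: warn when the installed transformers
    # version differs from the one the scripts were written against.
    import warnings
    import transformers

    TESTED_VERSION = "2.3.0"

    if transformers.__version__ != TESTED_VERSION:
        warnings.warn(
            f"gpt2bot was written against transformers=={TESTED_VERSION}, "
            f"found {transformers.__version__}; newer releases changed some "
            "tokenizer and generation APIs, so behaviour may differ."
        )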

Error when using the DSTC dataset in the config

Is this supported?

2020-01-21 16:45:01,189 - model - INFO - Downloading model files to medium_dstc_ft...
100% 293/293 [00:00<00:00, 242727.84B/s]
100% 1042301/1042301 [00:01<00:00, 778001.76B/s]
100% 456318/456318 [00:00<00:00, 524076.53B/s]
100% 351265269/351265269 [00:47<00:00, 7401148.29B/s]
2020-01-21 16:45:53,837 - model - INFO - Loading model from medium_dstc_ft...
Traceback (most recent call last):
  File "interactive_bot.py", line 98, in <module>
    main()
  File "interactive_bot.py", line 83, in main
    model, tokenizer = load_model(target_folder_name, config)
  File "/content/model.py", line 146, in load_model
    model.load_state_dict(state_dict)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
	Missing key(s) in state_dict: "transformer.h.12.ln_1.weight", "transformer.h.12.ln_1.bias", "transformer.h.12.attn.bias", "transformer.h.12.attn.c_attn.weight", "transformer.h.12.attn.c_attn.bias", "transformer.h.12.attn.c_proj.weight", "transformer.h.12.attn.c_proj.bias", "transformer.h.12.ln_2.weight", "transformer.h.12.ln_2.bias", "transformer.h.12.mlp.c_fc.weight", "transformer.h.12.mlp.c_fc.bias", "transformer.h.12.mlp.c_proj.weight", "transformer.h.12.mlp.c_proj.bias", ... (the same ln_1, attn, ln_2 and mlp parameters are reported missing for every block from transformer.h.13 through transformer.h.23).
	size mismatch for transformer.wte.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([50257, 1024]).
	size mismatch for transformer.wpe.weight: copying a param with shape torch.Size([1024, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
	size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for transformer.h.0.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
	size mismatch for transformer.h.0.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for transformer.h.0.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
	size mismatch for transformer.h.0.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for transformer.h.0.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for transformer.h.0.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for transformer.h.0.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
	size mismatch for transformer.h.0.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
	size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
	size mismatch for transformer.h.0.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	... (the same size mismatches, 768 vs. 1024, 2304 vs. 3072 and 3072 vs. 4096, are repeated for transformer.h.1 through transformer.h.11)
	size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for transformer.ln_f.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for lm_head.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([50257, 1024]).
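
For reference, the shapes in the traceback suggest a small (12-layer, 768-dim) GPT-2 architecture being asked to load a medium (24-layer, 1024-dim) checkpoint. Below is a minimal sketch, not the repo's actual fix, of loading such a checkpoint with a matching config; the path medium_dstc_ft/pytorch_model.bin is illustrative:

    import torch
    from transformers import GPT2Config, GPT2LMHeadModel

    # GPT-2 medium geometry: 24 blocks, 1024-dim embeddings, 16 attention heads.
    config = GPT2Config(n_layer=24, n_embd=1024, n_head=16)
    model = GPT2LMHeadModel(config)

    # Illustrative path: point this at the downloaded DSTC-finetuned weights.
    state_dict = torch.load("medium_dstc_ft/pytorch_model.bin", map_location="cpu")
    model.load_state_dict(state_dict, strict=False)  # tolerate buffer-name differences
    model.eval()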

Nvidia 3090 + large*-gpu.cfg

Hello,

I am trying to load the model "microsoft/DialoGPT-large" on an Nvidia 3090 GPU, but I get this message every time:

RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 24.00 GiB total capacity; 2.07 GiB already allocated; 0 bytes free; 2.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am trying to fix this error but I can't. I really don't understand why, if I have 24 GiB of capacity, PyTorch reserves only 2.07 GiB.

Could you help me please?

PS: @MagiCsito on Telegram.

Update:
Monitoring resources, it runs out of memory when it loads the "Human_vs_machine_wight" model (even with DialoGPT-small). Is there a way to make this work with an Nvidia 3090 (24 GB VRAM) and 64 GB of RAM?

BTW, in Colab it works with a 16 GB VRAM graphics card.

Thank you
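
Two generic workarounds that sometimes help here, sketched below and not specific to gpt2bot: set the allocator option the error message mentions before CUDA is initialised, and load the model in half precision to roughly halve its VRAM footprint.

    import os
    # Must be set before the first CUDA allocation.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")
    model = model.half().to("cuda")  # fp16 weights take roughly half the memory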

Response Quality control

Hi,

This is one of the best implementations of a gpt2 bot so far. I tried many, but this is the fastest of all.

I'm currently testing the medium-cpu model for "quick response" generation in a chat application. It works, but the responses seem Reddit-inspired: they contain slang like 'gib' instead of 'give', and can get violent at times.
How can I make it "professional", so that it only gives responses suitable for business/professional applications?

Thanks!!
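
One crude option, sketched below and not a gpt2bot feature, is to post-filter the sampled candidates before picking one; the banned-word list and the candidates argument are placeholders:

    import re

    # Illustrative post-filter: drop candidate replies containing unwanted words.
    BANNED = {"gib", "damn", "stupid"}

    def pick_clean_response(candidates):
        clean = [c for c in candidates
                 if not BANNED & set(re.findall(r"[a-z']+", c.lower()))]
        # Fall back to a neutral reply if every candidate was filtered out.
        return clean[0] if clean else "Could you rephrase that, please?"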

It is not running in the console

When I type

python run_bot.py --type=console

I get:

Parsing the config... Loading the pipeline 'microsoft/DialoGPT-medium'... Running the console bot... Bot: Just start texting me. If I'm getting annoying, type "/reset". To quit the chat, press Ctrl-C. User:

I don't know what I should put in there, so I typed a random user message, and then I get:

....\gpt2bot\venv\lib\site-packages\transformers\generation\utils.py", line 2484, in sample
    raise ValueError("If eos_token_id is defined, make sure that pad_token_id is defined.")
ValueError: If eos_token_id is defined, make sure that pad_token_id is defined.

I have put my Telegram and Giphy tokens in the my_chatbot.cfg file.
How can I resolve this problem?
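
With recent transformers releases this ValueError usually goes away when generation is told to reuse the EOS token as the pad token. Where exactly that belongs in gpt2bot's pipeline is not shown here; the snippet below is a generic sketch of the workaround:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

    inputs = tokenizer("Hello there!" + tokenizer.eos_token, return_tensors="pt")
    reply_ids = model.generate(
        **inputs,
        max_length=200,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # silences the eos/pad ValueError
    )
    reply = tokenizer.decode(
        reply_ids[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    print(reply)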

User session in Telegram supported?

Hi, cool work.
Does it support a separate session for every user that talks to the bot in Telegram?

This is vital, because we should store conversation history separately for every user. I inspected the code but did not find this.
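
For context, a per-user session boils down to keying the dialogue history by the Telegram chat id. A minimal sketch of the idea (generate_reply is a placeholder, not a gpt2bot function):

    from collections import defaultdict

    # One history list per Telegram chat id.
    histories = defaultdict(list)

    def handle_message(chat_id, text, generate_reply):
        histories[chat_id].append(text)
        reply = generate_reply(histories[chat_id][-6:])  # last few turns as context
        histories[chat_id].append(reply)
        return reply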

Flask/Gunicorn server api

Hi,

Hope you are all well!

I would like to set up a Flask server with a query endpoint for processing a message.
Is it possible to add an example of gpt2bot behind a Flask server or Gunicorn?

In fact, I am writing a multi-bot project (written in Go) and I'd like to aggregate the responses of several types of chatbots with an agent. The project reference is here: https://github.com/paper2code/telegram-multibot

As I am much more a gopher than a pythonista, I am requesting your kind help on that. ^^

Can you help me with that?

Cheers,
X
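
Not something the repo ships, but a minimal Flask sketch of such an endpoint could look like the following; generate_reply() stands in for whatever gpt2bot's pipeline exposes:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def generate_reply(text):
        # Placeholder: wire this up to the actual gpt2bot pipeline.
        return "echo: " + text

    @app.route("/message", methods=["POST"])
    def message():
        payload = request.get_json(force=True)
        return jsonify({"reply": generate_reply(payload.get("text", ""))})

    if __name__ == "__main__":
        # For production, run it behind gunicorn instead, e.g.
        #   gunicorn -w 1 -b 0.0.0.0:8000 server:app
        app.run(host="0.0.0.0", port=8000)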

Does this work on a Jetson Nano?

The Jetson Nano is ARM Cortex-A57, which worries me as I've had bad luck with libraries supporting ARM (especially TensorFlow and PyTorch). Is it compatible? Is CUDA supported, and what kind of speeds would be expected with the default config but with turns_memory = -1 and num_samples = 5?

RuntimeError: CUDA error: device-side assert triggered

I've modified the source code of interactive_bot.py and stripped it down significantly. The only thing left is a function that takes in a single string (the initial message) and returns a response to it, without any history to go off of. The class attributes you see passed into generate_response are the same arguments that would be passed in the original script.

def response(self, initial):
    history = initial + self.tokenizer.eos_token
    possibilities = generate_response(self.model, self.tokenizer, history, self.config)
    return random.choice(possibilities)

I'm running this function on every comment from a comments section on a meme web app, just to see what kind of responses it generates. The function is run maybe 1 to 2 times a second. Usually a crash occurs after around 50 calls. The traceback is quite long and usually different each time.

...
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [926,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [927,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
Traceback (most recent call last):
  File "D:/Documents/Python Stuff/NAPB project/scripts/comment_spooker/comment_spooker.py", line 47, in <module>
    res = sentience.response(comment)
  File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\gen_single_response.py", line 27, in response
    possibilities = generate_response(self.model, self.tokenizer, history, self.config)
  File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 82, in generate_response
    out = sample_sequence(model, context_tokens, config)
  File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 60, in sample_sequence
    outputs = model(**inputs)  # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet/CTRL (cached hidden-states)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 549, in forward
    inputs_embeds=inputs_embeds)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 460, in forward
    head_mask=head_mask[i])
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 232, in forward
    head_mask=head_mask)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 191, in forward
    present = torch.stack((key.transpose(-2, -1), value))  # transpose to have same shapes for stacking
RuntimeError: CUDA error: device-side assert triggered
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [800,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [801,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [802,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [803,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
C:/w/1/s/windows/pytorch/aten/src\THC/THCTensorRandom.cuh:166: block: [2,0,0], thread: [804,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
...

Another traceback I've seen is:

THCudaCheck FAIL file=C:/w/1/s/windows/pytorch/aten/src\ATen/native/cuda/Normalization.cuh line=586 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "D:/Documents/Python Stuff/NAPB project/scripts/comment_spooker/comment_spooker.py", line 47, in <module>
    res = sentience.response(comment)
  File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\gen_single_response.py", line 27, in response
    possibilities = generate_response(self.model, self.tokenizer, history, self.config)
  File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 82, in generate_response
    out = sample_sequence(model, context_tokens, config)
  File "D:\Documents\Python Stuff\NAPB project\scripts\comment_spooker\decoder.py", line 60, in sample_sequence
    outputs = model(**inputs)  # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet/CTRL (cached hidden-states)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 549, in forward
    inputs_embeds=inputs_embeds)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 460, in forward
    head_mask=head_mask[i])
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\modeling_gpt2.py", line 229, in forward
    output_attn = self.attn(self.ln_1(x),
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\modules\normalization.py", line 152, in forward
    input, self.normalized_shape, self.weight, self.bias, self.eps)
  File "D:\Program Files\Python37\lib\site-packages\torch\nn\functional.py", line 1682, in layer_norm
    torch.backends.cudnn.enabled)
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/windows/pytorch/aten/src\ATen/native/cuda/Normalization.cuh:586

Restarting the program brings it right back up immediately. However, if I wrap the call in a try/except to skip over that comment and go to the next one, the error is raised again; the same applies to retrying, and even to loading the model and tokenizer again. If I try loading them again, it raises an exception immediately. I have an RTX 2080 Super and a Ryzen 9 3900X, if that helps.
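
Two generic debugging steps for this class of error, sketched below under the assumption (not verified here) that the encoded context eventually grows past GPT-2's 1024-position limit or produces invalid sampling probabilities: make CUDA errors synchronous so the traceback points at the failing op, and clamp the context length before generation.

    import os
    # Must be set before CUDA is initialised; makes device-side asserts point
    # at the kernel that actually failed instead of a later operation.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    MAX_CONTEXT = 1024  # GPT-2's position-embedding limit

    def truncate_context(tokenizer, history):
        ids = tokenizer.encode(history)
        # Keep only the most recent tokens, leaving room for generated ones.
        return ids[-(MAX_CONTEXT // 2):]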

Can you tell which dataset it is trained on?

And is there any way for it to keep learning from the dialogues given by the user?

Also, is it possible to make the bot have its own identity, like:
User - Hey, what's your name?
Bot - Heyaaa! I am Jessy
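
DialoGPT was trained on Reddit comment chains and does not come with a persona. One lightweight trick, sketched here and not a gpt2bot feature, is to prepend a fixed persona exchange to every context so identity questions tend to be answered consistently:

    # Illustrative only: seed the dialogue context with a persona exchange.
    def build_context(tokenizer, user_turns, persona_name="Jessy"):
        persona = [
            "Hey, what's your name?",
            f"Heyaaa! I am {persona_name}.",
        ]
        turns = persona + user_turns
        return "".join(turn + tokenizer.eos_token for turn in turns)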
