
rl-chatbot's Introduction


Intro

This is a chatbot trained with seq2seq and reinforcement learning.

  • seq2seq

Seq2seq is a classical model for structured prediction: both its input and its output are sequences.

The vanilla seq2seq model is described in the NIPS '14 paper Sequence to Sequence Learning with Neural Networks, in which the encoder and the decoder are separate networks.

The seq2seq model in this repository is built from 2 LSTMs, similar to the one described in the ICCV '15 paper Sequence to Sequence -- Video to Text, in which the encoder and the decoder share the same weights (a minimal sketch of this shared-weight layout is shown after the RL notes below).

  • RL

After training the chatbot for enough epochs, I use an RL technique called policy gradient to improve it further.

This lets the chatbot generate more interesting responses with respect to the reward function.

My reward function is similar to the one described in the EMNLP '16 paper Deep Reinforcement Learning for Dialogue Generation.
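To make the shared-weight layout concrete, here is a minimal sketch in plain Python/NumPy (not the repository's code; all names and sizes are illustrative assumptions): the same two LSTM-like layers are unrolled over both the encoding and the decoding steps, so the encoder and the decoder literally reuse one set of weights.

import numpy as np

def lstm_like_step(x, h, W):
    # stand-in for a real LSTM cell: a single affine map + tanh, with weights W reused at every step
    return np.tanh(W.dot(np.concatenate([x, h])))

emb, hid = 4, 6                             # toy sizes
W1 = np.random.randn(hid, emb + hid)        # layer-1 weights (shared by encoder & decoder)
W2 = np.random.randn(hid, emb + 2 * hid)    # layer-2 sees [word-or-padding, layer-1 output]
h1, h2 = np.zeros(hid), np.zeros(hid)
pad = np.zeros(emb)

# encoding stage: feed source words through layer 1; layer 2's word slot is padded
for word_vec in [np.random.randn(emb) for _ in range(5)]:
    h1 = lstm_like_step(word_vec, h1, W1)
    h2 = lstm_like_step(np.concatenate([pad, h1]), h2, W2)

# decoding stage: the SAME W1/W2 are reused; layer 1 now sees padding and
# layer 2 sees [previous word embedding, layer-1 output] to produce the next word
prev_word = np.random.randn(emb)
for _ in range(5):
    h1 = lstm_like_step(pad, h1, W1)
    h2 = lstm_like_step(np.concatenate([prev_word, h1]), h2, W2)
    prev_word = np.random.randn(emb)        # real model: embedding of the predicted word

And here is a small illustrative sketch of two of the paper's reward terms (the names, weights, and toy numbers are assumptions, not the repository's actual implementation; the paper's information-flow term is omitted):

import numpy as np

def ease_of_answering(log_p_dull_given_action, dull_lengths):
    # r1: penalize actions after which dull replies ("I don't know.", ...) are likely
    # under a forward seq2seq model; each log-probability is length-normalized
    return -np.mean([lp / n for lp, n in zip(log_p_dull_given_action, dull_lengths)])

def semantic_coherence(log_p_action_given_state, action_len,
                       log_p_state_given_action, state_len):
    # r3: mutual-information-style term; the second factor needs a reversed
    # (target -> source) model, which is why Step 5 below mentions one
    return (log_p_action_given_state / action_len
            + log_p_state_given_action / state_len)

# toy combination with made-up log-probabilities and equal weights
r1 = ease_of_answering([-12.3, -9.8], [5, 4])
r3 = semantic_coherence(-7.1, 6, -8.4, 7)
reward = 0.5 * r1 + 0.5 * r3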

Chatbot results

Here are some of the chatbot's results:

  • Result 1 (A & B are both the trained chatbot)

A: Can you do me a favor?

B: Anything.

A: I want you to bend a father.

B: Of course.

A: I'm so sorry I didn't mean to be rude but.

B: Don't get it at least it's about as much as I can.

  • Result 2 (A & B are both the trained chatbot)

A: I didn't do it! I haven't got a gun!

B: You liar. You still want to get your money back.

A: Something wrong with the truth.

B: I don't want any money just a little I can't take just out of this.

A: What's wrong.

B: Listen to me. I'm scared now.

A: Ah you'll need all the money I need your help.

  • Result 3 (A & B are both the trained chatbot)

A: I forgot to get the Coca-Cola.

B: I got something stuck in the head.

A: It's all right I guess it's pretty common in the lot of shit.

B: I mean we've been all this together since the kid.

A: All the more reason.

B: It's not him it's his fault and he's blind because of god.

A: And is a strange story.

You can find more chatbot results in this directory.

The neural-network configuration used for each result is described in its filename.

Simulate dialogs with a pre-trained model

Are you amazed by the chatbot's results? :D

You can generate these dialogs on your own!

Just follow the instructions below:

pip install -r requirements.txt
./script/download.sh
(Note that script/download.sh uses the -nc flag, which skips the download if the file already exists.
So make sure the download is not interrupted; a partially downloaded file would not be fetched again.)
./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
(a complete example invocation is shown after the parameter descriptions below)
  • <PATH TO MODEL>

To generate a seq2seq dialog, use "model/Seq2Seq/model-77".

To generate an RL dialog, use "model/RL/model-56-3000".

  • <SIMULATE TYPE>

Can be 1 or 2.

The number is how many previous sentences the chatbot considers.

If you choose 1, the chatbot considers only the last sentence.

If you choose 2, the chatbot considers the last two sentences (one from the user and one from the chatbot itself).

  • <INPUT FILE>

Take a look at result/sample_input_new.txt.

This is the chatbot's input format: each line is the opening sentence of a dialog.

You can just use the example file for convenience.

  • <OUTPUT FILE>

The output file; type any filename you want.
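For example, to simulate dialogs with the pre-trained seq2seq model, conditioning on the last two sentences and starting from the provided sample file (the output filename here is arbitrary):

./script/simulate.sh model/Seq2Seq/model-77 2 result/sample_input_new.txt result/my_dialog.txt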

Generate responses with a pre-trained model

If you want the chatbot to generate only a single response for each question,

follow the instructions below:

pip install -r requirements.txt
./script/download.sh
(Note that script/download.sh uses the -nc flag, which skips the download if the file already exists, so make sure the download is not interrupted.)
./script/run.sh <TYPE> <INPUT FILE> <OUTPUT FILE>
(a complete example invocation is shown after the parameter descriptions below)
  • <TYPE>

To generate a seq2seq response, use "S2S".

To generate a reinforcement-learning response, use "RL".

  • <INPUT FILE>

Take a look at result/sample_input_new.txt.

This is the chatbot's input format: each line is the opening sentence of a dialog.

You can just use the example file for convenience.

  • <OUTPUT FILE>

The output file; type any filename you want.
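For example, to generate one RL response per line of the sample input (the output filename is arbitrary):

./script/run.sh RL result/sample_input_new.txt result/rl_responses.txt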

Train chatbot from scratch

I trained my chatbot with Python 2.7.

If you want to train the chatbot from scratch,

follow the instructions below:

Step0: training configs

Take a look at python/config.py; all configs for training are described there.

You can change some training hyper-parameters, or just keep the original ones.

Step1: download data & libraries

I use the Cornell Movie-Dialogs Corpus.

You need to download it, unzip it, and move all *.txt files into the data/ directory.

Then install the required libraries with pip:

pip install -r requirements.txt

Step2: parse data

(this step uses Python 3)
./script/parse.sh

Step3: train a Seq2Seq model

./script/train.sh

Step4-1: test a Seq2Seq model

Let's look at some results from the seq2seq model :)

./script/test.sh <PATH TO MODEL> <INPUT FILE> <OUTPUT FILE>
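For example, using the pre-trained checkpoint from the sections above (or any checkpoint you just trained), with an arbitrary output filename:

./script/test.sh model/Seq2Seq/model-77 result/sample_input_new.txt result/seq2seq_test.txt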

Step4-2: simulate a dialog

And let's see some dialog results from the seq2seq model!

./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <SIMULATE TYPE>

Can be 1 or 2.

The number is how many previous sentences the chatbot considers.

If you choose 1, the chatbot considers only the user's utterance.

If you choose 2, the chatbot considers the user's utterance and its own last utterance.

Step5: train an RL model

You need to change the training_type parameter in python/config.py:

'normal' for seq2seq training, 'pg' for policy gradient.

First train with 'normal' for some epochs until the model is stable (at least 30 epochs is highly recommended),

then change training_type to 'pg' to optimize the reward function.
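For example, the relevant line in python/config.py would look like this for the policy-gradient phase (training_type and its two values are taken from this README; the comment is just illustrative):

training_type = 'pg'  # was 'normal' during the initial seq2seq phase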

./script/train_RL.sh

When training with policy gradient (pg),

you may need a reversed model.

The reversed model is also trained on the Cornell Movie-Dialogs corpus, but with source and target reversed.

You can download a pre-trained reversed model with

./script/download_reversed.sh

or you can train it yourself.

You don't need to change any reversed-model settings if you use the pre-trained one.
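Conceptually, training the reversed model just means swapping source and target in every training pair before running the same seq2seq training. A tiny illustrative sketch (not the repository's parsing code):

pairs = [("how are you ?", "i am fine ."), ("where were you ?", "at home .")]
reversed_pairs = [(target, source) for source, target in pairs]  # train the reversed seq2seq on these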

Step6-1: test an RL model

Let's generate some results from the RL model and see how they differ from the seq2seq model :)

./script/test_RL.sh <PATH TO MODEL> <INPUT FILE> <OUTPUT FILE>

Step6-2: generate a dialog

And let's see some dialog results from the RL model!

./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <SIMULATE TYPE>

Can be 1 or 2.

The number is how many previous sentences the chatbot considers.

If you choose 1, the chatbot considers only the last sentence.

If you choose 2, the chatbot considers the last two sentences (one from the user and one from the chatbot itself).

Environment

  • OS: CentOS Linux release 7.3.1611 (Core)
  • CPU: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
  • GPU: GeForce GTX 1070 8GB
  • Memory: 16GB DDR3
  • Python 3 (for data_parser.py) & Python 2.7 (for everything else)

Author

Po-Chih Huang / @pochih

rl-chatbot's People

Contributors

pochih


rl-chatbot's Issues

Chatbot was learning badly

I tried to train seq2seq from scratch using Python 3, but when I tested it after the first 10 epochs, every response was just ".". Will this improve if I train more?

=== Use model ./model/Seq2Seq/model-1 ===

question => Have you heard about 'machine learning and having it deep and structured'?
generated_sentence => .
question => Machine learning.
generated_sentence => .
question => I don't know. Maybe we should watch the tape to be sure.
generated_sentence => .
question => Listen man, I don't need this shit.
generated_sentence => .
question => Will you stand up for me?
generated_sentence => .
question => How do you trun this on?
generated_sentence => .
question => Thank God it's Friday!
generated_sentence => .
question => I'm sure a lot of people down in L.A. are worried sick about you.
generated_sentence => .
question => I forgot to get the Coca-Cola.
generated_sentence => .
question => How about you graduation thesis?
generated_sentence => .

Hello, after retraining the SEQ2SEQ model I used text.py but got the following results:

Hello, after retraining the SEQ2SEQ model I used text.py, but I got the following results:
Concentration concentration concentration concentration planning priest.
Concentration concentration concentration concentration planning concentration planning priest.
Concentration concentration concentration concentration planning priest.
Concentration concentration concentration concentration planning concentration planning priest.
Concentration concentration concentration concentration planning priest.
Concentration concentration concentration concentration planning concentration planning priest.
Do you know what might be causing this?

Dialogue History

How is the dialogue history encoded here? In the paper they say "The previous two dialogue turns are transformed to a vector representation by feeding the concatenation of them into an LSTM encoder model".

I'm not sure how to interpret this and I'm interested in how it's realized here.

Thanks

Have you ever run into the reward exploding?

I only used Ease of answering as the reward, but as training went on this term kept decreasing from around -2.x toward negative infinity. I didn't use the sigmoid, but that is also strange, because the original authors didn't add a sigmoid either.

Where is RL environment?

Hi,

Thanks for this code repo.
I have one question: which environment are you using for RL?

Thanks
Mahesh

Did you share the LSTM parameters between the encoder and decoder?

In the python/model.py script, the decoding stage:

with tf.variable_scope("LSTM1"):
    output1, state1 = self.lstm1(padding, state1)

with tf.variable_scope("LSTM2"):
    output2, state2 = self.lstm2(tf.concat([current_embed, output1], 1), state2)

which is the same as in the encoding stage. Did you share the same parameters between the encoder and the decoder?

How to use the saved_model API to store the model?

Hi, I wonder whether you have tried saving the chatbot with the saved_model API? Do you have any ideas?
Hello, I'd like to know whether you have tried using saved_model to save the model. I want to load the chatbot model in TensorFlow Serving, but it can only load model files saved with saved_model.

Why the sigmoid in count_rewards()

Hi,

In python/RL/train.py, after adding the ease of answering reward and semantic coherence, the sigmoid of the reward is scaled by 1.1:

total_loss = sigmoid(total_loss) * 1.1

What was the purpose of the sigmoid and of the scaling (1.1) on line 261?

Also, I noticed you didn't weight each reward by lambda like in the "Deep Reinforcement Learning for Dialogue Generation" paper. Was this on purpose?

Thanks!

unable to run ./scripts/train_RL.sh

Traceback (most recent call last):
File "python/RL/train.py", line 470, in
train()
File "python/RL/train.py", line 297, in train
train_op, loss, input_tensors, inter_value = model.build_model()
File "/home/ubuntu/AI_studio/Lakshmi/RL-Chatbot/python/RL/rl_model.py", line 92, in build_model
train_op = tf.train.AdamOptimizer(self.lr).minimize(pg_loss)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 413, in minimize
name=name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 597, in apply_gradients
self._create_slots(var_list)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 131, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 1155, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 190, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 164, in create_slot_with_initializer
dtype)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 74, in _create_slot_var
validate_shape=validate_shape)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
aggregation=aggregation)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
aggregation=aggregation)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
aggregation=aggregation)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
aggregation=aggregation)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 882, in _get_single_variable
"reuse=tf.AUTO_REUSE in VarScope?" % name)
ValueError: Variable Wemb/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

Please provide info on how to solve this error.

wrong output

Hey, thanks for the code. I used your pre-trained model, but I'm getting absurd responses. Should I train it for more epochs?
A: Have you heard about 'machine learning and having it deep and structured'?
B: Woods path woods bee sir takes nowhere thompson.
A: Reminds 'a name's working rebels secrets guts procedure fairy missus pain warned procedure ignorant troops wrap famous pain dna wheel troops.
B: Tongue hooked ignorant tongue tongue manner ignorant positively pain warned break real travel sir.
A: Pain las tricks putting ears shack warm ignorant positively arrives brand woods knew jersey domino wrap.
B: Reminds whores pain warned river rio brand.
A: Mob longer. Procedure woody luther wasted tricks specialty assumed window pain scientist.
B: Wigand much shift mud traitor woody stab brand submit submit touches tanks department woody feed pain middle wrap ignorant unlikely.
A: House real policeman term.
B: Mob slaves knee. Meters bat assumed martin woods tongue pit.
A: Jesse electrical reports domino real.
B: Redi whores real. Redi whores real. Desmond real.

procedure

Can you upload a file that explains the step-by-step procedure to run this?

Question about ease of answering

Thanks for your sharing.

In RL, for ease of answering, is the reward calculated by the RL model itself rather than by another model?

Why not feed the action into another pre-trained model to obtain the response, and measure its likelihood against a dull response?

How to train the reversed model for the RL model

You mentioned "
When training with policy gradient (pg)

you may need a reversed model

the reversed model is also trained by cornell movie-dialogs dataset, but with source and target reversed.
"
Other than downloading the pre-trained reversed model, could you please explain how to train it?

Thank you a lot.


pip install -r requirements.txt is failing - Could not find a version that satisfies the requirement tensorflow==1.0.1

I am getting an error when I try to run
pip install -r requirements.txt
It says:
Could not find a version that satisfies the requirement tensorflow==1.0.1 (from -r requirements.txt (line 1)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1)
No matching distribution found for tensorflow==1.0.1 (from -r requirements.txt (line 1))
I have a recent version of TensorFlow; I'm on Windows 10 and my Python version is
Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Please let me know how to run pip install -r requirements.txt

Tensorflow2.x

I would like to ask whether you have tried porting this to TensorFlow 2.x?

Unable to load pretrained model

Use default model

Traceback (most recent call last):
File "python/test.py", line 138, in
test()
File "python/test.py", line 72, in test
saver.restore(sess, default_model_path)
File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1428, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [7634,1000] rhs shape= [6851,1000]
[[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@wemb"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Wemb,
save/RestoreV2_4)]]

RL training

Hi,
Thanks for having shared your implementation of the RL chatbot.
I might ask stupid questions since I am not an expert in RL nor in NLP, so sorry in advance!
1- In python/RL/train.py
On line 307, saver.restore(sess, os.path.join(model_path, model_name)) seems to initialize the model's weights with some pre-trained parameters, correct? Are these the weights of the Seq2seq model trained in the usual supervised way? I can't find the 'model-55' you are using for this anywhere... Am I missing something?

2- In python/RL/rl_model.py
Why do we have both build_model and build_generator? They seem to have the same setup but not the same output. Is this RL-specific?

3- In the paper
Also, in the paper they specify that for the reward they use a seq2seq model and not the RL model. Is this taken into account in your code?

Thanks a lot for your answers!
