The chatbot-startkit from lucko515

Cannot load movie_lines.txt - 'utf-8' codec can't decode byte 0xad in position 3767: invalid start byte

Dear Luka

Thanks for this repository. I am currently learning from it, and I ran into the following error right at the start, when loading the dataset:

sentences = {}
with open('cornell movie-dialogs corpus/movie_lines.txt', 'r') as f:
    for line in f.readlines():
        sentences[line.split(' +++$+++ ')[0]] = line.split(' +++$+++ ')[-1].replace('\n', "")

And the error is this:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-35-66409f9e14a9> in <module>()
      1 sentences = {}
      2 with open('cornell movie-dialogs corpus/movie_lines.txt', 'r') as f:
----> 3     for line in f.readlines():
      4         sentences[line.split(' +++$+++ ')[0]] = line.split(' +++$+++ ')[-1].replace('\n', "")

//anaconda/envs/tensorflow/lib/python3.5/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 3767: invalid start byte

Even if I download the text files movie_answers_2.txt and movie_questions_2.txt directly from your repo, I get the same error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-38-ae6b005fad2b> in <module>()
      4 with open('movie_questions_2.txt', 'r', encoding='utf-8') as f:
      5 
----> 6     lines = f.readlines()
      7 
      8     for text in lines:

//anaconda/envs/tensorflow/lib/python3.5/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 1085: invalid start byte

Can you please tell me what happened and how to fix this?

Thank you very much.
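
The traceback says that the file contains a byte (0xad) that is not valid UTF-8, so the file is probably stored in a different encoding. A minimal sketch of one possible workaround follows, assuming the corpus is in a single-byte encoding such as ISO-8859-1. That encoding is a guess, not something the repository confirms, and the helper names load_lines and parse_movie_lines are hypothetical. ISO-8859-1 maps every byte value to a character, so decoding cannot fail, but characters may come out wrong if the real encoding differs.

```python
def load_lines(path, encoding="iso-8859-1"):
    """Read a text file into a list of lines using an explicit encoding.

    Assumption: the file is in a single-byte encoding. ISO-8859-1 can
    decode any byte sequence, so this never raises UnicodeDecodeError,
    but it is only correct if the file really uses that encoding.
    """
    with open(path, "r", encoding=encoding) as f:
        # Strip the trailing newline from each line.
        return [line.rstrip("\n") for line in f]


def parse_movie_lines(lines, sep=" +++$+++ "):
    """Map each line ID (first field) to its utterance (last field)."""
    sentences = {}
    for line in lines:
        parts = line.split(sep)  # split once and reuse the result
        sentences[parts[0]] = parts[-1]
    return sentences


# Usage, with the path taken from the post above:
# lines = load_lines('cornell movie-dialogs corpus/movie_lines.txt')
# sentences = parse_movie_lines(lines)
```

An alternative is to keep encoding='utf-8' and pass errors='ignore' to open(), which silently drops the bytes that cannot be decoded instead of mapping them to characters.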

TypeError: 'NoneType' object is not iterable

Hi, I am getting the above error when executing the following line in chatbot.ipynb:
model = Chatbot(config.LEARNING_RATE,
                config.BATCH_SIZE,
                config.ENCODING_EMBED_SIZE,
                config.DECODING_EMBED_SIZE,
                config.RNN_SIZE,
                config.NUM_LAYERS,
                len(vocab),
                word_to_id,
                config.CLIP_RATE)

Please can you advise?
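
The post does not include a traceback, so the actual cause in chatbot.ipynb cannot be identified from it. In general, this TypeError means that some code tried to iterate over None, which often happens when a function reaches a path with no return statement. A minimal sketch of that pattern, with hypothetical names that are not taken from the repository:

```python
def build_parts(ready):
    """Hypothetical helper that forgets to return a value on one path."""
    if ready:
        return ["inputs", "targets"]
    # No return here, so the function implicitly returns None.


def describe_iteration(value):
    """Try to iterate over value and report the error message, if any."""
    try:
        for _ in value:
            pass
    except TypeError as exc:
        return str(exc)
    return "ok"


print(describe_iteration(build_parts(True)))
print(describe_iteration(build_parts(False)))
```

The full traceback shows which call returned None, so including it in the report would narrow the problem down.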


