
deepthought's Issues

Some TODOs that could be added

Hi wwj718, I also looked at ChatterBot a while ago; its logic and code are indeed clear and simple.
For chat corpora, I think it would also be worth crawling conversation data from Weibo and Tieba; that might be quite useful.
One more question: how does performance hold up once the corpus grows? Right now, matching computes the similarity between the query and every stored statement one by one, so this will become a problem once the corpus gets large.
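One common way to avoid the full pairwise scan described above is to pre-filter candidates with a token inverted index, so the expensive similarity computation only runs on statements that share at least one token with the query. This is a minimal sketch, not ChatterBot's actual code; `IndexedMatcher` is a hypothetical name, `SequenceMatcher.ratio()` stands in for whatever similarity measure is used, and whitespace tokenization is an illustration only (Chinese text would need a real tokenizer such as jieba).

```python
from collections import defaultdict
from difflib import SequenceMatcher


class IndexedMatcher(object):
    """Sketch: inverted index to narrow similarity search to candidates."""

    def __init__(self):
        self.index = defaultdict(set)  # token -> set of statement ids
        self.statements = []

    def add(self, text):
        sid = len(self.statements)
        self.statements.append(text)
        for token in text.split():  # placeholder tokenizer
            self.index[token].add(sid)

    def best_match(self, query):
        # Collect only statements sharing at least one token with the query.
        candidates = set()
        for token in query.split():
            candidates |= self.index.get(token, set())
        if not candidates:
            # No token overlap: fall back to scanning everything.
            candidates = range(len(self.statements))
        # Run the expensive similarity only on the candidate subset.
        return max(
            (self.statements[sid] for sid in candidates),
            key=lambda s: SequenceMatcher(None, query, s).ratio(),
        )
```

With a large corpus this turns most queries from O(corpus size) similarity computations into a scan over a small candidate set, at the cost of maintaining the index on insert.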

chatterbot reports an encoding error with the Chinese corpus

Using Windows 10 with Python 2.7.
I set the encoding in the code, but it still raises an error. English works fine.
bot.py

# -*- coding: utf-8 -*-
import sys;reload(sys);sys.setdefaultencoding('utf8')

from chatterbot import ChatBot

chatbot = ChatBot(
    'ABC',
    trainer='chatterbot.trainers.ChatterBotCorpusTrainer',
    silence_performance_warning=True
)

# Train based on the Chinese corpus
chatbot.train("chatterbot.corpus.chinese")

# Get a response to an input statement
response = chatbot.get_response("很高兴认识你")
print(response)

Running python bot.py produces:

[nltk_data] Downloading package stopwords to                                                     
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package stopwords is already up-to-date!                                           
[nltk_data] Downloading package wordnet to                                                       
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package wordnet is already up-to-date!                                             
[nltk_data] Downloading package punkt to                                                         
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package punkt is already up-to-date!                                               
[nltk_data] Downloading package vader_lexicon to                                                 
[nltk_data]     C:\Users\Administrator\AppData\Roaming\nltk_data...                              
[nltk_data]   Package vader_lexicon is already up-to-date!                                       
Traceback (most recent call last):                                                               
  File "bot.py", line 13, in <module>                                                            
    chatbot.train("chatterbot.corpus.chinese")                                                   
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\trainers.py", line 117, in train           
    trainer.train(pair)                                                                          
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\trainers.py", line 82, in train            
    statement = self.get_or_create(text)                                                         
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\trainers.py", line 25, in get_or_create    
    statement = self.storage.find(statement_text)                                                
  File "D:\AnacondaSetup\lib\site-packages\chatterbot\storage\jsonfile.py", line 46, in find     
    values = self.database.data(key=statement_text)                                              
  File "D:\AnacondaSetup\lib\site-packages\jsondb\db.py", line 98, in data                       
    return self._get_content(key)                                                                
  File "D:\AnacondaSetup\lib\site-packages\jsondb\db.py", line 52, in _get_content               
    obj = self.read_data(self.path)                                                              
  File "D:\AnacondaSetup\lib\site-packages\jsondb\file_writer.py", line 15, in read_data         
    obj = decode(content)                                                                        
  File "D:\AnacondaSetup\lib\site-packages\jsondb\compat.py", line 28, in decode                 
    return json_decode(value, encoding='utf-8')                                                  
  File "D:\AnacondaSetup\lib\json\__init__.py", line 352, in loads                               
    return cls(encoding=encoding, **kw).decode(s)                                                
  File "D:\AnacondaSetup\lib\json\decoder.py", line 364, in decode                               
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())                                            
  File "D:\AnacondaSetup\lib\json\decoder.py", line 380, in raw_decode                           
    obj, end = self.scan_once(s, idx)                                                            
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 0: invalid continuation byte 

I'm not sure whether other factors are involved; I filed an issue upstream, but the person who replied didn't know the cause either.
Also, is there a way to hide the [nltk_data]... lines printed to the console? They're really annoying -.-
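The `UnicodeDecodeError: 'utf8' codec can't decode byte 0xd4 in position 0` in the traceback comes from jsondb reading the on-disk JSON database, not from the corpus itself: a byte like 0xd4 at position 0 is consistent with text that was written in the Windows default code page (GBK) and is now being parsed as UTF-8. A hedged sketch of a repair, under the assumption that the file really is GBK: re-encode the storage adapter's JSON database file (often `database.db` by default for the JSON storage adapter, but verify the path on your install) to UTF-8. The function name `reencode_to_utf8` is mine, not ChatterBot API.

```python
import io


def reencode_to_utf8(path, fallback='gbk'):
    """If `path` is not valid UTF-8, rewrite it as UTF-8, assuming it
    was written in `fallback` (the Windows default code page)."""
    with io.open(path, 'rb') as f:
        raw = f.read()
    try:
        return raw.decode('utf-8')      # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        text = raw.decode(fallback)     # assumption: file is GBK-encoded
        with io.open(path, 'w', encoding='utf-8') as f:
            f.write(text)
        return text
```

Simply deleting the database file and retraining may also work if the data is disposable. As for the `[nltk_data]` lines: `nltk.download()` does accept `quiet=True`, so pre-downloading the four packages quietly is worth trying, though whether ChatterBot's own internal download calls still print depends on your ChatterBot version.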
