Comments (3)
Hello!
Since cf14dda I indeed split up punctuation from other tokens. The goal here was to make e.g. "hello," into two tokens, rather than just one, so the bot learns how to extend from "hello". Similarly, from "I've" it learns "I" followed by "'ve", so that it can extend "I".
The issue you're seeing here is that it actually does not know how to extend ["I", "'ve"], because of a bug that converted "I've" into "I 've" when generating, but kept the word as "I've" when learning. I've fixed that now.
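To illustrate why the mismatch mattered, here is a minimal sketch, not the bot's actual code. It assumes a chain keyed on pairs of tokens (suggested by the ["I", "'ve"] key above), and `tokenize`, `learn` and `can_extend` are hypothetical names. The point is only that learning and generating must go through the same tokenizer, otherwise the lookup key never exists in the chain.

```python
import re
from collections import defaultdict

# Hypothetical stand-in tokenizer (NOT the bot's real one): splits off
# "n't", apostrophe suffixes such as "'ve", and single punctuation marks.
TOKEN_RE = re.compile(r"n't|'\w+|\w+(?=n't)|\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def learn(chain, text):
    # Assumption: each key is a pair of consecutive tokens.
    tokens = tokenize(text)
    for i in range(len(tokens) - 2):
        chain[(tokens[i], tokens[i + 1])].append(tokens[i + 2])


def can_extend(chain, text):
    # Uses the same tokenizer as learn(), so the keys line up.
    return tuple(tokenize(text)[-2:]) in chain


chain = defaultdict(list)
learn(chain, "I've seen that")

print(tokenize("hello,"))           # ['hello', ',']
print(can_extend(chain, "I've"))    # True: key is ("I", "'ve")
print(("I've",) in chain)           # False: the untokenized key was never learned
```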
However, another small issue still exists: my "detokenizer", i.e. the function that turns "ca n't" back into "can't", isn't always perfect:
>>> TreebankWordDetokenizer().tokenize(["Do", "n't"])
"Do n't"
>>> TreebankWordDetokenizer().tokenize(["Do", "n't", "want"])
"Don't want"
>>> TreebankWordDetokenizer().tokenize(["I", "'ve"])
"I 've"
>>> TreebankWordDetokenizer().tokenize(["I", "'ve", "seen"])
"I've seen"
This is using NLTK's TreebankWordDetokenizer: a contraction is only joined back onto the previous word when another token follows it. I can't really explain this at the moment, but it requires a fix from NLTK.
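Until that is fixed upstream, one possible workaround is to merge contraction tokens by hand. The sketch below is only an illustration under stated assumptions: `detokenize` is a hypothetical helper, not part of NLTK or of this bot, and it handles only "n't" and apostrophe suffixes, not punctuation or quotes.

```python
def detokenize(tokens):
    """Join tokens with spaces, gluing contraction suffixes onto the previous token."""
    out = []
    for tok in tokens:
        # Assumption: "n't" and tokens like "'ve" or "'s" are always suffixes.
        is_suffix = tok == "n't" or (tok.startswith("'") and len(tok) > 1)
        if out and is_suffix:
            out[-1] += tok
        else:
            out.append(tok)
    return " ".join(out)


print(detokenize(["Do", "n't"]))          # Don't
print(detokenize(["I", "'ve"]))           # I've
print(detokenize(["I", "'ve", "seen"]))   # I've seen
```

Note that this would also glue a token such as "'hello" (an opening quote attached to a word) onto the previous token, so it is not a drop-in replacement for a real detokenizer.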
Thanks for reporting this!
- Tom Aarsen
from twitchmarkovchain.
This should be fixed now in the latest release. Feel free to upgrade. Your old database won't immediately update, but new entries should be correct now.
I've opened up an issue on NLTK for the tokenizing issue I mentioned: nltk/nltk#3069.