Comments (8)
I'm working on a naive approach in my fork.
To build the reversed matrix, I reverse the sentences and then process them. It's probably not the best idea, but it looks like it's working. We will probably need another method to create that matrix.
To make it work, I combine make_sentence_that_finish with .make_sentence_with_start('fish', strict=False), as @jsvine explained.
I'm still working on sentences that contain multiple words; that part is experimental right now.
Any suggestions are welcome.
from markovify.
This works fine for me on a corpus of 14 MB.
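    # `word` holds the target word and `model` is an already-built markovify model
    generated_headlines = []
    while len(generated_headlines) < 25:
        headline = model.make_sentence(tries=100)
        # make_sentence() returns None when it cannot build a sentence
        if headline is not None:
            # lowercase both sides so the match is case-insensitive
            if word.lower() in headline.lower().split(' '):
                generated_headlines.append(headline)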
It helps to also check the Markov chain to see how often that word appears. If it appears at a very low frequency, that might be why it takes a long time for the model to generate a sentence with the word in it. There's nothing we can do about it in that case, other than to choose a different word or to substantially increase the size of the corpus.
If make_sentence_that_contains becomes a feature, a word-frequency check might help.
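A word-frequency check like the one suggested above could be sketched as follows. This is only an illustration: it counts tokens in the raw corpus text rather than inspecting markovify's internal chain, and the function name word_frequency and the simple regex tokenizer are assumptions made for the example.

```python
import re
from collections import Counter


def word_frequency(corpus_text, word):
    """Return (count, share) of `word` among all word tokens in the corpus."""
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", corpus_text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    count = counts[word.lower()]
    # Guard against an empty corpus to avoid dividing by zero.
    share = count / total if total else 0.0
    return count, share


corpus = "The fish swam. A cat watched the fish. The dog slept."
count, share = word_frequency(corpus, "fish")
print(f"'fish' appears {count} times ({share:.1%} of tokens)")
```

If the share is very small, a retry loop will need many attempts on average, so it may be worth warning the user or picking another word before generating anything.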
from markovify.
There's no built-in method to do this in markovify. The easiest way for you to do it would probably be to have a while loop generate sentences until it generates one that contains the word you want. E.g.,
    while True:
        sentence = text_model.make_sentence()
        # make_sentence() can return None, so check before testing membership
        if sentence and "computer" in sentence:
            break
    print(sentence)
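The loop above never terminates if the word is absent from the corpus. A bounded variant might look like the sketch below. The helper name make_sentence_containing and the max_attempts parameter are hypothetical and not part of markovify; the helper accepts any zero-argument callable, such as text_model.make_sentence. Unlike the substring test above, it compares whole words, which is a design choice rather than a requirement.

```python
import itertools


def make_sentence_containing(make_sentence, word, max_attempts=1000):
    """Call `make_sentence` until its output contains `word`, or give up.

    `make_sentence` is any zero-argument callable returning a string or None.
    Returns the first matching sentence, or None after `max_attempts` tries.
    """
    target = word.lower()
    for _ in range(max_attempts):
        sentence = make_sentence()
        # Skip failed generations instead of raising a TypeError.
        if sentence is None:
            continue
        # Strip common punctuation so "computer." still matches "computer".
        words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
        if target in words:
            return sentence
    return None


# Stand-in generator so the example runs without a trained model.
stub = itertools.cycle([None, "I like tea.", "My computer is old."])
print(make_sentence_containing(lambda: next(stub), "computer"))
# prints: My computer is old.
```

With a real model, the call would be something like make_sentence_containing(text_model.make_sentence, "computer"), and a None result signals that the word is too rare for this approach.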
from markovify.
Thanks, I will give that a try.
from markovify.
I'm sorry to re-open this feature request, but I've been trying the while-loop approach, and with a 6 MB corpus it takes almost 30 seconds (or more) to generate a sentence containing the specific word.
Would it be possible for you to implement this feature in the library so that creating such sentences is faster?
Thank you very much, and continue the good work!
from markovify.
Hi @voltaxvoltax, and thanks for reminding me about this thread. Implementing a more efficient version of make_sentence_that_contains would require a nontrivial amount of experimentation and testing. That said, I'm open to including such a feature in markovify. If you (or anyone else reading this thread) would like to try coding a proposal, drop a note here and we can discuss how the feature might work.
Assuming the word is fish, to construct a sentence containing that word, we'd need to generate the following:
- The words that come after fish. This is easy, since we can just use .make_sentence_with_start('fish', strict=False).
- The words that come before fish. This is more complicated; we would first need to calculate the reverse Markov probabilities for the corpus. That logic isn't currently built into markovify.
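The two-part idea above can be illustrated with a small pure-Python sketch that does not use markovify. It assumes a state size of one word and pre-tokenized sentences, and it obtains the reverse probabilities by training the same chain on each sentence reversed. All names here (build_chain, walk, sentence_containing, the BEGIN and END sentinels) are made up for the example and are not part of the library.

```python
import random
from collections import defaultdict

BEGIN, END = "<s>", "</s>"  # sentinel tokens local to this sketch


def build_chain(sentences):
    """Map each token to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for words in sentences:
        padded = [BEGIN] + words + [END]
        for current, following in zip(padded, padded[1:]):
            chain[current].append(following)
    return chain


def walk(chain, start, rng, max_words=50):
    """Follow the chain from `start`; return the words generated after it.

    Stops at END, or after `max_words` steps (the result is then truncated).
    """
    out, current = [], start
    for _ in range(max_words):
        current = rng.choice(chain[current])
        if current == END:
            break
        out.append(current)
    return out


def sentence_containing(sentences, word, rng=random):
    """Build a sentence around `word`, or return None if it is not in the corpus."""
    forward = build_chain(sentences)
    # Reverse model: the same chain trained on each sentence reversed.
    backward = build_chain([words[::-1] for words in sentences])
    if word not in forward:
        return None
    before = walk(backward, word, rng)[::-1]  # generated right-to-left
    after = walk(forward, word, rng)
    return " ".join(before + [word] + after)


corpus = ["the fish swam away", "a big fish ate the worm", "the cat saw a fish"]
sentences = [s.split() for s in corpus]
print(sentence_containing(sentences, "fish", random.Random(0)))
```

Every adjacent word pair in the output was seen in the corpus, but the two halves are generated independently, so a real implementation would still want the overlap and quality checks that make_sentence applies.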
from markovify.
Just to +1 interest in this, I have a project right now with this exact use case. I agree with your analysis that the only way to achieve this is with reverse Markov probabilities, and it would be highly useful. To describe my goal quickly: I want to 'half-simulate' a conversation between different models. The first one generates a sentence, the second one generates a sentence containing some word from the first one, and so on.
Thanks for the consideration. I'd be very tolerant of longer model processing time if a second, reverse model were built at the same time.
from markovify.
You could generate a sentence that begins with the keyword, then generate another random sentence, cut it at a verb (using nltk), and prepend that fragment to the first sentence.
from markovify.