markovify's make_sentence_with_start() doesn't seem to work properly about markovify HOT 11 OPEN

nezetimesthree commented on June 5, 2024

markovify's make_sentence_with_start() doesn't seem to work properly

from markovify.

Comments (11)

jsvine commented on June 5, 2024

Hi @nezetimesthree, and thanks for your interest in markovify. When you get a chance, please provide code and text that reproduces the problem. Without that, it will unfortunately be quite hard to debug.

nezetimesthree commented on June 5, 2024

of course. here's the code and text file.

from transformers import pipeline
import random
import markovify

model_link = "IProject-10/bert-base-uncased-finetuned-squad2"
question_answerer = pipeline("question-answering", model=model_link)

with open('mayakovsky.txt', 'r') as file:
  f = file.readlines()
  poems = []
  poem = ''
  dataset = ''
  for line in f:
    dataset += line.strip() + '. '
    if line != '\n':
      poem += line.strip() + ' '
    else:
      # a blank line marks the end of the current poem
      poems.append(poem)
      poem = ''

context = random.choice(poems)
question = input()

answer = question_answerer(question=question, context=context)['answer']

print(answer, '->', ' '.join(answer.split()[-2:]))

text_model = markovify.Text(' '.join(poems))

if len(answer.split()) > 1:
  print(text_model.make_sentence_with_start(' '.join(answer.split()[-2:]), strict=False, tries=100), end='\n')
else:
  print(text_model.make_sentence_with_start(answer, strict=False, tries=100), end='\n')
for i in range(5):
  print(text_model.make_short_sentence(200, min_length=100, tries=100), end='\n')

mayakovsky.txt

jsvine commented on June 5, 2024

Thanks for sharing this, @nezetimesthree.

It seems that you're passing to make_sentence_with_start a "start" that was generated by an LLM, which is not guaranteed to be a start that actually exists in your corpus. That is a requirement for markovify and this type of Markov chain generally. Is that correct? If so, this is expected behavior of markovify and I would not consider it a bug.

If I've misunderstood, could you share a simpler code example that doesn't depend on other libraries, yet still reproduces the problem? In this example, the logic that uses IProject-10/bert-base-uncased-finetuned-squad2 is fairly intertwined here with the logic that uses markovify, and there are several different calls to markovify, making it difficult to debug.
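If the start really can be absent from the model, one way to keep a script running is to fall back to an unseeded sentence. The following is only a rough sketch: `sentence_with_fallback` is a hypothetical helper, not part of markovify, and it assumes only that `model` exposes `make_sentence_with_start()` and `make_sentence()` the way `markovify.Text` does.

```python
def sentence_with_fallback(model, start, errors=(Exception,), **kwargs):
    """Try to generate a sentence from `start`; otherwise generate an unseeded one.

    `model` is assumed to behave like markovify.Text. `errors` lists the
    exception types treated as "start not found"; the broad default is a
    placeholder for this sketch and should be narrowed in real code.
    """
    try:
        sentence = model.make_sentence_with_start(start, strict=False, **kwargs)
    except errors:
        sentence = None
    if sentence is None:
        # Nothing could be built from the requested start, so drop the seed.
        sentence = model.make_sentence(**kwargs)
    return sentence
```

With markovify installed, passing `errors=(markovify.text.ParamError,)` is probably a better choice than the broad default, assuming that is the exception your installed version raises for a missing start.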

nezetimesthree commented on June 5, 2024

thanks for taking a look, @jsvine. but you're misunderstanding this: LLM gives answers only from the given context, which, in this case, is one of the poems from the file. i've checked the errors in poem dataset, and the words were there always. for some reason, NewlineText didn't see them as a start for sentences. maybe it's because some of the lines consist only of one word? could this be the issue?

jsvine commented on June 5, 2024

Thank you for the helpful clarification, @nezetimesthree. Could you share a start that the code fails on but that is definitely a start in the corpus?

nezetimesthree commented on June 5, 2024

hello again, @jsvine. sorry i didn't answer yesterday, but here's the example, the error, and the proof that it's clearly there.

jsvine commented on June 5, 2024

Thanks; can you share that as copy-pasteable text?

nezetimesthree commented on June 5, 2024

addition: here's what happens when it receives only one word

can you clarify what you mean by "copy-pasteable text", though? if i understand you correctly, then the words are "ладно слажен" and "Наоборот"

jsvine commented on June 5, 2024

Great, thanks; that's what I was looking for, indeed.

jsvine commented on June 5, 2024

Thanks again for the helpful example. Taking a closer look, the issue seems not to be with make_sentence_with_start, but rather the sentence parser much earlier in the processing pipeline.

import markovify

with open("mayakovsky.txt", "r") as file:
    model = markovify.Text(file.read())


def test_presence(fragment):
    return any(
        any(fragment == token for token in sentence)
        for sentence in model.parsed_sentences
    )


print(test_presence("Послушайте!"))
print(test_presence("слажен"))

Prints:

True
False

The default Text model uses a regex-powered filter to remove sentences that could cause problems, mostly regarding apostrophes and quotation marks. It also invokes unidecode, which seems to be causing the problem here. Because it's a generally useful approach, I don't want to remove that step from the library, but there are two ways you should be able to handle this on your end:

Calling markovify.Text(..., well_formed=False), which skips the filtering step
Extending markovify.Text (documented here) to behave in a way better suited to your corpus.

Using well_formed=False seems to work well, although you'll have to contend with the punctuation (or strip it out in a pre-processing step), as you'll see with the comma below:

import markovify

with open("mayakovsky.txt", "r") as file:
    model = markovify.Text(file.read(), well_formed=False)

print(model.make_sentence_with_start("ладно слажен,"))

Prints: ладно слажен, — и все обвыл.
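For the pre-processing route mentioned above, one possible approach is to strip punctuation from the edges of each word before building the model, so that a start such as "ладно слажен" matches without the trailing comma. The sketch below uses only the standard library. The helper names `strip_edge_punctuation` and `clean_line` are made up for illustration and are not part of markovify. Because it also removes sentence-ending punctuation, the cleaned lines would likely suit a line-based model better than the default sentence splitter; that pairing is an assumption and was not tested in this thread.

```python
import unicodedata


def _is_punctuation(ch):
    # Unicode general categories starting with "P" cover commas, dashes,
    # quotation marks, brackets, and similar characters.
    return unicodedata.category(ch).startswith("P")


def strip_edge_punctuation(token):
    """Drop punctuation from both ends of a token, keeping inner hyphens."""
    start, end = 0, len(token)
    while start < end and _is_punctuation(token[start]):
        start += 1
    while end > start and _is_punctuation(token[end - 1]):
        end -= 1
    return token[start:end]


def clean_line(line):
    """Clean every whitespace-separated token; drop tokens that become empty."""
    tokens = (strip_edge_punctuation(tok) for tok in line.split())
    return " ".join(tok for tok in tokens if tok)


print(clean_line("ладно слажен, — и все обвыл."))  # ладно слажен и все обвыл
```

Applying `clean_line` to each line of the corpus, joining the results with newlines, and building the model from that text (still with well_formed=False) should then allow the start to be passed without punctuation.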

nezetimesthree commented on June 5, 2024

thank you very much, @jsvine. i will test it and return with the result next week. sorry for making you wait for it, but i just won't have a chance this week. thank you again, and we'll see if this works.
