Comments (18)
Testing the new code from #21, it seems to reach its limit at:
crash-d40a5f23182756de9477e2782ba15dd2be71457d6cd8d0c7637795b5f9ecd6ba.txt
15 minutes in, it hasn't crashed yet (progress!), but it also seems to be hung/stalled.
I'm running a profiler on it to see if it's related to unravelLinkedTokens, or something else.
edit: it did eventually finish; flame graph of the run attached as flamegraph.html.log
from micromark.
Hello 👋
I've provided a fix for the three crash reports in #21
However, we still have an issue with the following, and it's unrelated to unravelLinkedTokens:
var micromark = require('./lib')
var doc = Array(35_000).fill('[](').join('')
var result = micromark(doc)
console.log(result)
It seems to be caused by the presence of a call to main() in createTokenizer().write()
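For context, micromark's tokenizer runs as a state machine: each state receives a character code and hands back the next state, and main() drives them in a loop. A rough sketch of that shape (hypothetical state and function names, not micromark's actual code):

```javascript
// Sketch of a state-machine trampoline (hypothetical, not micromark's
// real code). Each state *returns* the next state instead of calling it,
// so the driver keeps the call stack flat no matter how long the input is.
function run(state, codes) {
  let steps = 0
  for (const code of codes) {
    if (!state) break
    state = state(code)
    steps++
  }
  return steps
}

// A toy state that stays active until it sees the code 0.
const start = (code) => (code === 0 ? undefined : start)

// 100k codes, yet the stack never grows deeper than one state frame.
const codes = new Array(100000).fill(1)
console.log(run(start, codes)) // 100000
```

If a state called the next state directly instead of returning it, the same input would add one stack frame per character and eventually overflow, which is why re-entrant calls into main() from write() are suspicious here.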
Seems to be my node/mac version!
$ node test/perf.js
base
25ms
strong
4s
strong/emphasis?
4s
unclosed links
291ms
unclosed links (2)
919ms
tons of definitions
24s
Merging tokens is certainly a good idea. I hope that it will speed things up!
I managed to get the definitions case down to 0.7s by adding one line ✨ haha, phew
The unclosed links (3) case mentioned before is still slow, but the rest are fine:
base
24ms
strong
1s
strong/emphasis?
1s
unclosed links
365ms
unclosed links (2)
417ms
tons of definitions
746ms
I looked for a couple of those in test/perf too. I’m sure there are other cases where malicious data could cause either stack overflows or big delays (I believe the 10k definitions in one content block also crash).
Awesome!
Main seems to be this one; it’s where every state is called from. The self time (time actually spent in main) should be short, while the total time (time spent in functions called from main) is long.
After some thought, I did some more digging into the last issue, and we may have too many subevents. This is why we have the stack overflow: main() is only called 3 or 4 times, but subevents grow from 3 or 4 to 140,000 and cause Array#splice to stack overflow, which is weird.
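The splice part is easy to reproduce in isolation: Array#splice receives the items to insert as individual arguments, so spreading a six-figure array into it exhausts the call stack. A minimal sketch (the array size and 'event' payload are made up):

```javascript
// Minimal reproduction (array size and payload invented for the sketch):
// spreading a large array into splice passes every element as a separate
// argument, and V8 caps how many arguments one call can receive.
const events = []
const subevents = new Array(500000).fill('event')

let outcome
try {
  events.splice(0, 0, ...subevents)
  outcome = 'ok'
} catch (error) {
  outcome = error.name // RangeError ("Maximum call stack size exceeded")
}
console.log(outcome) // typically "RangeError" on Node's default stack
```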
To add some more info: for markdown, we first need to parse blocks (headings, definitions, etc.), and then we can do inlines (emphasis, links, etc.). Reason: later definitions need to be found first, before parsing references. Hence subtokenize, which takes the events from one tokenizer and passes them on to other tokenizers.
In this case, there is one giant paragraph, where every character is a separate token.
Q: do you know if those 140k events include (almost all) tokens which are data? Or are they other types of markers still?
@wooorm I confirm that all tokens are enter/exit of individual characters. I tracked the issue to the fact that Array#splice takes all items to be inserted as individual arguments. So I tried implementing a chunked version of it, and it works like a charm 🔥. I updated the original PR with some tests.
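A chunked splice could look roughly like this; a sketch only, with an assumed chunk size, not necessarily the PR's exact code:

```javascript
// Sketch of a chunked splice (assumed chunk size, not the PR's exact
// code): insert `items` into `list` at `start` in bounded-size batches
// so no single splice call receives too many arguments.
function chunkedSplice(list, start, deleteCount, items) {
  const chunkSize = 10000 // arbitrary "safe-looking" size for the sketch
  list.splice(start, deleteCount) // do the deletion once, up front
  let index = 0
  while (index < items.length) {
    const chunk = items.slice(index, index + chunkSize)
    list.splice(start + index, 0, ...chunk) // each call gets ≤ chunkSize args
    index += chunkSize
  }
  return list
}

// 50k items inserted into the middle without blowing the stack:
const list = [1, 2, 3]
chunkedSplice(list, 1, 0, new Array(50000).fill(0))
console.log(list.length) // 50003
```

The real safe limit depends on the engine's stack; the constant here is just a sketch assumption well below any limit I've seen reported for V8.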
That’s awesome!
Do you know the size of the performance improvement? (test/perf should have an example of this case, so you can quickly measure before and after the commit.)
Another idea I had: while tokenizing the text, the values are separate. When we know they aren’t links, they are marked as data. This is also done for several other things (whitespace is also a data token). Would merging adjacent enter/exits for data solve this too?
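That merge could be sketched like this, over a simplified event shape (['enter'|'exit', token] pairs with plain-number positions; micromark's real events carry richer positions and a context):

```javascript
// Sketch over a simplified event shape: an event is ['enter'|'exit', token],
// a token is {type, start, end} with plain-number positions. Whenever a
// data exit is immediately followed by a data enter, grow the first token
// over the second and drop the redundant pair between them.
function mergeDataEvents(events) {
  let index = 1
  while (index < events.length - 2) {
    if (
      events[index][0] === 'exit' &&
      events[index][1].type === 'data' &&
      events[index + 1][0] === 'enter' &&
      events[index + 1][1].type === 'data'
    ) {
      events[index][1].end = events[index + 2][1].end // extend first token
      events[index + 2][1] = events[index][1] // exit now closes merged token
      events.splice(index, 2) // drop the inner exit/enter pair
    } else {
      index++
    }
  }
  return events
}

// Three adjacent one-character data tokens collapse into one:
const a = {type: 'data', start: 0, end: 1}
const b = {type: 'data', start: 1, end: 2}
const c = {type: 'data', start: 2, end: 3}
const events = [
  ['enter', a], ['exit', a],
  ['enter', b], ['exit', b],
  ['enter', c], ['exit', c]
]
mergeDataEvents(events)
console.log(events.length, events[0][1].end) // 2 3
```

On a paragraph where every character is its own token, this turns ~2n events into 2, which is why it could help the 140k-subevents case.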
Yes, I think that we may fix the issue if the tokens are merged, but I can't say for sure that we won't have the same issue with a different input.
I ran the performance tests and got a stack overflow on the main branch:
# before
base
32ms
strong
5s
strong/emphasis?
5s
unclosed links
975ms
unclosed links (2)
1s
tons of definitions
/Users/marouane/Projects/micromark/dist/util/normalize-identifier.js:7
.replace(/[\t\n\r ]+/g, ' ')
^
RangeError: Maximum call stack size exceeded
at String.replace (<anonymous>)
at normalizeIdentifier (/Users/marouane/Projects/micromark/dist/util/normalize-identifier.js:7:8)
at Object.resolveDefinition [as resolve] (/Users/marouane/Projects/micromark/dist/tokenize/definition.js:26:20)
at addResult (/Users/marouane/Projects/micromark/dist/util/create-tokenizer.js:291:27)
at onsuccessfulconstruct (/Users/marouane/Projects/micromark/dist/util/create-tokenizer.js:199:5)
at ok (/Users/marouane/Projects/micromark/dist/util/create-tokenizer.js:265:9)
at after (/Users/marouane/Projects/micromark/dist/tokenize/definition.js:262:14)
at main (/Users/marouane/Projects/micromark/dist/util/create-tokenizer.js:130:17)
at write (/Users/marouane/Projects/micromark/dist/util/create-tokenizer.js:58:5)
at flatMap (/Users/marouane/Projects/micromark/dist/util/flat-map.js:11:28)
# after
base
36ms
strong
5s
strong/emphasis?
5s
unclosed links
958ms
unclosed links (2)
1s
tons of definitions
31s
I’m currently getting (I have a couple of refactored things locally, but nothing noteworthy performance-wise):
base
25ms
strong
4s
strong/emphasis?
4s
unclosed links
285ms
unclosed links (2)
917ms
tons of definitions
/Users/tilde/Projects/oss/micromark/dist/tokenize/definition.js:291
return nok(code)
^
RangeError: Maximum call stack size exceeded
It seems my unclosed links (1) case is significantly faster, which seems to indicate that the chunked splice is slower. However, as the definitions are working for you, the while loop and the chunked splice fix the stack overflow at least?
Yes, I don't get a stack overflow when I combine the while loop and the chunked splice. Can you try the perf tests of my branch on your computer? 285ms is indeed quite fast! I don't know if the speedup/slowdown is due to the computer, the code, or the Node version 🤔
So that indicates your code fixes the crash. Adding your new test case to test/perf gives me 7s:
console.log('unclosed links (3)')
then = Date.now()
m('[](<'.repeat(35000))
console.log(ms(Date.now() - then))
Also: It might be that some of the refactoring I’ve done locally made unclosed links (1) much faster for me.
Definitions are still way too slow though, and unclosed links (3) too. I’m hopeful that merging data tokens will help there.
Wow! What was the line? 😮
This one! 09a4e43
It’s there to prevent links in links. But it walked from the link back to the start of the events (to prevent all previous brackets) every time. Now it exits if it already has one.
Could be better: it’ll still keep walking to the start in cases where there is no link opening before!
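The early exit could be sketched like this (hypothetical event shape and function name, not the actual code in 09a4e43): walking back from the bracket, the first link event we meet already tells us whether we're inside a link, so there's no need to continue to the start.

```javascript
// Sketch only (hypothetical event shape and names, not the real resolver).
// Events are ['enter'|'exit', token] pairs; walk backwards from `from`
// and stop at the first `link` event instead of always scanning to 0.
function inLink(events, from) {
  let index = from
  while (index--) {
    const [kind, token] = events[index]
    if (token.type === 'link') {
      // Hitting an `enter` first means that link is still open around us;
      // hitting an `exit` first means the previous link already closed.
      return kind === 'enter'
    }
  }
  return false // no link at all: worst case, we walked all the way back
}

const link = {type: 'link'}
const data = {type: 'data'}
console.log(inLink([['enter', link], ['enter', data]], 2)) // true
console.log(inLink([['enter', link], ['exit', link], ['enter', data]], 3)) // false
```

The `return false` path is the remaining weakness: when no link opening exists at all, the walk still reaches the start of the events.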