I'm trying to run the parse using spacy on the American Constitution (<a href="https:/

Error when parsing the American Constitution with spacy (self-attentive-parser issue, closed)

nikitakit commented on June 3, 2024


from self-attentive-parser.

Comments (3)

nikitakit commented on June 3, 2024

Thanks for reporting!

The issue here is due to Section 8, which is a single (very long) sentence. The parser was not trained on sentences of this length, and there is a hard-coded limit of 300 words per sentence.
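One way to spot such sentences before they reach the parser is a plain length check. The sketch below is only an illustration: the 300-word figure comes from the comment above, while `find_long_sentences` is a hypothetical helper and whitespace splitting is an assumption (the parser's own tokenizer may count words differently).

```python
MAX_WORDS = 300  # limit quoted in the comment above


def find_long_sentences(sentences, max_words=MAX_WORDS):
    """Return (index, word_count) for every sentence longer than max_words.

    Words are counted by whitespace splitting, which is an assumption;
    the parser's tokenizer may produce a different count.
    """
    too_long = []
    for i, sentence in enumerate(sentences):
        n_words = len(sentence.split())
        if n_words > max_words:
            too_long.append((i, n_words))
    return too_long
```

Sentences flagged this way can then be shortened or skipped before parsing.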

I'll work on updating the code to throw a more understandable error in this situation.

Unfortunately I don't see a straightforward way to relax the length restriction. The training sentences are much shorter (most are less than 40 words long), and the way word positions are represented in the parser has theoretical issues with generalizing to longer lengths. Moreover, the memory used by the decoding algorithm scales as the square of the sentence length, which is why you are seeing OOM (out-of-memory) errors on GPU.
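To get a feel for what quadratic scaling means here, the following sketch compares a sentence's memory need against a 40-word baseline. Only the "square of the sentence length" relationship is taken from the comment above; `relative_decoding_memory` is a hypothetical helper, and constant factors and other memory costs are ignored.

```python
def relative_decoding_memory(n_words, baseline_words=40):
    """Memory relative to a baseline sentence, assuming pure O(n^2) scaling."""
    return (n_words / baseline_words) ** 2


# A sentence at the 300-word limit versus a typical 40-word training sentence.
ratio = relative_decoding_memory(300)
print(f"300 words needs roughly {ratio:.2f}x the memory of 40 words")
```

Under this simplified model, a sentence 7.5 times longer needs about 56 times the decoding memory.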

I'll think about what to do in these situations, but for now my recommendation would be to modify the text in some way to omit super-long sentences. For example, you can take turns deleting some of the "To ...;" lines and parsing the rest, and then combine everything to build a parse tree.
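A rough sketch of a variant of this workaround follows: instead of deleting lines by hand, it splits the long sentence at semicolons and groups the clauses into pieces that stay under a word budget. The helper names, the semicolon delimiter, and whitespace word counting are all assumptions for illustration. Combining the resulting parses into one tree is not shown, since that depends on the parser's output format.

```python
def split_clauses(sentence, delimiter=";"):
    """Split a sentence into non-empty, stripped clauses at the delimiter."""
    return [c.strip() for c in sentence.split(delimiter) if c.strip()]


def chunk_clauses(clauses, max_words=300):
    """Greedily group consecutive clauses so each group stays within max_words.

    A single clause that is itself longer than max_words still ends up
    in a group of its own and would need further manual splitting.
    """
    chunks, current, count = [], [], 0
    for clause in clauses:
        n_words = len(clause.split())
        # Start a new group when adding this clause would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(current)
            current, count = [], 0
        current.append(clause)
        count += n_words
    if current:
        chunks.append(current)
    return chunks
```

Each group can be rejoined with `"; ".join(group)` and parsed separately.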


shlomihod commented on June 3, 2024

Thanks!


nikitakit commented on June 3, 2024

I'm closing this issue since I've added more understandable error messages when a sentence that is too long is given to the parser. Unfortunately there's no easy way to relax the length restrictions themselves, especially now that the recommended parser models use BERT (which has length limits of its own, for many of the same reasons).
