Comments (3)
Thanks for reporting!
The issue here is due to Section 8, which is a single (very long) sentence. The parser was not trained on sentences of this length, and there is a hard-coded limit of 300 words per sentence.
I'll work on updating the code to throw a more understandable error in this situation.
Unfortunately I don't see a straightforward way to relax the length restriction. The training sentences are much shorter (most are fewer than 40 words long), and the way word positions are represented in the parser has theoretical issues with generalizing to longer lengths. Moreover, the memory used by the decoding algorithm scales as the square of the sentence length, which is why you are seeing OOM (out-of-memory) errors on GPU.
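To give a rough feel for the quadratic growth, here is a minimal sketch that counts the contiguous spans in a sentence. It assumes the decoder keeps at least one score per span, which is an illustrative assumption and not a description of the parser's actual internals; `span_count` is a hypothetical helper.

```python
def span_count(n_words: int) -> int:
    """Number of contiguous spans (i, j) in a sentence of n_words words."""
    return n_words * (n_words + 1) // 2


# Compare a typical training-length sentence with one at the 300-word limit.
# Assumption: memory is roughly proportional to the number of spans.
for n in (40, 300):
    print(f"{n} words -> {span_count(n)} spans")

ratio = span_count(300) / span_count(40)
print(f"growth factor from 40 to 300 words: {ratio:.1f}x")
```

Under that assumption, a sentence 7.5 times longer needs roughly 55 times as many span entries, before counting any per-label storage.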
I'll think about what to do in these situations, but for now my recommendation would be to modify the text in some way to omit super-long sentences. For example, you can parse the sentence several times, each time deleting a different subset of the "To ...;" lines so that the remainder fits within the limit, and then combine the results to build a full parse tree.
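One way to prepare such shorter pieces is sketched below. It is plain Python and does not call the parser; `split_at_semicolons` and `word_count` are hypothetical helpers, and counting words by whitespace is an assumption, since the parser's own tokenizer may count punctuation separately. In practice you may want a `max_words` value comfortably below 300.

```python
from typing import List

MAX_WORDS = 300  # hard-coded limit mentioned in this thread


def word_count(text: str) -> int:
    """Approximate word count using whitespace splitting (an assumption)."""
    return len(text.split())


def split_at_semicolons(sentence: str, max_words: int = MAX_WORDS) -> List[str]:
    """Group semicolon-separated clauses into chunks of at most max_words words."""
    clauses = [c.strip() for c in sentence.split(";") if c.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for clause in clauses:
        n = word_count(clause)
        if n > max_words:
            # A single clause is already too long; it needs manual editing.
            raise ValueError(f"clause of {n} words exceeds limit of {max_words}")
        if current and current_len + n > max_words:
            # Adding this clause would overflow, so close the current chunk.
            chunks.append("; ".join(current))
            current, current_len = [], 0
        current.append(clause)
        current_len += n
    if current:
        chunks.append("; ".join(current))
    return chunks


# Example: a 500-word "sentence" made of 100 five-word clauses.
long_sentence = "; ".join(f"To do item number {i}" for i in range(100))
pieces = split_at_semicolons(long_sentence)
print(len(pieces), [word_count(p) for p in pieces])
```

Each piece can then be parsed on its own. How the resulting trees are merged back into one is left open here, since that depends on the structure of the original sentence.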
from self-attentive-parser.
Thanks!
I'm closing this issue since I've added more understandable error messages when a sentence that is too long is given to the parser. Unfortunately there's no easy way to relax the length restrictions themselves, especially now that the recommended parser models use BERT (which has length limits of its own, for many of the same reasons).