Comments (12)
The Emscripten issues are #1361, #1374, and #1377
from link-grammar.
For 6.00 I have many pending PRs, and I would like to include at least some of them:
- Dict token insertion (need to find the issue number).
- Tokenization drastic speed improvements.
- Generator drastic speedup.
- Generator API.
- Cross-links implementation (I need your answers to my old questions + more discussion, in order to complete it).
- Implement power-prune for expressions in order to make power_prune() much faster.
- Simplify expressions before converting them to disjuncts (it speeds up building the disjuncts).
(The code was ready for PR, but I changed `Exp_struct` before I sent it, and its conversion to the new struct turned out to be buggy, so I need to work on it some more...)
- More power-pruning! It removed an additional ~5% of the disjuncts. (This new power pruning had worked, but then I introduced a bug without committing the working code..., so again I need to continue debugging...)
- Rewritten post-processing, for a drastic postprocessing speedup and a drastic increase in the number of good linkages per `linkage_limit`.
- Tests for `link-parser`.
- Graphical `link-parser` (Python).
- Local hard costs (we need to discuss this).
- Segmentation according to the dict.
- Partial parsing infrastructure.
- Phantom word handling.
- Capitalization handling by dict definitions.
Re tokenization speed: in one of my Atomese use-cases, on an older, slower machine, I see the following performance:
- 500 millisecs tokenization
- 42 millisecs prepare-to-parse
- 400 millisecs count
- 1200 millisecs extract linkages
The above was obtained using sentences that are all exactly 12 words long. Dictionary lookup times are not included in the tokenization figure. Linkage limit = 15K.
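For a sense of where the time goes, the per-phase shares implied by these numbers can be tallied (a back-of-the-envelope sketch using only the figures quoted above):

```python
# Per-phase time budget implied by the numbers above (all in ms).
phases = {
    "tokenization": 500,
    "prepare-to-parse": 42,
    "count": 400,
    "extract linkages": 1200,
}
total = sum(phases.values())
print(f"total: {total} ms per 12-word sentence")
for name, ms in phases.items():
    print(f"  {name}: {ms} ms ({100 * ms / total:.0f}%)")
```

So tokenization is roughly a quarter of the per-sentence cost, and linkage extraction over half.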
More about tokenization: with the Atomese dicts, the dict can grow after every sentence. Thus, I call `condesc_setup(dict);` after tokenization, before parsing. It took me two days to discover that this call runs for about 1 sec at first, growing to 10 sec after a while. Thus it accounts for 1/3 of the grand-total sentence time at first, rising to 80% after a while.
I need to find some way of doing what it does incrementally, possibly by telling it exactly which expressions were added. -- fixed in #1459
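The 1/3 and 80% figures are consistent with the earlier timing breakdown; a quick sanity check (assuming the ~2.14 s of non-setup work per sentence from the previous comment):

```python
# Sanity-check of the condesc_setup() shares quoted above, against the
# ~2.14 s of other per-sentence work (tokenize + prepare + count + extract)
# from the earlier timing comment.
other = 2.142  # seconds of non-setup work per sentence
for setup in (1.0, 10.0):
    share = setup / (setup + other)
    print(f"condesc_setup at {setup:g} s -> {share:.0%} of grand total")
```

A 1 s setup is about a third of the total; a 10 s setup is a bit over 80%.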
I published version 5.12.1 -- I couldn't wait, certain automation scripts depend on the published tarballs.
hi @linas
I tried updating to 5.12.2 in Gentoo but am getting build failures:
In file included from /var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.cpp:1:
/var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.hpp:23:83: error: 'X_node' does not name a type
23 | const std::vector<int>& er, const std::vector<int>& el, const X_node *w_xnode, Parse_Options opts)
| ^~~~~~
In file included from /var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.cpp:1:
/var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.hpp:82:9: error: 'X_node' does not name a type
82 | const X_node *word_xnode;
| ^~~~~~
which we haven't seen in 5.12.0
> build failures:
I'm looking. The recommended fix is to disable the build of the sat-solver code. Since it's disabled by default, your build scripts must have turned it on. (Just run `../configure` without any options.)
The recommendation is to disable it because the SAT parser is slower than the regular parser in all situations; in some cases it is 10x or 20x slower. I've been considering deleting it permanently, although Amir convinced me that it can be fixed up. And so... it's in limbo...
@SoapGentoo If you are willing to carry patches, I just pushed a fix here: ffdf5d8
Otherwise, wait for 5.12.3... which might appear in a few weeks? (I have plans for "urgent" Atomese fixes which necessitate an LG release.)
@SoapGentoo Version 5.12.3 is now out, with the fix you reported above.
@linas after confirming that 5.12.3 indeed works, I proceeded to pass `--disable-sat-solver` to `./configure` to disable the SAT solver, as per your recommendation. Thanks 👍
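Spelled out, the reconfigure looks something like this (a sketch of a standard autotools build from the source tree; install steps omitted):

```shell
# Pass the flag explicitly, so a future change of defaults
# cannot silently re-enable the SAT solver.
./configure --disable-sat-solver
make
```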
Cool. OK. FWIW, the SAT solver is already disabled by default (`configure.ac` lines 365ff), so if it was on for you, then somehow you were carrying a config setting from long ago? Keep in mind that `./configure` does not start with a clean state; it remembers flags from prior invocations. (This also reveals that my testing is incomplete.)
In general, we like to specify all options to `./configure`, since it makes our configuration more robust against changes to the default settings. In this case, `--enable-sat-solver=bundled` was added due to a conflict with the system minisat: https://bugs.gentoo.org/593662
Hm. OK. SAT was disabled to discourage its use. It is always slower, sometimes by factors of 10x or 100x. Amir says that, in fact, it can be fixed up and repaired, which might make SAT faster than the regular parser, maybe.
Whether this is worth the effort depends mostly on future applications, rather than on the current situation. For the present English, Russian, Thai, etc. dictionaries, reviving SAT seems pointless: the current parser is good enough. However, I'm working with brand-new dicts that have a radically different structure and different performance profiles, and that make different demands on the parser. For those, maybe the SAT parser could be faster or more space-efficient. Maybe, or maybe not. Unexplored.