Comments (9)
Just in case: could it be #9? (Does the path for `c.1` start with epsilons on the input side?)
from lttoolbox.
I mean, yes, but so do all other paths, in this transducer:
0 1 ε ε 0.000000
1 13681 с с 0.000000
13681 723 . . 0.000000
723 7 ε <abbr> 0.000000
7 8 ε ε 0.000000
8 0.000000
Cf.
0 1 ε ε 0.000000
1 10 . . 0.000000
10 3 ε <sent> 0.000000
3 4 ε ε 0.000000
4 0.000000
But the latter path is in a separate section of the transducer, separated by `--` in `lt-print` output (the former path is below the `--`, with most other things, and the latter is above, with only a few other things). This makes me think that @ftyers's hypothesis that it has to do with inconditional/standard section status might be right:
(00:44:56) spectie: it might expect that string to be in an inconditional section
(00:45:06) spectie: (there are different behaviours of the different sections)
(00:45:16) spectie: but the AttCompiler probably puts it in the standard section
from lttoolbox.
Hm, I think #9 might be about initial epsilons on the input side only (i.e. not aligned, as in `ε c` and then `c ε` or something).
It's correct that `lt-proc` would need the path for `c.` to be in an `inconditional` section in order to appear immediately before other `standard` analyses. I guess the fix is that `lt-comp` on att files should put things ending in periods/punctuation in `inconditional`? That would also allow things like `croc.` being tokenised as `^cro$^c.$` (avoid that by making sure the dictionary also has `croc` as one entry).
Is the analysis of `1` in the `standard` section btw? (If it is in `inconditional`, the hypothesis is wrong – you can have a standard analysis immediately followed by inconditional.)
from lttoolbox.
Most of what's above the `--` appears to be number-loop-related things, but I can't find any paths that are the analysis of just `1`, whereas the part below `--` does include the analysis of `1`. I assume the part below `--` is `standard` and not `inconditional`?
from lttoolbox.
I believe the AttCompiler's `classify` function needs some refactoring/bug fixes:
https://github.com/apertium/lttoolbox/blob/master/lttoolbox/att_compiler.cc#L375
I am not sure I can work on it given my GSoC project.
The fix shouldn't be that hard but I need to discuss it with my mentors.
from lttoolbox.
Paths in the FST are classified based on the first non-tag non-epsilon symbol on the input side.
$ printf 'PATTERNS\n[c.]\n[1]\n' | lexd
0 1 c c 0.000000
0 2 1 1 0.000000
1 2 . . 0.000000
2 0.000000
$ printf 'PATTERNS\n[c.]\n[1]\n' | lexd > blah.att
$ lt-comp lr blah.att blah.bin
main@standard 3 3
$ echo 'c. 1 c.1' | lt-proc blah.bin
^c./c.$ ^1/1$ ^c/*c$.^1/1$
In this case, both `c` and `1` are alphanumeric, so they both go into the `standard` section type.
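As a rough illustration of the rule described above (classification by the first non-tag, non-epsilon input symbol), here is a minimal Python sketch. It is not the actual `classify` code from `att_compiler.cc`: the function names, the representation of a path as a list of input symbols, and the assumption that every non-alphanumeric first symbol maps to `inconditional` are all hypothetical simplifications.

```python
EPSILON = "ε"  # epsilon symbol as printed in the listings in this thread


def is_tag(symbol):
    """True for multi-character tags such as <abbr> or <sent>."""
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")


def first_real_symbol(input_symbols):
    """Return the first input symbol that is neither epsilon nor a tag."""
    for symbol in input_symbols:
        if symbol != EPSILON and not is_tag(symbol):
            return symbol
    return None


def classify_path(input_symbols):
    """Assumed rule: alphanumeric first symbol -> standard, else inconditional."""
    first = first_real_symbol(input_symbols)
    if first is not None and first.isalnum():
        return "standard"
    # Paths with no real input symbol also land here; that is an arbitrary
    # choice made for this sketch only.
    return "inconditional"


# Input sides of the two paths printed earlier in the thread.
abbr_path = ["ε", "с", ".", "ε", "ε"]
sent_path = ["ε", ".", "ε", "ε"]
print(classify_path(abbr_path), classify_path(sent_path))
```

Under this model the `с.` path is classified by its initial letter alone, which is why its final period has no effect on the section it ends up in.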
I think maybe the solution here is to allow two `standard` entries without intervening whitespace if they begin or end with non-alphanumeric characters.
from lttoolbox.
Isn't the solution rather to compile into `inconditional` those entries that begin or end with non-alphanumeric characters? Allowing analyses without intervening whitespace is the whole reason for having the inconditional/postblank/preblank feature in the first place; it feels a bit redundant to additionally have special logic for entries in the standard section that are not quite standard.
from lttoolbox.
Upon further investigation I think you're right, but I'm not sure how to do that efficiently. Checking whether the initial character is punctuation can almost be done while reading in the file, but I'm having trouble coming up with something better than O(|V|^2) for checking ends.
On the other hand, maybe that's not so bad and really I should test this.
from lttoolbox.
I feel like this should also somehow be possible to solve by first reading them all into standard and then somehow splitting, or copying those paths into inconditional. (Like take the intersect with `.*[[:punct:]]` and union that into incond.)
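The idea can be sketched in Python by modelling the transducer's input language as a finite set of strings rather than as an FST. This is only a toy model under stated assumptions: `split_sections` is a hypothetical helper, Python's `re` has no `[[:punct:]]` class so ASCII `string.punctuation` stands in for it, and the sketch moves matching entries out of standard (the "splitting" variant) rather than copying them. A real fix would perform the intersection on the transducer itself.

```python
import re
import string

# ASCII stand-in for .*[[:punct:]] : any string whose last character is punctuation.
PUNCT_FINAL = re.compile(r".*[" + re.escape(string.punctuation) + r"]", re.DOTALL)


def split_sections(entries):
    """Split entries into (standard, inconditional) by the punctuation-final test."""
    entries = set(entries)
    # "Intersect" the language with the punctuation-final pattern.
    inconditional = {entry for entry in entries if PUNCT_FINAL.fullmatch(entry)}
    # Everything else stays in the standard section.
    standard = entries - inconditional
    return standard, inconditional


standard, inconditional = split_sections(["c.", "1", "croc", "."])
print(sorted(standard), sorted(inconditional))
```

With the entries from this thread, `c.` would move to the inconditional set while `1` stays in standard, which is the arrangement the earlier comments suggest `lt-proc` needs in order to analyse `c.1`.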
from lttoolbox.