Comments (9)

unhammer commented on July 30, 2024

Just in case: could it be #9 ? (Does the path for c.1 start with epsilons on the input side?)

from lttoolbox.

jonorthwash commented on July 30, 2024

> Just in case: could it be #9 ? (Does the path for c.1 start with epsilons on the input side?)

I mean, yes, but so do all other paths, in this transducer:

0       1       ε       ε       0.000000        
1       13681   с       с       0.000000        
13681   723     .       .       0.000000        
723     7       ε       <abbr>  0.000000        
7       8       ε       ε       0.000000        
8       0.000000

Cf.

0       1       ε       ε       0.000000        
1       10      .       .       0.000000        
10      3       ε       <sent>  0.000000        
3       4       ε       ε       0.000000        
4       0.000000

But the latter path is in a separate section of the transducer, separated by -- in lt-print output (the former path is below the --, with most other things, and the latter is above, with only a few other things). This makes me think that @ftyers's hypothesis that it has to do with inconditional/standard section status might be right:

(00:44:56) spectie: it might expect that string to be in an inconditional section
(00:45:06) spectie: (there are different behaviours of the different sections)
(00:45:16) spectie: but the AttCompiler probably puts it in the standard section

unhammer commented on July 30, 2024

Hm, I think #9 might be about initial epsilons on the input side only (i.e. not aligned, as in ε c and then c ε or something).

It's correct that lt-proc would need the path for c. to be in an inconditional section in order to appear immediately before other standard analyses. I guess the fix is that lt-comp on att files should put things ending in periods/punctuation in inconditional? That would also allow things like croc. tokenised as ^cro$^c.$ (avoid that by making sure the dictionary also has croc as one entry).

Is analysis of 1 in the standard section btw? (If it is in inconditional, the hypothesis is wrong – you can have a standard analysis immediately followed by inconditional.)

jonorthwash commented on July 30, 2024

> Is analysis of 1 in the standard section btw? (If it is in inconditional, the hypothesis is wrong – you can have a standard analysis immediately followed by inconditional.)

Most of what's above the -- appears to be number-loop-related things, but I can't find any paths that are the analysis of just 1, whereas the part below -- does include the analysis of 1. I assume the part below -- is standard and not inconditional?

AMR-KELEG commented on July 30, 2024

I believe the AttCompiler's classify function needs some refactoring/bug fixes:
https://github.com/apertium/lttoolbox/blob/master/lttoolbox/att_compiler.cc#L375

I am not sure I can work on it given my GSoC project.
The fix shouldn't be that hard but I need to discuss it with my mentors.

mr-martian commented on July 30, 2024

Paths in the FST are classified based on the first non-tag non-epsilon symbol on the input side.

$ printf 'PATTERNS\n[c.]\n[1]\n' | lexd
0	1	c	c	0.000000	
0	2	1	1	0.000000	
1	2	.	.	0.000000	
2	0.000000
$ printf 'PATTERNS\n[c.]\n[1]\n' | lexd > blah.att
$ lt-comp lr blah.att blah.bin
main@standard 3 3
$ echo 'c. 1 c.1' | lt-proc blah.bin 
^c./c.$ ^1/1$ ^c/*c$.^1/1$

In this case, both c and 1 are alphanumeric, so they both go into the standard section type.
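The classification rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual code in att_compiler.cc: the helper names (`is_skippable`, `parse_att`, `classify_paths`) are made up, the epsilon spellings are assumptions, and sending non-alphanumeric entries to inconditional is inferred from this thread rather than checked against the compiler. It also assumes an acyclic transducer with no whitespace symbols.

```python
EPSILONS = {"ε", "@0@", "@_EPSILON_SYMBOL_@"}  # assumed epsilon spellings


def is_skippable(symbol):
    """True for epsilons and <tag>-style symbols, which do not decide the section."""
    return symbol in EPSILONS or (
        len(symbol) > 2 and symbol[0] == "<" and symbol[-1] == ">"
    )


def parse_att(text):
    """Parse AT&T-format text into ({src: [(dst, input_symbol)]}, final_states)."""
    transitions, finals = {}, set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:  # src dst input output [weight]
            transitions.setdefault(fields[0], []).append((fields[1], fields[2]))
        elif fields:  # final-state line: state [weight]
            finals.add(fields[0])
    return transitions, finals


def classify_paths(text, start="0"):
    """Return (input_string, section) for every path; acyclic transducers only."""
    transitions, finals = parse_att(text)
    results = []

    def walk(state, symbols):
        if state in finals:
            real = [s for s in symbols if not is_skippable(s)]
            if not real:
                section = None  # nothing to classify on
            elif real[0][0].isalnum():
                section = "standard"
            else:
                section = "inconditional"  # assumption, see lead-in
            results.append(("".join(real), section))
        for dst, isym in transitions.get(state, []):
            walk(dst, symbols + [isym])

    walk(start, [])
    return results


BLAH_ATT = (
    "0\t1\tc\tc\t0.000000\n"
    "0\t2\t1\t1\t0.000000\n"
    "1\t2\t.\t.\t0.000000\n"
    "2\t0.000000\n"
)
print(classify_paths(BLAH_ATT))
# [('c.', 'standard'), ('1', 'standard')]
```

Because only the first symbol is inspected, c. lands in standard even though it ends in a period, which matches the `main@standard 3 3` output above.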

I think maybe the solution here is to allow two standard entries without intervening whitespace if they begin or end with non-alphanumeric characters.

unhammer commented on July 30, 2024

Isn't the solution rather to compile into inconditional those entries that begin or end with non-alphanumeric characters? Allowing analyses without intervening whitespace is the whole reason for having the inconditional/postblank/preblank feature in the first place, so it feels a bit redundant to additionally have special logic for entries in the standard section that are not quite standard.

mr-martian commented on July 30, 2024

Upon further investigation I think you're right, but I'm not sure how to do that efficiently. Checking whether the initial character is punctuation can almost be done while reading in the file, but I'm having trouble coming up with something better than O(|V|^2) for checking ends.

On the other hand, maybe that's not so bad and really I should test this.
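One possible way to keep the end check cheap, sketched in Python under some assumptions: walk backwards from the final states over transitions that consume no real input (epsilons and tags), then collect the transitions that lead into that set. The function names and epsilon spellings are made up for illustration. Note that this only finds the transitions that can carry the last real input symbol of some path; it does not split the transducer, and the same transition may also sit in the middle of other paths.

```python
from collections import deque

EPSILONS = {"ε", "@0@", "@_EPSILON_SYMBOL_@"}  # assumed epsilon spellings


def is_skippable(symbol):
    """True for epsilons and <tag>-style symbols."""
    return symbol in EPSILONS or (
        len(symbol) > 2 and symbol[0] == "<" and symbol[-1] == ">"
    )


def last_symbol_transitions(transitions, finals):
    """Return transitions whose input symbol can end some accepting path.

    transitions: list of (src, dst, input_symbol); finals: iterable of states.
    Each state and transition is visited a constant number of times.
    """
    # Reverse adjacency restricted to transitions that read nothing real.
    reverse_skippable = {}
    for src, dst, isym in transitions:
        if is_skippable(isym):
            reverse_skippable.setdefault(dst, []).append(src)

    # Backward BFS: states that can reach a final state without more input.
    can_finish = set(finals)
    queue = deque(can_finish)
    while queue:
        state = queue.popleft()
        for prev in reverse_skippable.get(state, []):
            if prev not in can_finish:
                can_finish.add(prev)
                queue.append(prev)

    return [
        (src, dst, isym)
        for src, dst, isym in transitions
        if not is_skippable(isym) and dst in can_finish
    ]


def punctuation_endings(transitions, finals):
    """Subset of the above whose input symbol is not alphanumeric."""
    return [
        t for t in last_symbol_transitions(transitions, finals)
        if not t[2][0].isalnum()
    ]
```

On the path from earlier in this thread (0 → 1 → 13681 → 723 → 7 → 8), this would report only the `.` transition from 13681 to 723.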

unhammer commented on July 30, 2024

I feel like this should also be possible to solve by first reading them all into standard and then splitting off, or copying, the relevant paths into inconditional. (E.g. take the intersection with .*[[:punct:]] and union that into inconditional.)
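As a rough illustration of that idea, here is a Python sketch that works on an already-enumerated list of entries instead of on the transducer itself, which is a big simplification: a real fix would need the equivalent intersection and union on the FST. The function name and the entry format are hypothetical, and `string.punctuation` (ASCII only) stands in for `[[:punct:]]`, which Python's `re` module does not support.

```python
import re
import string

# ASCII-only stand-in for the POSIX pattern .*[[:punct:]]
ENDS_IN_PUNCT = re.compile(r".*[" + re.escape(string.punctuation) + r"]")


def split_sections(entries):
    """Split (input_string, analysis) pairs into (standard, inconditional).

    Everything starts out as standard; entries whose input side matches
    ENDS_IN_PUNCT are moved to inconditional.
    """
    standard, inconditional = [], []
    for surface, analysis in entries:
        if ENDS_IN_PUNCT.fullmatch(surface):
            inconditional.append((surface, analysis))
        else:
            standard.append((surface, analysis))
    return standard, inconditional


entries = [("c.", "c.<abbr>"), ("1", "1<num>"), ("croc", "croc<n>")]
standard, inconditional = split_sections(entries)
print(standard)       # [('1', '1<num>'), ('croc', 'croc<n>')]
print(inconditional)  # [('c.', 'c.<abbr>')]
```

This version moves the matching entries; copying them instead would mean appending to both lists. The analysis strings in the example are invented.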
