
gramlab-ide's Introduction

Unitex/GramLab IDE

Unitex/GramLab is the open source, cross-platform, multilingual, lexicon- and grammar-based corpus processing suite

GramLab is the Integrated Development Environment (IDE) of Unitex/GramLab.


How to Build

git clone https://github.com/UnitexGramLab/gramlab-ide
cd gramlab-ide
ant

How to Install and Test

To install and test the IDE, you first need to download the Unitex Core executable (UnitexToolLogger). The easiest way to do this is to grab a full Unitex/GramLab release for your platform. Then run:

cd gramlab-ide
export UNITEX_BUILD_RELEASE_DIR=/path/to/unitexgramlab-release
ant install

Before testing,

  • make sure that UnitexToolLogger is located at /path/to/unitexgramlab-release/App/UnitexToolLogger
  • download one or several of the available languages directly from https://unitexgramlab.org/releases/latest-stable/lingua/ into /path/to/unitexgramlab-release/, as sibling directories to App.
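The two checks above can be scripted before launching the IDE. The following Python sketch is only an illustration: the function name `check_release_layout` is hypothetical, and it simply lists every directory next to App without deciding which ones are languages. On Windows the executable would presumably carry an `.exe` extension, which this sketch does not handle.

```python
from pathlib import Path

def check_release_layout(release_dir):
    """Return (tool_found, sibling_dirs) for a Unitex/GramLab release directory.

    tool_found is True when App/UnitexToolLogger exists as a file.
    sibling_dirs lists the directories located next to App, where the
    downloaded languages are expected to live.
    """
    root = Path(release_dir)
    if not root.is_dir():
        return False, []
    tool = root / "App" / "UnitexToolLogger"
    siblings = sorted(
        p.name for p in root.iterdir() if p.is_dir() and p.name != "App"
    )
    return tool.is_file(), siblings
```

Calling `check_release_layout("/path/to/unitexgramlab-release")` before testing shows at a glance whether the executable and at least one language directory are in place.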

Documentation

The User's Manual (in PDF format) is available in English and French (more translations are welcome). You can view and print it with Evince, downloadable here. The latest on-line version of the User's Manual is accessible here.

Support

Support questions can be posted in the community support forum. Please feel free to submit any suggestions or requests for new features too. Some general advice about asking technical support questions can be found here.

Reporting Bugs

See the Bug Reporting Guide for information on how to report bugs.

Governance Model

Unitex/GramLab project decision-making is based on a community meritocratic process: anyone with an interest in the project can join the community, contribute to the project design and participate in decisions. The Unitex/GramLab Governance Model describes how this participation takes place and how to set about earning merit within the project community.

Spelling

Unitex/GramLab is spelled with capitals "U", "G" and "L", and with everything else in lower case. Apart from the forward slash, do not put a space or any other character between the words. Where the forward slash is not allowed, you can simply write "UnitexGramLab".

It's common to refer to the Unitex/GramLab Core as "Unitex", and to the Unitex Project-oriented IDE as "GramLab". If you are mentioning the distribution suite (Core, IDE, Linguistic Resources and other bundled tools), always use "Unitex/GramLab".

Contributing

We welcome everyone to contribute to improve this project. See CONTRIBUTING.md for contribution guidelines and instructions.

License

This program is licensed under the GNU Lesser General Public License version 2.1. Contact [email protected] for further inquiries.


Copyright (C) 2021 Université Paris-Est Marne-la-Vallée

gramlab-ide's People

Contributors

aderoin, aleksandrachasch, anasaitcheikh, eric-laporte, flambertd, gvollant, hghwng, kalkhas, loxal, martinec, maxencerobin, mdamis, mthouv, mukarr, nathwhy, nikhilgupta23, phmz, rahariti, rewopkram, selgueti, vasiljevic, vinber-service


gramlab-ide's Issues

Find and replace a sequence of boxes

1. Dialog box

Add 2 tabs to the current "Find and replace" dialog box:

  • "Inside one box" (with the present "Find and replace" dialog box)
  • "Complete boxes"

The "Complete boxes" tab should look like the other but with several differences:

1.1. The "Find what" query
"Find what" and 3 lines:

  • a radio button for a choice between "One box after another" and "All boxes at once".
  • a line of instructions. The radio button switches the instructions between: "Select first box, then click "Add", etc." and "Select a sequence of connected boxes in a graph and click "Add"".
  • an editable text field, and an "Add" button on the right.

In the text field, the content of each box is displayed with a delimiter between them. The delimiter is a non-ASCII character unlikely to belong to the alphabet of a language (a triangle pointing to the right? or ◊).

Examples of "Find what" queries:

  • freedom◊of◊the◊press
  • freedom of the press
  • 고/{고,고.EV+TE+INT}◊◊요/{요,요.EV+TE+AUH}
  • 고요/{고요,고요.EV+TE+INT+AUH}
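To make the role of the delimiter concrete, here is a minimal Python sketch of how a "Find what" string could be split into one entry per box. The names `DELIMITER`, `split_query` and `join_boxes` are hypothetical, and ◊ (U+25CA) is only one of the candidate delimiters mentioned above.

```python
DELIMITER = "\u25ca"  # the lozenge character, one candidate named in the proposal

def split_query(query, delimiter=DELIMITER):
    """Split a query string into a list with one entry per box."""
    return query.split(delimiter)

def join_boxes(boxes, delimiter=DELIMITER):
    """Build the text-field representation from a list of box contents."""
    return delimiter.join(boxes)
```

With this sketch, a query of four boxes yields four entries, while a query without any delimiter is treated as the content of a single box. Two adjacent delimiters produce an empty entry; how the dialog box should interpret that case is left open here.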

1.2. The "Replace with" pattern
Exactly the same elements repeated.

Examples of "Replace with" patterns:

  • freedom of the press
  • freedom◊of◊the◊press
  • 고요/{고요,고요.EV+TE+INT+AUH}
  • 고/{고,고.EV+TE+INT}◊◊요/{요,요.EV+TE+AUH}

1.3. "Find what" options
"Case sensitive" : checked by default.
No "Match only a whole line" option (meaningless).
In a future version, "Use regular expressions" for the search might be useful. In that case, \1 \2 \3 etc. should be available in the "Replace with" pattern.

1.4. Other buttons: as in the present "Find and replace" box.

2. Functionality

2.1. The number of occurrences found is displayed in the status bar field.

2.2. "Find next" and "Find previous" find and select an occurrence of the "find what" query.
If a transition enters into a box which is not the first, and if a transition goes out of a box which is not the last:

  • select the occurrence,
  • issue a warning "Part of this occurrence is shared with another path" in the status text field
  • disable and gray the "Replace" button (this is the only case when it is grayed).

2.3. "Replace" and "Replace all" replace the occurrences of the "find what" query by the "replace with" pattern, except if they have an ingoing transition into a box which is not the first, or an outgoing transition from a box which is not the last.
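The exception stated in 2.3 can be expressed as a small check. The Python sketch below is an assumption-laden illustration, not the IDE's implementation: the graph is modelled as a collection of (source, target) pairs of box identifiers, an occurrence as an ordered list of box identifiers, and the name `is_replaceable` is hypothetical. It also assumes that a box appears at most once in an occurrence.

```python
def is_replaceable(transitions, occurrence):
    """Return False when part of the occurrence is shared with another path.

    transitions: iterable of (source, target) box identifiers.
    occurrence: box identifiers of the matched sequence, in order.
    """
    last = len(occurrence) - 1
    for src, dst in transitions:
        for i, box in enumerate(occurrence):
            # Ingoing transition into a box which is not the first,
            # coming from somewhere other than the preceding box.
            if dst == box and i > 0 and src != occurrence[i - 1]:
                return False
            # Outgoing transition from a box which is not the last,
            # going somewhere other than the following box.
            if src == box and i < last and dst != occurrence[i + 1]:
                return False
    return True
```

Extra transitions into the first box or out of the last box do not block the replacement, since the replaced sequence can keep those connections.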

Integrate the Classic IDE and the Project-oriented IDE

Unitex/GramLab includes two Java IDEs, the Classic IDE (Unitex.jar) and the Project-oriented IDE (Gramlab.jar). The aim of this issue is to integrate both IDEs into a new one, which will be named simply GramLab, featuring two perspectives: Classical and Project-oriented.

A plugin-based approach could be used to integrate the functionality of the two IDEs. To this end, a branch feature/plugins has been created to provide an experimental mechanism for extending and enhancing the IDE functionality via plugins.

Plugins are built on PF4J, an open source, lightweight and easily extensible plugin framework for Java with minimal dependencies. Plugins are distributed as ZIP files which bundle all runtime dependencies and can be installed simply by copying them into the App/plugins folder.

A few examples of some core plugins that would be necessary to create are:

  • concordance-viewer as illustrated in the User's Manual, Fig. 4.8.
  • dictionary-viewer as illustrated in the User's Manual, Fig. 3.2.
  • file-editor as explained in the User's Manual, Section 2.3.
  • git-connector to work with projects stored in Git repositories.
  • graph-editor as shown in the User's Manual, Section 5.2.
  • graph-exporter as described in the User's Manual, Section 5.4.
  • plugin-manager to provide a user interface to manage plugins.
  • svn-connector to work with projects stored in Subversion repositories.
  • transcoder as illustrated in the User's Manual, Fig. 2.3.

In the same way, other plugins could be created to extend the IDE functionalities:

  • charmap a lookup table including character codes in a given encoding.
  • gate-connector a file format Import/Export between Unitex and Gate.
  • treecloud features TreeCloud visualizations of Unitex concordances.
  • plugin-updater plugins update mechanism.
  • xalign as described in the User's Manual, Chapter 10.

To start developing a new plugin, the GramLab Skeleton plugin provides a starting point.

Display "0 match" message upon empty-grammar query to Locate Pattern

When Locate Pattern does not find matches to a query, it correctly produces a "0 match" message, except if the query is an empty grammar.

What steps will reproduce the problem?

Open a text and submit the attached cul-de-sac.grf grammar to Locate Pattern. The Chiffres.grf subgraph must be in the same directory as cul-de-sac.grf. This grammar has no complete path from the initial box to the final box.

What is the expected output?

The 'Result info' dialog box should notify '0 match'.

What do you see instead?

The 'Working...' window issues warnings and displays 'ERROR' as a title. The 'Result info' dialog box notifies 'null null null' or the results of an earlier query, if any. If we click OK, the 'Located Sequences...' dialog box offers to generate a concordance.

More info

A query to Locate Pattern with an empty grammar is a valid use case. Some grammar authors invalidate paths by deleting one of the transitions in the path, and the 'Lexicon-Grammar' menu does that systematically when a path is associated to a construction which is invalid for a lexical entry. Therefore, an automatically generated grammar may happen to have all its paths interrupted by the absence of a transition. Such a grammar is empty, and submitting it to Locate Pattern should produce the same behaviour as any grammar for which no match is found.
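An empty grammar in this sense can be detected by a plain reachability test. The Python sketch below is a simplified illustration under stated assumptions: the graph is given as a mapping from each box to its successor boxes, subgraph calls are ignored (a called subgraph could itself be empty), and the name `has_complete_path` is hypothetical.

```python
from collections import deque

def has_complete_path(successors, initial, final):
    """Breadth-first search from the initial box to the final box.

    successors: dict mapping a box identifier to an iterable of the
    boxes its outgoing transitions lead to.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        box = queue.popleft()
        if box == final:
            return True
        for nxt in successors.get(box, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

When this returns False, the grammar is empty and, following the reasoning above, the expected result is simply '0 match'.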

cul-de-sac.zip

Getting more done in GitHub with ZenHub

Hola! @Mukarr has created a ZenHub account for the UnitexGramLab organization. ZenHub is the only project management tool integrated natively in GitHub – created specifically for fast-moving, software-driven teams.


How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

  • Real-time, customizable task boards for GitHub issues;
  • Multi-Repository burndown charts, estimates, and velocity tracking based on GitHub Milestones;
  • Personal to-do lists and task prioritization;
  • Time-saving shortcuts – like a quick repo switcher, a “Move issue” button, and much more.

Add ZenHub to GitHub

Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @Mukarr.


Save Snt As

It would be useful to be able to save the current Snt and its folder.

What is the expected output?
Save the current .snt file and the corresponding _snt directory with its contents. It should be possible to save them in a directory other than Corpus. All copied files are given the date of the copy. The new .snt file becomes the current corpus.
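The requested copy can be sketched in a few lines of Python. This is an illustration only, not IDE code: the function name `save_snt_as` is hypothetical, and the sketch does not make the copy the current corpus. It relies on `shutil.copyfile`, which copies file contents without metadata, so each copied file gets the date of the copy.

```python
import shutil
from pathlib import Path

def save_snt_as(snt_file, target_dir):
    """Copy foo.snt and its foo_snt directory into target_dir.

    Returns the path of the new .snt file.
    """
    src = Path(snt_file)
    src_dir = src.with_name(src.stem + "_snt")
    dest = Path(target_dir)
    dest.mkdir(parents=True, exist_ok=True)
    new_snt = dest / src.name
    # copyfile copies content only, so the modification date is the copy date.
    shutil.copyfile(src, new_snt)
    if src_dir.is_dir():
        # Raises FileExistsError if the target _snt directory already exists.
        shutil.copytree(src_dir, dest / src_dir.name,
                        copy_function=shutil.copyfile)
    return new_snt
```

Note that `copytree` still copies the timestamps of the directories themselves; only the files receive new dates.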

Statistics on number of words are not updated after new dictionary lookup

What steps will reproduce the problem?

  1. Launch Unitex in French and open Le Tour du monde en 80 jours, accepting the default preprocessing. A 80jours.snt window appears and displays, under the title bar, the number of simple words and compound words found (several thousands).
  2. Click Text > Apply lexical resources, choose only the ajouts80jours.bin dictionary and click Apply.

What is the expected output?

During step 2, the new dictionary lookup overwrites the result of the initial dictionary lookup. After step 2, the 80jours.snt window should logically display the new number of simple words and compound words. Since ajouts80jours is a very small dictionary, these numbers should be much smaller (less than 100).

What do you see instead?

The 80jours.snt window is unchanged.

  • Unitex/GramLab IDE version: 3.2.56-alpha
  • UnitexToolLogger version: 3.2.57-alpha
  • Did this work before?: [ ] yes [ x ] no

Lookup not operational on user dictionaries

Submitted by @eric-laporte

The lookup functionality (Unitex menu DELA > Lookup...) is operational on system dictionaries, but not on user's dictionaries.

What steps will reproduce the problem?

Copy the same dictionary inside the system Unitex directory and inside the user's Unitex directory.

What is the expected output?

The entries found in the system dictionary are also available in the user's dictionary.

What do you see instead?

The entries found in the system dictionary are not found in the user's dictionary.

Resizing the window that displays the text automaton

The window displaying the text automaton (user manual, Section 7.1) can be resized with the mouse by dragging edges or corners, including the bottom edge. However, if the zone for the Elag output is minimized (as is the case on Figure 7.1, when you open the window), only the lower half of the bottom edge can be used to resize the window. The upper half of the bottom edge works differently. This difference is unexpected, since the edge is so thin.

What is the expected output?

By clicking and dragging the edge of the window, the user expects to resize the window, especially if the double arrow appears when they hover the mouse over the edge.

What do you see instead?

If the zone for the Elag output is minimized (as when you open the window), and if you hover the mouse over the upper half of the bottom edge, the double arrow appears, but then if you click, you can't resize the window.

  • if you grab and drag the edge up, the zone for Elag output appears (this is not likely to be what the user wanted; and there is another way to have this zone appear: the disclosure widget on the left of the bottom edge).
  • if you try to grab and drag the edge down, nothing happens.

More info

The zone for the Elag output is what appears at the bottom of Figure 7.13 of the user manual.
The left-bottom and right-bottom corners work as expected.
I experienced the problem on Windows (now Windows 10).

Colour marking in the text automaton

Some visual marking of unambiguously tagged parts and of untagged parts in the text automaton might be useful to users of the text automaton, especially to linguists that revise the tagging, a very repetitive task. The HUFS University in Korea is testing various forms of colour marking. Here are their best choices. The best colour marking might be extended to all languages, if users and developers agree.

  1. Boxes for untagged tokens: lavender blue (CCCCFF) background inside.
  2. Boxes for unambiguously tagged items: periwinkle blue (9999FF) or light green (CCEB94) background inside; lines as thick as normal boxes; no bold characters.
  3. Entirely unambiguously tagged sentences: periwinkle blue (9999FF) or cornflower blue (99C9E4) background in boxes (maybe except for untagged tokens); transitions either periwinkle blue (9999FF) or the same colour as other sentences, not yellow.

SET (line Merge) in Preprocessing window

Clicking the "Set" button on the "Replace" line opens the directory "Graphs/Preprocessing/Replace". Clicking the "Set" button on the "Merge" line also opens "Graphs/Preprocessing/Replace", instead of "Graphs/Preprocessing/Sentence".
I suggest correcting this. Alternatively, both buttons could open the directory "Graphs/Preprocessing".

Add a Copy button on error message dialogs

Submitted by Denis Maurel.

Add a Copy to the clipboard button on error message dialogs. This includes:

  • Error messages resulting from a Locate Pattern task, as illustrated in the User's Manual, Fig. 6.60
  • Error messages resulting from a Java exception, as shown in #10

Add LeXimir to the Unitex/GramLab IDE

LeXimir is a dictionary management module.

This dictionary management module enables concurrent manipulation of a set of dictionaries of lemmas, simple words or compounds, distributed in several files. This module enables the user to modify or delete all the information attached to a lemma, or the lemma itself, as well as to add new entries.

An important feature of this module is the ability to efficiently retrieve a subset of lemmas by
matching the lemmas, their part of speech (PoS), inflectional class code, syntactic and semantic
markers or their comments. For instance, one can look for all the dictionary entries starting
or ending with a search string.

Enhance Print function for graphs

The FSGraph > Print menu opens a dialog box which allows the user to print a graph.
As it stands, this function prints a small graph on the top of a (portrait-oriented) standard A4 page.
It would be useful to allow the user to control the size of the graph and the orientation of the page.

Rename Reset Sentence Graph and Rebuild FST-Text

According to the manual, Reset Sentence Graph discards the manually modified automaton of the sentence and resets it from the global text automaton. For clarity, it should be named Revert to last save.

Similarly, with Rebuild FST-Text, all sentence automata that have been modified are then replaced by their modified versions in the global text automaton. The new text automaton is then automatically displayed. The button should be named Save.

Save As/Open concordance

Currently, only a single concordance can be open and saved at a time.
Saving concordances and reopening them would allow users to compare several concordances easily.

What is the expected output?
Save the current concordance. It should be possible to save it in a directory other than Corpus. Opening an already open concordance brings it to the foreground.

Apply Lexical Resources Buffer Underflow Exception

In the "Apply Lexical Resources" frame, it is possible to get some meta-information about a dictionary with a right click. If it exists, this information is read from a foo.txt file associated with a foo.bin or foo.fst2 dictionary (see User's Manual, Section 14.8.3).

What steps will reproduce the problem?

  1. Launch Unitex
  2. Select French as working language
  3. Text > Open > 81jours.txt
  4. Do you want to process the text ? Answer No
  5. Text > Apply Lexical Resources
  6. Right click on NPr+.fst2
  7. Right click on Elements.fst2

What is the expected output?

The bottom panel loads the content from French\Dela\Elements.txt

What do you see instead?

A Buffer Underflow Exception. See the screenshot below.

[Screenshot: ide-bug]

More info

Unitex-GramLab IDE 3.1.4314

Fix FST-Text>frame below>Table: Export DELAF style

Add radio buttons for fields to export:

  1. Add: RB Surface form (first field)
  2. Add: RB Segmentation only Ex: AlkitaAbu = {Al}{kitaAbu}
  3. Add: RB POS Ex: _AlkitaAbu = {Al,.DET}{kitaAbu,.N}
  4. Add: RB Lemma Ex: _AlkitaAbu = {Al,.DET}{kitaAbu,kitaAb.N}
  5. Add: RB Inflection feats: Ex: _AlkitaAbu = {Al,.DET}{kitaAbu,kitaAb.N:msDN}
  6. Add: RB Semantic features Ex: _AlkitaAbu = {Al,.DET}{kitaAbu,kitaAb.N+concrete:msDN}

For ambiguity, we keep the same format :
_AlkitaAb = ({Al,.DET}{kitaAbu,kitaAb.N+concrete:msDN} |
{Al,.DET}{kitaAbu,kitaAb.N+concrete:msDA} |
{Al,.DET}{kitaAbu,kitaAb.N+concrete:msDG}
)
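The examples above suggest that each option adds one more field to the previous one. The Python sketch below reproduces them under that assumption; the level names, the token dictionary layout and the functions `format_tag` and `format_word` are all hypothetical. An empty lemma, as in `{Al,.DET}`, is kept empty.

```python
LEVELS = ["segmentation", "pos", "lemma", "inflection", "semantic"]

def format_tag(token, level):
    """Format one segment of a word at the requested level of detail."""
    depth = LEVELS.index(level)
    if depth == 0:
        return "{" + token["form"] + "}"
    # The lemma is only shown from the "lemma" level onwards.
    lemma = token.get("lemma", "") if depth >= 2 else ""
    codes = token["pos"]
    # Semantic features come before inflectional features, as in DELAF.
    if depth >= 4 and token.get("sem"):
        codes += "+" + "+".join(token["sem"])
    if depth >= 3 and token.get("infl"):
        codes += ":" + token["infl"]
    return "{" + token["form"] + "," + lemma + "." + codes + "}"

def format_word(tokens, level):
    """Concatenate the tags of all segments of a word."""
    return "".join(format_tag(t, level) for t in tokens)
```

For an ambiguous word, the same function could be applied to each analysis and the results joined with " | " inside parentheses, following the format shown above.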

Locate pattern option recognizes incorrect paths, which are not recognized in Debug mode

Locate pattern option recognizes incorrect paths, which are not recognized in Debug mode. These paths include test variables $abc.EQUAL=#XYZ$. or $abc.UNSET$.

What steps will reproduce the problem?

  1. Launch Unitex for Portuguese Brazil
  2. Open test_nbr_portug.txt attached
  3. Text > Locate pattern...
  4. Set > Dnum-simpl.grf (keeping all default settings)
  5. Locate > Build concordance (keeping all default settings)

Attachments
locate_pattern_issue_attachmnt.zip

What is the expected output?

The following paths should not appear in Locate pattern results :

  • duas mil cento e dois
  • duzentas mil e duzentos

Debug mode does ignore these paths with .grf file (see capture below)

[Screenshot: debug_mode]

What do you see instead?

These incorrect paths do appear in Locate Pattern results. (see capture below)

[Screenshot: locate_pattern_test]

More info

  • Unitex/GramLab IDE version: 3.2.56-alpha
  • UnitexToolLogger version: 3.2.57-alpha

Undo button cancels any action made after defining an output variable

After we define an output variable, the Undo button cancels every action made from that point onwards at once, instead of cancelling only one action at a time.

What steps will reproduce the problem?

  1. Launch Unitex with any language
  2. Create a new graph with FSGraph > New
  3. Create an empty box using the "Create new box" button
  4. Link boxes from starting node to your box and from your box to end node using "Normal editing mode"
  5. Select your new box
  6. Surround it with an output variable by clicking the "Surround box selection with an output variable" button, choose any name for your variable
  7. Create a new empty box using the "Create new box" button
  8. Click the "Undo" button

What is the expected output?

Undo button should cancel only the last action (Create new box).

What do you see instead?

Undo button cancels all the actions made after clicking "Surround box selection with an output variable" button, including "Surround box selection with an output variable" itself.

More info

  • Unitex/GramLab IDE version: 3.2.56-alpha
  • UnitexToolLogger version: 3.2.57-alpha

Find and replace for graphs

Currently, there is no way to find and replace the content of one or more boxes at the same time.
I have already implemented this feature; please see my pull request #23.
The feature can be accessed either by clicking on the magnifying glass in the graph toolbar or from the FSGraph menu.
To avoid any issue, you cannot replace the content of empty boxes (containing only <E>) and standalone boxes.

Open graph-files passed on the command-line

The purpose of this feature request is to support opening a list of graph files (.grf) passed on the command line, e.g.

$ java -jar Unitex.jar -Dunitex.binary.dir="/home/user/Unitex-GramLab-3.1/App" foo.grf bar.grf

Currently

  • The IDE only accepts 1 optional argument passed from the command-line, this argument is the path where the UnitexToolLogger binary is located.

Requested changes

  • Add an optional parameter, namely, unitex.binary.dir to setup the path where the UnitexToolLogger binary is located.
  • Allow running only one instance of the IDE at a time.
  • Treat remaining arguments as graph-filenames (See requirements below).

Requirements

For one or more valid graph-files (*.grf) passed on the command-line:

  • If the file exists, directly open it in the graph editor (User's Manual Section 5.2)
  • If the file doesn't exist, try to create a new graph with the same name and then open it in the graph editor.
  • If the file is already open, bring the focus back to the graph's window, i.e. only one open instance of the same file is allowed.
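The three requirements amount to a small decision per argument. The following Python sketch only illustrates that decision and is not the IDE's argument parser: the function name `plan_actions` and the action labels are hypothetical, and arguments starting with `-D` or not ending in `.grf` are simply skipped.

```python
from pathlib import Path

def plan_actions(args, open_graphs):
    """Decide what to do with each graph-file passed on the command line.

    args: command-line arguments as strings.
    open_graphs: paths of the graphs already open in the editor.
    Returns a list of (argument, action) with action in
    "focus", "open" or "create".
    """
    already = {Path(p).resolve() for p in open_graphs}
    actions = []
    for arg in args:
        path = Path(arg)
        if arg.startswith("-D") or path.suffix.lower() != ".grf":
            continue  # not a graph-file; ignored in this sketch
        resolved = path.resolve()
        if resolved in already:
            actions.append((arg, "focus"))   # only one open instance per file
        elif path.is_file():
            actions.append((arg, "open"))
        else:
            actions.append((arg, "create"))  # new graph with the same name
        already.add(resolved)
    return actions
```

Passing the same file twice therefore opens it once and then only brings its window back into focus.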

Further Plans

If this feature request is implemented, it would be possible to extend the Unitex/GramLab installers to configure an OS, such as Windows or Linux, to open a graph-file on double-click. This is a common feature of user-friendly editors.

Failure of 'Export all text as POS list'

The Table pane of the 'FST-text' window contains an 'Export all text as POS list' button that converts the text automaton into a 'POS list' format. This function fails on some corpora.

What steps will reproduce the problem?

Open the attached Korean text and click the 'Export all text as POS list' button.
1-Creative writing-NF-25000_snt.zip

What is the expected output?

The progress bar reaches the right end.

What do you see instead?

The progress bar stops.

Specialized "Open..." options for specific file types in "File Edition" menu

Submitted by @vasiljevic

Different file types (such as .TXT, .DIC, etc.) may have different default folders and default extensions. Specialized "Open..." options in the "File Edition" menu would offer the appropriate default folder and default file extension in the file open dialog. This small change may be useful for some users, as Denis Maurel noted in his post: https://groups.google.com/forum/#!searchin/unitex-gramlab/edition/unitex-gramlab/iRTR75xD21Q/16Mvt8SHGkEJ

Bug in some FST-Text when we apply graph on text

We noticed a problem: with some graphs, not all paths are present when we apply the graph to a text.
The automaton is not exact.

What steps will reproduce the problem?
Open the attached graph "deux_centieme.grf", compile it, apply it to the text "deux_centieme.txt", and Construct FST-Text.

  1. deux_centieme.tar.gz

What is the expected output?

We want all paths, the path for "deux-centième" and the path for "centième".

What do you see instead?

We have the path "deux centième" but we don't have the path "centième" in the automaton.

Undo/Redo for FST-Text

Currently we can modify the graph in FST-Text but the undo/redo functionality is not implemented.

It should be implemented like the undo/redo in FSGraph.

Output of Fst2List not displayed

Submitted by @eric-laporte

The results of Fst2List (list the paths of a graph, with a limit on the number of paths, cf. manual section 6.5) are correctly saved in a file list.txt, but not displayed in the dialog box when Fst2List issues warnings such as 'the graph 12 recognizes <E>'.

What steps will reproduce the problem?

In the experiment recorded in the attached log, I launch Fst2List through the Unitex menu item 'FSGraph > Tools > Explore graph paths' with the default parameters.

What is the expected output?

The 'Explore graph paths' dialog box notifies that the results have been saved and displays the results.

What do you see instead?

The 'Working...' window issues warnings. The 'Explore graph paths' dialog box does not notify that the results have been saved and does not display the results. The results are correctly saved in a file list.txt.

The 'Explore graph paths' dialog box is in the foreground and no other Unitex window can come to the foreground, so I close the dialog box. Then I launch Fst2List again in the same way. This time, the title of the dialog box is '100 lines' and it displays the results of the preceding request.

More info

This behaviour was already present in Unitex3.0, but it is confusing for the user and it complicates the task of analysing what works and what does not work in Fst2List.

Quoting Sebastien Paumier today about this problem: "The stable solution would be to add a boolean parameter indicating that the ToDo object passed to Launcher.exec() must be executed without waiting for the command execution window to be closed. It is not complicated, but we need to take the time to test it thoroughly to make sure we do not introduce any undesirable behaviour."

Add box in FST-Text

Adding new boxes to the FST-Text would allow the user to split a token. For example, it is especially useful when analyzing social media such as tweets where people will remove the space character between words.

The new box must respect the bounds described in the manual (See 14.5 Text Automaton p. 315). Also if the text automaton is valid, the other boxes' bounds should be updated if needed.

Exception in FST-Text

In FST-Text, an exception appears if the user deletes a transition and then tries to revert to the last save.

What steps will reproduce the problem?

  1. Open 80jours.txt & preprocess the text
  2. Construct FST-Text
  3. Go to sentence 2
  4. Delete a transition
  5. Go to sentence 3
  6. Go back to sentence 2
  7. Click on Reset Sentence Graph or Revert To Last Save on 3.2alpha
  8. The exception will appear

What is the expected output?

Revert To Last Save correctly loads the previous save.

What do you see instead?

[Screenshot: exception]

More info

  • Unitex/GramLab version: 3.1 and 3.2alpha
  • Did this work before?: [ ] yes [x] no

Fst2List option is unclear

The description of the Fst2List recursive exploration option leaves the behaviour of each choice unclear.

  • "Only paths" doesn't clearly imply recursive exploration.
  • "Do not explore subgraphs recursively" doesn't explain what the expected output is.

[Screenshot: issue-100]

Detection of caller graphs is slow on first use

In the toolbar of the FSGraph window, there is an icon (with three arrows pointing to a rectangle) to list all the graphs that call the graph displayed in the window. On the first use of this button in a given language, the detection takes more time to complete. From the second use in the same language, the duration of the detection is short and stable, even on another graph or after closing and relaunching Unitex.

What steps will reproduce the problem?

  1. Open a language where the icon for detection of caller graphs has never been used
  2. Open a graph
  3. Click on the icon
  4. Open another graph
  5. Click on the icon

What is the expected output?

The detection should take roughly the same time to complete in steps 3 and 5.

What do you see instead?

The detection takes more time in step 3 than in step 5. The difference is significant if there are many files in the directory and subdirectories of the language in my workspace. In Korean (4310 files), it takes 45 s the first time and 5 s the second time. In Arabic (6884 files), it takes 31 s the first time and 6 s the second time. In Modern Greek (59 files), it takes less than 1 s both times.

It is usual to have thousands of graphs in languages with complex morphology. Users may conclude that the button doesn't work on the day they discover and try it.

More info

  • Unitex/GramLab IDE version: 3.2.56-alpha
  • UnitexToolLogger version: 3.2.57-alpha
  • Did this work before?: the symptom is the same with version 3.1.4314.

Error when compiling graphs in cascade edition interface

In the cascade edition interface, when trying to compile all the graphs of the currently selected cascade, an error occurs if one of the graphs references a subgraph located in the defined "graph repository" (box content like "::subGraph").

What steps will reproduce the problem?

1. Create a subgraph placed in the "graph repository" (the graph repository needs to be set, if it is not already, in info > Preferences > Directories).
2. Create a graph that contains a reference to the previously created subgraph (for example with a box like "::subGraph").
3. Create a new cascade and add the newly created graph that references the subgraph.
4. Click on the "compile" button of the cascade edition interface, which launches the compilation of all the graphs in the cascade.

What is the expected output?

A normal compilation process

What do you see instead?

A compilation process resulting in an error due to the missing path to the "graph repository"

More info

This error occurs because the cascade interface calls Grf2Fst2 without the "-d" parameter, which enables the use of the "graph repository". The call to Grf2Fst2 in the graph edition interface is done correctly.
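The fix described above comes down to adding one option when the command line is built. The Python sketch below only illustrates this idea: the function name `grf2fst2_command` is hypothetical, the argument order is an assumption, and only the "-d" option is taken from this report. Any other option the IDE normally passes is left out.

```python
def grf2fst2_command(tool_logger, graph, repository=None):
    """Build the argument list for compiling one graph with Grf2Fst2.

    tool_logger: path to the UnitexToolLogger executable.
    graph: path to the .grf file to compile.
    repository: path to the graph repository, or None if none is set.
    """
    cmd = [tool_logger, "Grf2Fst2", graph]
    if repository:
        # Without -d, references such as "::subGraph" cannot be resolved.
        cmd += ["-d", repository]
    return cmd
```

Using a single helper of this kind for both the graph edition interface and the cascade edition interface would keep the two calls consistent.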

Behaviour of caller graph button depends on path of graph

Hi,

The behaviour of the "caller graphs" button of the FSGraph editor unexpectedly depends on the path of the graph.
The "called graphs" button does not show the same dependency.

What steps will reproduce the problem?

  1. Copy the attached directory (test) out of your Unitex workspace, for instance in the parent directory of your Unitex workspace
    test.zip
  2. Open test/Dnum12.grf with the FSGraph menu
  3. Click the "Graphs that call this one" button

What is the expected output?

The filename "Dnum.grf" appears in the "Caller graphs" list.

What do you see instead?

The "Caller graphs" list remains empty.

More info

  • If you open test/Dnum.grf and click on the "Subgraphs called from this one" button, the filename "Dnum12.grf" appears in the "Callee graphs" list, as expected.

  • If you copy the test directory in French/Graphs in your Unitex workspace, open Dnum12.grf and click on the "Graphs that call this one" button, the filename "Dnum.grf" appears in the "Caller graphs" list, as expected.

  • Unitex/GramLab version: 3.2beta dated May 5, 2017

  • OS: Windows 10

  • Did this work before?: [ ] yes [x] no (same problem with Unitex/GramLab 3.1)

Thanks!
Eric

Inaccessible frame when using 'explore graph paths' tool

The error frame can't be accessed when there is an error using the explore graph paths tool.

What steps will reproduce the problem?

  1. Open a graph which is found to be recursive (Dnum.grf in dnum_graph.zip)
  2. Run the Explore Graph Paths tool (with default options) on the graph

What is the expected output?

The error frame called Done should be accessible so that it can be closed before running the tool again, for example with different options.

What do you see instead?

explore-graph-issue

The error frame can't be accessed as long as the explore graph paths window is open.

More info

This behaviour is caused by the way each frame is implemented in Gramlab. Explore Graph Paths (GraphPathDialog.java) is a JDialog whereas the error frame (ProcessInfoFrame.java) is a JInternalFrame. A JDialog always has higher priority than a JInternalFrame.

This behaviour has been mentioned in [issue #7] but wasn't addressed.

  • Unitex/GramLab IDE version: 3.2.59-alpha
  • UnitexToolLogger version: 3.2.59-alpha

Change format of word tags in text automaton

The text automaton contains word tags generated from lexical entries in dictionaries. A word tag contains 3 fields: inflected form, lemma and codes. Once the text automaton has been generated, the user may wish to modify the format of the word tags in two ways:

  • by deleting one or two of the 3 fields,
  • by performing substitutions in the codes according to a specification described in a text file called a glossary and prepared by the user. In a glossary, each line specifies a substitution and contains a regular expression, a slash (/) and a replacement sequence. The regular expression specifies which sequences will be found and replaced. For example, if the glossary contains the following line:
    NS.*/NS
    the word tag {nalssi,nalssi.NS+ZNZ+JN#JN08} will become {nalssi,nalssi.NS}.
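The substitution described above could be sketched as follows. This is only an illustration under stated assumptions: the class `GlossarySketch` and its members are hypothetical, the last slash of a glossary line is taken as the separator, the replacement sequence is treated as literal text, and word tags are assumed to have the form {inflected,lemma.codes} with no escaped commas or dots.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlossarySketch {

    /** One glossary line: a regular expression and its replacement sequence. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String glossaryLine) {
            // Assumption: the last slash separates the two parts.
            int slash = glossaryLine.lastIndexOf('/');
            if (slash < 0) {
                throw new IllegalArgumentException("No slash in glossary line: " + glossaryLine);
            }
            pattern = Pattern.compile(glossaryLine.substring(0, slash));
            replacement = glossaryLine.substring(slash + 1);
        }
    }

    /** Applies a rule to the codes field of a tag of the form {inflected,lemma.codes}. */
    static String apply(Rule rule, String wordTag) {
        int comma = wordTag.indexOf(',');
        int dot = wordTag.indexOf('.', comma + 1);
        if (!wordTag.startsWith("{") || !wordTag.endsWith("}") || comma < 0 || dot < 0) {
            return wordTag; // not in the assumed form: leave it unchanged
        }
        String codes = wordTag.substring(dot + 1, wordTag.length() - 1);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        String newCodes = rule.pattern.matcher(codes)
                .replaceAll(Matcher.quoteReplacement(rule.replacement));
        return wordTag.substring(0, dot + 1) + newCodes + "}";
    }

    public static void main(String[] args) {
        Rule rule = new Rule("NS.*/NS");
        // prints {nalssi,nalssi.NS}
        System.out.println(apply(rule, "{nalssi,nalssi.NS+ZNZ+JN#JN08}"));
    }
}
```

Only the codes field is rewritten here; deleting fields would be a separate step.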

Link to Chatroom

Hello @martinec .
I have applied as a student intern for GSoC under Unitex/Gramlab.
I am developing small prototypes for the projects I have applied for, to show my skills. Could you share a link to a chatroom where I can easily talk with the mentors about my interest in Gramlab and about how to prepare myself to be a good candidate to work under Unitex/Gramlab?

Thanks.

Add an Open Recent submenu item to the DELA menu

Currently, the FSGraph menu features an Open Recent submenu item listing the graphs that were recently opened. However, the DELA menu (as well as the Dictionaries menu in Gramlab.jar) doesn't have a similar feature to list the recently opened dictionaries.
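As a rough sketch of the bookkeeping such a submenu needs, the list below keeps the most recently opened dictionaries first, without duplicates, up to a fixed size. The class `RecentDictionariesSketch` and its methods are hypothetical. How the FSGraph menu stores its recent graphs is not described here, so this does not claim to mirror it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RecentDictionariesSketch {
    private final int capacity;
    private final Deque<String> paths = new ArrayDeque<>();

    RecentDictionariesSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Records that a dictionary was opened, moving it to the front of the list. */
    void opened(String path) {
        paths.remove(path);   // drop any older entry for the same file
        paths.addFirst(path);
        while (paths.size() > capacity) {
            paths.removeLast(); // forget the least recently opened entry
        }
    }

    /** Returns the entries, most recently opened first. */
    List<String> entries() {
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        RecentDictionariesSketch recent = new RecentDictionariesSketch(2);
        recent.opened("A.dic");
        recent.opened("B.dic");
        recent.opened("A.dic");
        System.out.println(recent.entries()); // prints [A.dic, B.dic]
    }
}
```

The menu items would then be rebuilt from `entries()` each time a dictionary is opened.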

Exception in FST-Text

What steps will reproduce the problem?
Create a cycle in the graph by adding a transition.

What is the expected output?
An attempt to create a transition that completes a cycle in the FST-Text should have no effect.

exception

Copy the list of called sub(sub)*-graphs to the clipboard

Submitted by @alexis-neme

In order to perform further operations, it is useful for the user to get the list of the called sub(sub)*-graphs into a text editor such as Notepad.

However, the user cannot COPY and PASTE the content (or the list of sub-graphs) of the listBox.

Reference:
Unitex Manual 3.1, page 96, Fig. 5.15.

Activate the creation of log files from the project-oriented IDE

The Unitex Classic IDE (Unitex.jar) features an option to create command execution log files (.ulp). The following UI actions are required to activate this:

  1. Launch the Classic IDE
  2. Menu Info > Preferences > Directories
  3. Check Produce log information in directory
  4. Set an output directory, e.g. /foo
  5. Click OK

After being activated, the log files will be stored in the directory /foo. See the User's Manual section 13.1 for more details.

The requested feature is to be able to activate the creation of log files from the project-oriented IDE (Gramlab.jar) on a per-project basis. The UI actions to activate this option should be:

  1. Launch the project-oriented IDE
  2. Menu Project > Preferences > Directories
  3. Check Produce log information in directory
  4. Set an output directory, e.g. /foo
  5. Click OK

After being activated, the log files will be stored in the directory /foo.

Opening an open dictionary should make it active

"DELA > Open" is used to open a dictionary. If the dictionary is already open, its window should at least become active.

What steps will reproduce the problem?

  1. Open a dictionary A.dic with "DELA > Open"
  2. Open a dictionary B.dic and make sure its window hides A.dic
  3. Open A.dic

What is the expected output?

A.dic becomes visible and active.

What do you see instead?

A.dic remains hidden.

More info

  • Unitex/GramLab IDE version: 3.2.59-alpha
  • UnitexToolLogger version: 3.2.59-alpha
  • Did this work before?: [ ] yes [ x ] no
    If you do the same thing with "FSGraph > Open" and graphs A.grf and B.grf, A.grf becomes visible and active at step 3.

Suggestion to solve the ambiguous corpus problem in Unitex

In a corpus with multi-candidate tagging, each sentence has several solutions, but Unitex/GramLab has no rules that always find the best one. Here is a suggestion to solve this problem: use the Unitex text automaton and Gate machine learning to score the candidates and choose the one with the best score.
Gate machine learning consists of two basic operations: first, training on an unambiguously tagged corpus to generate a trained model, and second, applying the model to the multi-candidate corpus.

Convert the IDE's internal file editor into a plugin

The internal file editor's functionality needs to be implemented through a plugin. Apart from editing, search and replace functionalities are also needed as part of the core-plugin. [User Manual Section 2.3]

Let user set name of output file containing list of paths of graph

In the present 'FSGraph > Tools > Explore graph paths' function, the output (which is the list of the paths of a graph) is automatically saved in list.txt in the directory of the language.
The user should be allowed to choose and set the name and directory of the output file.

Allow to create a non-existent sub-graph

Submitted by Daniel Stein.

When you create a box with a sub graph that does not yet exist, it turns red [1]. When you click on this red box with the "Open Sub Graph" pointer, a dialog window states "Cannot find xyz". I suggest that the dialog also offer a "Create" button, which creates a new graph with the respective name and opens it.

Original User's Forum thread

[1] Missing called sub-graphs appear in red as illustrated on the User's Manual, Fig. 5.14.

Add a function to convert a POS-list table into a text automaton

Unitex/GramLab has a function to convert the text automaton into a 'POS list' table (FST-Text dialog box -> Table tab -> Export all text as POS list button). See Section 7.8 Table display, User's Manual pp. 193-194.

This feature request consists in implementing the reverse conversion in case all lexical ambiguity has been removed from the 'POS list' table, i.e. a conversion from a POS-list table (plain-text format) into a text automaton (FST-Text format).

This function was suggested by users who remove lexical ambiguity from corpora manually, but do part of this revision in the 'POS list' format. They want to be able to convert the resulting corpus back to the FST-Text format, so that they can run search queries on it later.

See also PRJ-10 : http://unitexgramlab.org/student-project-proposals
