garrafao / wugs
Code to process Word Usage Graphs
License: GNU General Public License v3.0
The Stats, Grouping stats and Agreement stats now work both on the server and locally. However, Plotting stats and Annotator data do not work locally, and the info and annotator filters do not work either.
To reproduce, please run:
scripts/run_system2.sh test_uug/ correlation spring
Quotes are not needed. It currently looks as if the context must be quoted, but this should not be required.
add Russian and Chinese data to durel_system/upload_formats/
check these two lines in the notebook:
judgments_wug.to_csv('judgements_all.csv', index=False)
uses_wug.to_csv('uses_all.csv', index=False)
These lines should be fixed because they introduce additional quotation marks in the output.
What are the separate sub-processes of the pipeline? Which values are calculated in these sub-processes? What is the format of the output values?
Which parameters can be passed to the pipeline to filter / specify the process? In which format should the parameters be presented? For which sub-processes of the pipeline are the parameters needed?
the stats_agreement.json is not generated when viewing graph in the discowug_unc dataset, and it leads to all statistics not rendering
for example, the test_english project does not have instance files
and add sample HTML with a description for how to create samples for integration of certain language
The data contained in these files is not JSON but JavaScript code: each file creates a global variable using standard JavaScript syntax. Also, in code such as
<script type="text/javascript" src="../stats.json"> the declared type is JavaScript, so the src should point to a JavaScript file. Please rename all files ending with the json suffix to the js suffix.
The filtering in the output HTML plots is not very efficient according to Lukas.
if there is no grouping split in the usage data, no stats_groupings file is created and thus no information on cluster frequency distributions is exported.
Please try to upload both the uses file and the instances file into the system and make sure they can be uploaded. Currently, both files have problems.
Currently, the pipeline interprets judgments of 0.0 as non-valid. For label ranges including 0 as valid label, such as cosine similarity ranging from -1 to 1, this leads to wrongly treating 0.0 as non-valid label. Thus, we need to make the non-valid label a parameter.
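A minimal sketch of what such a parameter could look like, assuming judgments are handled as plain floats. The function name `filter_valid_judgments` and the parameter name `non_value` are hypothetical and not taken from the pipeline; passing `None` here means that no label is treated as non-valid.

```python
import math
from typing import Iterable, List, Optional


def filter_valid_judgments(
    judgments: Iterable[float],
    non_value: Optional[float] = 0.0,  # hypothetical parameter name
) -> List[float]:
    """Drop NaN values and, if non_value is set, judgments equal to it."""
    valid = []
    for judgment in judgments:
        if isinstance(judgment, float) and math.isnan(judgment):
            continue  # missing judgments are never valid
        if non_value is not None and judgment == non_value:
            continue  # configured non-valid label
        valid.append(judgment)
    return valid


# Ordinal scale where 0.0 marks "cannot decide": keep the old behaviour.
durel_like = filter_valid_judgments([0.0, 1.0, 4.0, 0.0, 3.0])

# Cosine similarity in [-1, 1]: 0.0 is a valid label, so disable the filter.
cosine_like = filter_valid_judgments([-0.5, 0.0, 0.8], non_value=None)

print(durel_like)
print(cosine_like)
```

Keeping 0.0 as the default would leave existing projects unchanged, while scales that include 0 could opt out explicitly.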
try to figure out how to incorporate the new mechanism inside lucas's template into the template for spring
It would be helpful to have timestamps for the errors that occur in the WUG pipeline to improve the traceability of these errors when the pipeline is executed multiple times.
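One way this could be done, sketched with Python's standard `logging` module. The logger name `wug_pipeline`, the helper `make_logger` and the example error message are assumptions for illustration; the pipeline's scripts may report errors differently today.

```python
import io
import logging


def make_logger(stream=None) -> logging.Logger:
    """Return a logger that prefixes every record with a timestamp."""
    logger = logging.getLogger("wug_pipeline")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Demonstration: capture the log output in memory instead of stderr.
buffer = io.StringIO()
log = make_logger(buffer)
try:
    raise ValueError("example failure in an aggregation step")
except ValueError:
    # log.exception records the message and the full traceback
    log.exception("aggregation step failed")

print(buffer.getvalue())
```

With a format like this, errors from repeated pipeline runs could be told apart by their timestamps in a shared log file.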
When running the pipeline in an environment with python 3.11 the following error occurs:
Traceback (most recent call last):
File "/home/arbeit/Desktop/DURel/durel_system/WUGs/scripts/data2join.py", line 46, in <module>
w = csv.DictWriter(f, ['identifier1', 'identifier2', 'judgment', 'comment', 'annotator', 'lemma'], delimiter='\t', quoting = csv.QUOTE_NONE, quotechar='')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/csv.py", line 139, in __init__
self.writer = writer(f, dialect, *args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: "quotechar" must be a 1-character string
This error (and comparable ones) occurs in three scripts: data2join, data2agr and data2annotator. I was able to 'solve' it by removing quotechar=0 in all three files. It should be tested whether this change breaks anything in the pipeline.
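A small check that may help with that test, using only the standard `csv` module. It compares two variants: passing `quotechar=None`, which the module accepts together with `csv.QUOTE_NONE`, and dropping the argument, which falls back to the default quote character `"`. The helper `write_rows` and the sample row are made up for illustration; the field list is the one from the traceback.

```python
import csv
import io

FIELDS = ['identifier1', 'identifier2', 'judgment', 'comment', 'annotator', 'lemma']


def write_rows(rows, **writer_kwargs):
    """Write rows as unquoted TSV into a string (hypothetical helper)."""
    buffer = io.StringIO()
    w = csv.DictWriter(buffer, FIELDS, delimiter='\t',
                       quoting=csv.QUOTE_NONE, lineterminator='\n',
                       **writer_kwargs)
    w.writeheader()
    w.writerows(rows)
    return buffer.getvalue()


# Made-up row whose comment contains double quotes.
row = {'identifier1': 'u1', 'identifier2': 'u2', 'judgment': '4',
       'comment': 'a "quoted" note', 'annotator': 'ann1', 'lemma': 'neat'}

# Variant 1: quotechar=None disables the quote character entirely.
out = write_rows([row], quotechar=None)
print(out)

# Variant 2: no quotechar argument, so the default '"' applies. With
# QUOTE_NONE and no escapechar, a field containing '"' cannot be written.
default_quotechar_fails = False
try:
    write_rows([row])
except csv.Error as err:
    default_quotechar_fails = True
    print("csv.Error:", err)
```

If the data can contain double quotes, replacing the empty string with `quotechar=None` may therefore be safer than removing the argument.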
There are too many and they make it hard to find important output.
conda version 22.11.1
python version 3.9.13.final.0
When pip installing the requirements.txt from inside a conda environment, an error can occur with pygraphviz. Installing pygraphviz directly through conda works: conda install --channel conda-forge pygraphviz
The error could occur because pygraphviz does not find graphviz when it is pip installed inside conda.
Collecting pygraphviz
Using cached pygraphviz-1.10.zip (120 kB)
Preparing metadata (setup.py) ... done
Building wheels for collected packages: pygraphviz
Building wheel for pygraphviz (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [55 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-39
creating build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/agraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/testing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/graphviz.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
creating build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_unicode.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_attribute_defaults.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_string.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_node_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_subgraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_html.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_repr_mimebundle.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_clear.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_drawing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_edge_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_graph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_layout.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_close.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_readwrite.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
running egg_info
writing pygraphviz.egg-info/PKG-INFO
writing dependency_links to pygraphviz.egg-info/dependency_links.txt
writing top-level names to pygraphviz.egg-info/top_level.txt
reading manifest file 'pygraphviz.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
adding license file 'LICENSE'
writing manifest file 'pygraphviz.egg-info/SOURCES.txt'
copying pygraphviz/graphviz.i -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/graphviz_wrap.c -> build/lib.linux-x86_64-cpython-39/pygraphviz
running build_ext
building 'pygraphviz._graphviz' extension
creating build/temp.linux-x86_64-cpython-39
creating build/temp.linux-x86_64-cpython-39/pygraphviz
gcc -pthread -B /home/line/anaconda3/envs/DURel/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -I/home/line/anaconda3/envs/DURel/include -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -fPIC -DSWIG_PYTHON_STRICT_BYTE_CHAR -I/home/line/anaconda3/envs/DURel/include/python3.9 -c pygraphviz/graphviz_wrap.c -o build/temp.linux-x86_64-cpython-39/pygraphviz/graphviz_wrap.o
pygraphviz/graphviz_wrap.c:2711:10: fatal error: graphviz/cgraph.h: No such file or directory
2711 | #include "graphviz/cgraph.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pygraphviz
Running setup.py clean for pygraphviz
Failed to build pygraphviz
Installing collected packages: pygraphviz
Running setup.py install for pygraphviz ... error
error: subprocess-exited-with-error
× Running setup.py install for pygraphviz did not run successfully.
│ exit code: 1
╰─> [57 lines of output]
running install
/home/line/anaconda3/envs/DURel/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-39
creating build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/agraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/testing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/graphviz.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
creating build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_unicode.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_attribute_defaults.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_string.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_node_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_subgraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_html.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_repr_mimebundle.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_clear.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_drawing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_edge_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_graph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_layout.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_close.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
copying pygraphviz/tests/test_readwrite.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
running egg_info
writing pygraphviz.egg-info/PKG-INFO
writing dependency_links to pygraphviz.egg-info/dependency_links.txt
writing top-level names to pygraphviz.egg-info/top_level.txt
reading manifest file 'pygraphviz.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
adding license file 'LICENSE'
writing manifest file 'pygraphviz.egg-info/SOURCES.txt'
copying pygraphviz/graphviz.i -> build/lib.linux-x86_64-cpython-39/pygraphviz
copying pygraphviz/graphviz_wrap.c -> build/lib.linux-x86_64-cpython-39/pygraphviz
running build_ext
building 'pygraphviz._graphviz' extension
creating build/temp.linux-x86_64-cpython-39
creating build/temp.linux-x86_64-cpython-39/pygraphviz
gcc -pthread -B /home/line/anaconda3/envs/DURel/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -I/home/line/anaconda3/envs/DURel/include -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -fPIC -DSWIG_PYTHON_STRICT_BYTE_CHAR -I/home/line/anaconda3/envs/DURel/include/python3.9 -c pygraphviz/graphviz_wrap.c -o build/temp.linux-x86_64-cpython-39/pygraphviz/graphviz_wrap.o
pygraphviz/graphviz_wrap.c:2711:10: fatal error: graphviz/cgraph.h: No such file or directory
2711 | #include "graphviz/cgraph.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> pygraphviz
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Instead of providing a requirements.txt, we should provide installation instructions for Python Anaconda because this will allow using the graph-tool library.
reorganize system2 pipeline to have separate, modular data aggregation step
Depending on the complexity of the WUG, and on the filtering task, filtering can be slow. I tried to mitigate this, for example by using a map for annotations when calculating the sub-graph for the annotators filter, which made the filter a little faster than the earlier implementation.
If filtering is still too slow to be usable in some cases, there could be more time-efficient ways to access data, e.g. by going around the implementation of vis.js (possibly collecting all nodes and edges into collections when loading the page and having your own iterators for that data, or using a library that iterates over datasets faster, for example a C-based library).
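The idea of indexing annotations once and reusing that index for every filter call can be sketched as follows. This is a language-agnostic illustration written in Python, not the vis.js code used in the plots; the function names and the record keys `identifier1`, `identifier2` and `annotator` are assumptions.

```python
from collections import defaultdict


def index_edges_by_annotator(annotations):
    """Build the annotator -> edges map once, e.g. when the page loads."""
    index = defaultdict(set)
    for a in annotations:
        # An undirected edge is stored as a frozenset of its two node ids.
        edge = frozenset((a["identifier1"], a["identifier2"]))
        index[a["annotator"]].add(edge)
    return index


def edges_for(index, selected_annotators):
    """Return the edges judged by any selected annotator via dict lookups."""
    edges = set()
    for annotator in selected_annotators:
        edges |= index.get(annotator, set())
    return edges


# Made-up annotation records for illustration.
annotations = [
    {"identifier1": "u1", "identifier2": "u2", "annotator": "ann1"},
    {"identifier1": "u2", "identifier2": "u3", "annotator": "ann2"},
    {"identifier1": "u2", "identifier2": "u1", "annotator": "ann2"},
]
index = index_edges_by_annotator(annotations)
subgraph_edges = edges_for(index, ["ann1"])
print(subgraph_edges)
```

Each filter call then costs one lookup per selected annotator instead of a scan over all annotations.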
Currently, annotator judgments on judgment plots are cast to integers. Should be adapted to floats for non-ordinal scales.
Please check the folder 'neat'. For the 5th row of neat (identifier: neat-105), indices_target_sentence_tokenized covers only half of the target sentence. Also, in the 9th row of neat, there are missing values for indices_target_token/sentence. Most of the other instances return correct indices in manual checks.
There is no need to upload instances with source data. Instead, instances should be created in the process of label aggregation from source judgments see here:
https://github.com/Garrafao/durel_system_annotators/blob/master/tests/data.py
It is my understanding that the exclude_nodes.py script removes from the graph the nodes which either have more than 50% of 0 judgments or which do not have valid judgments at all.
The nodes here are usages (sentences containing target words).
Based on this understanding, for each target word I would expect the number of excluded nodes plus the number of preserved nodes to be equal to the number of the original nodes (number of usages). However, this is not the case. With my data, after the processing is over, I am looking at the files in the stats subdirectory and observe cases like this:
Excluded nodes (from excluded_nodes.csv): 15
Preserved nodes (from stats_grouping.csv): 10
However, there were 22 unique sentences for this word in the original data. I see in the graphs and in the clusters, that there are indeed 10 sentences taken into account, so 12 sentences were discarded, not 15. I observe this for more than one target word. In extreme cases, the number of excluded nodes actually is higher than the total number of sentences for a word.
I have not yet looked deep into the code, but maybe I am just misunderstanding something?
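The expectation described above can be written down as a small check. This sketch encodes only the reading of the exclusion rule given in this issue (more than 50% of 0 judgments, or no judgments at all); it is not the code of exclude_nodes.py, and the function name, parameters and sample data are hypothetical.

```python
def split_nodes(judgments_by_node, zero_label=0.0, threshold=0.5):
    """Split nodes into (excluded, preserved) under the rule described above."""
    excluded, preserved = [], []
    for node, judgments in judgments_by_node.items():
        if not judgments:
            excluded.append(node)  # no judgments at all
            continue
        zero_share = sum(1 for j in judgments if j == zero_label) / len(judgments)
        if zero_share > threshold:
            excluded.append(node)  # more than 50% of 0 judgments
        else:
            preserved.append(node)
    return excluded, preserved


# Made-up judgments per usage for one target word.
judgments_by_node = {
    "u1": [0.0, 0.0, 3.0],  # 2/3 zeros: excluded
    "u2": [4.0, 0.0],       # exactly 50% zeros: preserved
    "u3": [],               # no judgments: excluded
    "u4": [2.0, 3.0],       # preserved
}
excluded, preserved = split_nodes(judgments_by_node)

# Under this reading, every node ends up in exactly one of the two lists.
assert len(excluded) + len(preserved) == len(judgments_by_node)
print(excluded, preserved)
```

If the rule is as described, the two counts always add up to the number of usages, so a mismatch would suggest that the two output files count different things.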
The latest pip version of mlrose depends on the PyPI package sklearn instead of scikit-learn; the sklearn package name is now deprecated: https://github.com/scikit-learn/sklearn-pypi-package
However, the developers have updated their GitHub repo (if not the pip package): https://github.com/gkhayes/mlrose
This means that to fix the issue, you can change the requirements.txt and replace mlrose==1.3.0 with git+https://github.com/gkhayes/mlrose (git has to be installed in the environment for this to work).
Another option is to check whether someone else has packaged a newer version of mlrose. I didn't do this.