wugs's People

Contributors

akutuzov, choppa98, garrafao, lukasmi, shafqatvirk, tuo-zhang, winobes

wugs's Issues

some data created by graph2plot2.py not readable

Now, Stats, Grouping stats and Agreement stats work both on the server and locally. However, Plotting stats and Annotator data do not work locally, and the info and annotator filters do not work either.
To reproduce, please run:
scripts/run_system2.sh test_uug/ correlation spring

additional quotes introduced in one_for_all notebook

Check these two lines in the notebook:

judgments_wug.to_csv('judgements_all.csv', index=False)
uses_wug.to_csv('uses_all.csv', index=False)

These lines should be fixed because they introduce additional quotation marks in the output.
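The extra quotes come from the CSV quoting rules: DataFrame.to_csv is built on Python's csv module and defaults to sep=',' and quoting=csv.QUOTE_MINIMAL, so any field containing a comma or a double quote gets wrapped in quotes and its inner quotes doubled. The sketch below reproduces this with the csv module directly. Writing tab-separated output with quoting disabled is an assumption based on the delimiter='\t' used in data2join.py; quotechar=None is only accepted together with QUOTE_NONE, and it assumes fields never contain tabs or newlines.

```python
import csv
import io

# A hypothetical usage whose context contains both a comma and a double quote.
row = ["neat-105", 'He said "neat", then left.']

def write_rows(rows, **fmt):
    """Serialize rows to a string with the given csv formatting options."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n", **fmt)
    writer.writerows(rows)
    return buf.getvalue()

# Defaults (delimiter=",", QUOTE_MINIMAL), as used by to_csv without extra arguments:
# the context is wrapped in quotes and its inner quotes are doubled.
default_out = write_rows([row])

# Assumed fix: tab-separated output with quoting disabled writes the field verbatim.
plain_out = write_rows([row], delimiter="\t", quoting=csv.QUOTE_NONE, quotechar=None)

print(default_out, end="")
print(plain_out, end="")
```

In the notebook, the corresponding change would be passing sep='\t' and quoting=csv.QUOTE_NONE to the two to_csv calls; how pandas handles a quotechar inside the data in that mode should be checked against its documentation.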

WUG Pipeline Sub-processes

What are the separate sub-processes of the pipeline? Which values are calculated in these sub-processes? What is the format of the output values?

Input Options WUG Pipeline

Which parameters can be passed to the pipeline to filter / specify the process? In which format should the parameters be presented? For which sub-processes of the pipeline are the parameters needed?

renaming the data_joint.json files and alike to js suffix

The data contained in these files is not JSON but JavaScript code: each file creates a global variable using standard JavaScript syntax. Also, in code such as:

<script type="text/javascript" src="../stats.json">

the type attribute declares text/javascript, so the src should point to a JavaScript file. Please rename all files ending with the .json suffix to the .js suffix.
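A minimal sketch of the rename, assuming every .json file under the output directory is one of these JavaScript files and that the references to them appear as src="...json" attributes in .html files in the same tree. The helper name rename_json_to_js and the regex are hypothetical, not part of the repository.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern: matches src="<anything>.json" attributes in HTML pages.
SRC_JSON = re.compile(r'(src="[^"]*)\.json(")')

def rename_json_to_js(root):
    """Rename *.json to *.js below root and update src attributes in *.html files."""
    root = Path(root)
    renamed = []
    # sorted() materializes the file list before any file is renamed.
    for path in sorted(root.rglob("*.json")):
        target = path.with_suffix(".js")
        path.rename(target)
        renamed.append(target)
    for page in root.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        updated = SRC_JSON.sub(r"\1.js\2", text)
        if updated != text:
            page.write_text(updated, encoding="utf-8")
    return renamed

# Usage demo on a throwaway directory mirroring the example from the issue.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "plots").mkdir()
    (Path(tmp) / "stats.json").write_text("var stats = {};", encoding="utf-8")
    (Path(tmp) / "plots" / "index.html").write_text(
        '<script type="text/javascript" src="../stats.json"></script>', encoding="utf-8")
    renamed_names = [p.name for p in rename_json_to_js(tmp)]
    page_after = (Path(tmp) / "plots" / "index.html").read_text(encoding="utf-8")
```

The scripts that generate these files would also need to write the .js names directly; the sketch only migrates existing output.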

test_uug dataset not usable

Please try to upload both the uses file and the instances file into the system and make sure they can be uploaded. Currently, both files have problems.

Flexible non-judgment value

Currently, the pipeline interprets judgments of 0.0 as non-valid. For label ranges that include 0 as a valid label, such as cosine similarity ranging from -1 to 1, this leads to 0.0 being wrongly treated as a non-valid label. Thus, we need to make the non-valid label a parameter.
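A minimal sketch of what a configurable non-valid label could look like. The function filter_valid_judgments and the --non-value flag are hypothetical names for illustration, not existing pipeline options; passing None disables filtering entirely.

```python
import argparse

def filter_valid_judgments(judgments, non_valid=0.0):
    """Drop judgments equal to the configured non-valid label.

    non_valid=None keeps every judgment, e.g. for cosine similarities in [-1, 1].
    """
    if non_valid is None:
        return list(judgments)
    return [j for j in judgments if j != non_valid]

# Hypothetical pipeline option; the flag name is illustrative only.
parser = argparse.ArgumentParser()
parser.add_argument("--non-value", type=float, default=0.0,
                    help="judgment label treated as non-valid (default: 0.0)")

# Ordinal scale where 0.0 is the non-valid label (the current behaviour).
ordinal = filter_valid_judgments([4.0, 0.0, 2.0])

# Cosine similarities: 0.0 is valid, so a sentinel outside [-1, 1] is passed instead.
args = parser.parse_args(["--non-value", "-999"])
cosine = filter_valid_judgments([-0.4, 0.0, 0.9, -999.0], non_valid=args.non_value)
```

Keeping 0.0 as the default would leave existing ordinal datasets unaffected.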

Timestamps for WUG pipeline

It would be helpful to have timestamps for the errors that occur in the WUG pipeline to improve the traceability of these errors when the pipeline is executed multiple times.
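One way to get timestamps on every error message is Python's standard logging module with %(asctime)s in the format string. This is a sketch assuming the Python scripts of the pipeline would log through a shared logger; the logger name wug_pipeline and the helper make_pipeline_logger are hypothetical.

```python
import io
import logging

# asctime gives the timestamp, module names the script that logged the message.
LOG_FORMAT = "%(asctime)s %(levelname)s [%(module)s] %(message)s"

def make_pipeline_logger(stream, name="wug_pipeline"):
    """Return a logger that writes timestamped records to the given stream."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines if the root logger is configured
    return logger

# Usage: in the pipeline the stream would be a log file opened in append mode.
buf = io.StringIO()
log = make_pipeline_logger(buf)
log.error("data2join failed for lemma %s", "neat")
log_line = buf.getvalue()
print(log_line, end="")
```

Appending to one log file across runs keeps errors from repeated executions in chronological order.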

Running the pipeline with Python 3.11

When running the pipeline in an environment with Python 3.11, the following error occurs:

Traceback (most recent call last):
  File "/home/arbeit/Desktop/DURel/durel_system/WUGs/scripts/data2join.py", line 46, in <module>
    w = csv.DictWriter(f, ['identifier1', 'identifier2', 'judgment', 'comment', 'annotator', 'lemma'], delimiter='\t', quoting = csv.QUOTE_NONE, quotechar='')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/csv.py", line 139, in __init__
    self.writer = writer(f, dialect, *args, **kwds)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: "quotechar" must be a 1-character string

This error (and comparable ones) occurs in three scripts: data2join, data2agr and data2annotator. I was able to 'solve' it by removing quotechar='' in all three files. It should be tested whether this change breaks anything in the pipeline.
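Removing quotechar makes the writer fall back to the default '"', and with QUOTE_NONE a field containing a double quote (for example in a comment) would then likely make the writer raise csv.Error, so that is worth testing. A possible alternative, sketched below, is quotechar=None: Python 3.11 rejects the empty string, while None together with QUOTE_NONE disables the quote character explicitly and is accepted on older versions too. The field list is copied from data2join.py; the example row is made up.

```python
import csv
import io

FIELDS = ['identifier1', 'identifier2', 'judgment', 'comment', 'annotator', 'lemma']

def make_writer(f):
    # quotechar=None instead of '': quoting stays disabled and '"' is written verbatim.
    return csv.DictWriter(f, FIELDS, delimiter='\t', quoting=csv.QUOTE_NONE, quotechar=None)

# Usage with a made-up judgment whose comment contains double quotes.
buf = io.StringIO()
w = make_writer(buf)
w.writeheader()
w.writerow({'identifier1': 'u1', 'identifier2': 'u2', 'judgment': 3.0,
            'comment': 'looks "odd"', 'annotator': 'a1', 'lemma': 'neat'})
lines = buf.getvalue().split('\r\n')
```

Fields that contain a tab or a newline would still raise csv.Error in this mode, as they did before.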

Installing pygraphviz with pip in conda (and elsewhere)?

conda version 22.11.1
python version 3.9.13.final.0

When pip-installing requirements.txt from inside a conda environment, an error can occur with pygraphviz. Installing pygraphviz directly through conda works: conda install --channel conda-forge pygraphviz. The error may occur because pygraphviz does not find graphviz when pip-installed inside conda.

Collecting pygraphviz
  Using cached pygraphviz-1.10.zip (120 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pygraphviz
  Building wheel for pygraphviz (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [55 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-39
      creating build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/agraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/testing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/graphviz.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      creating build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_unicode.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_attribute_defaults.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_string.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_node_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_subgraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_html.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_repr_mimebundle.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_clear.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_drawing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_edge_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_graph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_layout.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_close.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_readwrite.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      running egg_info
      writing pygraphviz.egg-info/PKG-INFO
      writing dependency_links to pygraphviz.egg-info/dependency_links.txt
      writing top-level names to pygraphviz.egg-info/top_level.txt
      reading manifest file 'pygraphviz.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching '*.png' under directory 'doc'
      warning: no files found matching '*.txt' under directory 'doc'
      warning: no files found matching '*.css' under directory 'doc'
      warning: no previously-included files matching '*~' found anywhere in distribution
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      warning: no previously-included files matching '.svn' found anywhere in distribution
      no previously-included directories found matching 'doc/build'
      adding license file 'LICENSE'
      writing manifest file 'pygraphviz.egg-info/SOURCES.txt'
      copying pygraphviz/graphviz.i -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/graphviz_wrap.c -> build/lib.linux-x86_64-cpython-39/pygraphviz
      running build_ext
      building 'pygraphviz._graphviz' extension
      creating build/temp.linux-x86_64-cpython-39
      creating build/temp.linux-x86_64-cpython-39/pygraphviz
      gcc -pthread -B /home/line/anaconda3/envs/DURel/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -I/home/line/anaconda3/envs/DURel/include -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -fPIC -DSWIG_PYTHON_STRICT_BYTE_CHAR -I/home/line/anaconda3/envs/DURel/include/python3.9 -c pygraphviz/graphviz_wrap.c -o build/temp.linux-x86_64-cpython-39/pygraphviz/graphviz_wrap.o
      pygraphviz/graphviz_wrap.c:2711:10: fatal error: graphviz/cgraph.h: No such file or directory
       2711 | #include "graphviz/cgraph.h"
            |          ^~~~~~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pygraphviz
  Running setup.py clean for pygraphviz
Failed to build pygraphviz
Installing collected packages: pygraphviz
  Running setup.py install for pygraphviz ... error
  error: subprocess-exited-with-error
  
  × Running setup.py install for pygraphviz did not run successfully.
  │ exit code: 1
  ╰─> [57 lines of output]
      running install
      /home/line/anaconda3/envs/DURel/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-39
      creating build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/agraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/testing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/graphviz.py -> build/lib.linux-x86_64-cpython-39/pygraphviz
      creating build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_unicode.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_attribute_defaults.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_string.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_node_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_subgraph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/__init__.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_html.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_scraper.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_repr_mimebundle.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_clear.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_drawing.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_edge_attributes.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_graph.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_layout.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_close.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      copying pygraphviz/tests/test_readwrite.py -> build/lib.linux-x86_64-cpython-39/pygraphviz/tests
      running egg_info
      writing pygraphviz.egg-info/PKG-INFO
      writing dependency_links to pygraphviz.egg-info/dependency_links.txt
      writing top-level names to pygraphviz.egg-info/top_level.txt
      reading manifest file 'pygraphviz.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching '*.png' under directory 'doc'
      warning: no files found matching '*.txt' under directory 'doc'
      warning: no files found matching '*.css' under directory 'doc'
      warning: no previously-included files matching '*~' found anywhere in distribution
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      warning: no previously-included files matching '.svn' found anywhere in distribution
      no previously-included directories found matching 'doc/build'
      adding license file 'LICENSE'
      writing manifest file 'pygraphviz.egg-info/SOURCES.txt'
      copying pygraphviz/graphviz.i -> build/lib.linux-x86_64-cpython-39/pygraphviz
      copying pygraphviz/graphviz_wrap.c -> build/lib.linux-x86_64-cpython-39/pygraphviz
      running build_ext
      building 'pygraphviz._graphviz' extension
      creating build/temp.linux-x86_64-cpython-39
      creating build/temp.linux-x86_64-cpython-39/pygraphviz
      gcc -pthread -B /home/line/anaconda3/envs/DURel/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -I/home/line/anaconda3/envs/DURel/include -fPIC -O2 -isystem /home/line/anaconda3/envs/DURel/include -fPIC -DSWIG_PYTHON_STRICT_BYTE_CHAR -I/home/line/anaconda3/envs/DURel/include/python3.9 -c pygraphviz/graphviz_wrap.c -o build/temp.linux-x86_64-cpython-39/pygraphviz/graphviz_wrap.o
      pygraphviz/graphviz_wrap.c:2711:10: fatal error: graphviz/cgraph.h: No such file or directory
       2711 | #include "graphviz/cgraph.h"
            |          ^~~~~~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> pygraphviz

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
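The log fails at graphviz/cgraph.h: No such file or directory, i.e. the compiler cannot see the Graphviz development headers. A small diagnostic sketch that looks for that header in a few common include directories (the conda environment, sys.prefix, /usr/include, /usr/local/include); the directory list is an assumption, not what pygraphviz's build actually searches.

```python
import os
import sys
import tempfile
from pathlib import Path

def find_cgraph_header(extra_dirs=()):
    """Return the first graphviz/cgraph.h found in common include directories, or None."""
    candidates = [Path(d) for d in extra_dirs]
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(Path(conda_prefix) / "include")
    candidates += [Path(sys.prefix) / "include", Path("/usr/include"), Path("/usr/local/include")]
    for include_dir in candidates:
        header = include_dir / "graphviz" / "cgraph.h"
        if header.is_file():
            return header
    return None

print("cgraph.h found at:", find_cgraph_header())

# Self-check against a throwaway directory containing a dummy header.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "graphviz").mkdir()
    (Path(tmp) / "graphviz" / "cgraph.h").touch()
    found_in_tmp = find_cgraph_header([tmp]) == Path(tmp) / "graphviz" / "cgraph.h"
```

If it prints None inside the conda environment, the headers are missing there, which is consistent with the conda-forge install working where pip does not.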

switch to Anaconda for installation

Instead of providing a requirements.txt, we should provide installation instructions for Python Anaconda, because this would allow us to use the graph-tool library.

aggregation step

Reorganize the system2 pipeline to have a separate, modular data aggregation step.
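A sketch of what a stand-alone aggregation step could look like: it takes judgment rows and returns one aggregated value per unordered usage pair, so later steps only consume its output. The function name, the median default and the 0.0 non-valid label are assumptions for illustration, not a description of the current system2 code; the column names follow the judgment fields used in data2join.py.

```python
import statistics
from collections import defaultdict

def aggregate_judgments(rows, non_valid=0.0, agg=statistics.median):
    """Group judgments by unordered usage pair and aggregate each group.

    Hypothetical step: rows are dicts with identifier1, identifier2 and judgment.
    """
    groups = defaultdict(list)
    for row in rows:
        judgment = float(row["judgment"])
        if judgment == non_valid:
            continue  # skip non-valid labels before aggregating
        pair = tuple(sorted((row["identifier1"], row["identifier2"])))
        groups[pair].append(judgment)
    return {pair: agg(values) for pair, values in groups.items()}

# Usage: two annotators judged (u1, u2) in opposite order; (u1, u3) only got a 0.
rows = [
    {"identifier1": "u1", "identifier2": "u2", "judgment": "4"},
    {"identifier1": "u2", "identifier2": "u1", "judgment": "3"},
    {"identifier1": "u1", "identifier2": "u3", "judgment": "0"},
]
edges = aggregate_judgments(rows)
```

Passing agg as a parameter keeps the aggregation function swappable without touching the graph or plotting steps.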

Slow filtering / efficiency problem in JavaScript of WUG filter template

Depending on the complexity of the WUG, and on the filtering task, filtering can be slow. I tried to mitigate this, for example by using a map for annotations when calculating the sub-graph for the annotators filter, which made the filter a little faster than the earlier implementation.

If filtering is still too slow to be usable in some cases, there may be more time-efficient ways to access the data, e.g. by bypassing the vis.js implementation (possibly collecting all nodes and edges into collections when the page loads and writing custom iterators over that data, or using a library that iterates over datasets faster, for example a C-based one).
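The "collect everything once on page load" idea amounts to building an index up front so each filter call is a lookup instead of a scan over all edges. The sketch is written in Python only to keep one language across these examples; the real filter lives in the template's JavaScript on top of vis.js, and the edge dictionaries with id and annotators keys are a hypothetical shape.

```python
from collections import defaultdict

def index_edges_by_annotator(edges):
    """Build annotator -> set of edge ids once, e.g. when the page is loaded."""
    index = defaultdict(set)
    for edge in edges:
        for annotator in edge["annotators"]:
            index[annotator].add(edge["id"])
    return index

def edges_for_annotators(index, annotators):
    """Union of pre-indexed edge ids; no pass over all edges per filter call."""
    selected = set()
    for annotator in annotators:
        selected |= index.get(annotator, set())
    return selected

# Usage with a made-up graph of three edges.
edges = [
    {"id": "e1", "from": "u1", "to": "u2", "annotators": ["a1", "a2"]},
    {"id": "e2", "from": "u2", "to": "u3", "annotators": ["a2"]},
    {"id": "e3", "from": "u1", "to": "u3", "annotators": ["a3"]},
]
index = index_edges_by_annotator(edges)
selected = edges_for_annotators(index, ["a2", "a3"])
```

The trade-off is memory and a one-time indexing cost at load, in exchange for filter calls whose cost depends on the selected annotators rather than on the size of the whole graph.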

integers on judgment plots

Currently, annotator judgments on judgment plots are cast to integers. This should be adapted to floats for non-ordinal scales.
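A small illustration of why the integer cast breaks non-ordinal scales: int() truncates toward zero, so cosine-style values in (-1, 1) all collapse to 0. The helper cast_judgments and its ordinal flag are hypothetical; how the plotting script would know the scale type is left open.

```python
def cast_judgments(values, ordinal):
    """Keep integers only for ordinal scales; otherwise keep float values."""
    return [int(v) for v in values] if ordinal else [float(v) for v in values]

# Current behaviour applied to a cosine-similarity scale: everything becomes 0.
truncated = [int(v) for v in [-0.8, 0.3, 0.9]]

kept = cast_judgments([-0.8, 0.3, 0.9], ordinal=False)
durel = cast_judgments([1.0, 4.0, 3.0], ordinal=True)
```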

Usim

Please check the folder 'neat'. In the 5th row of neat (identifier: neat-105), indices_target_sentence_tokenized covers only half of the target sentence. Also, in the 9th row of neat, the values for indices_target_token/sentence are missing. Most of the other instances return correct indices in manual checks.
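A sketch of an automatic sanity check that could complement the manual checks. It assumes the index columns hold "start:end" character offsets into the context with an exclusive end; that format, and the helper check_span, are assumptions rather than the documented format of these files. It flags missing or out-of-range values like the 9th row, but a span that is valid yet covers only half of the sentence (the 5th row) would still need a manual or tokenizer-based comparison.

```python
def check_span(context, indices):
    """Return a problem description for a 'start:end' span, or None if it looks fine.

    Assumption: indices are character offsets into context, end exclusive.
    """
    if not indices or ":" not in indices:
        return "missing indices"
    start_text, end_text = indices.split(":", 1)
    if not (start_text.isdigit() and end_text.isdigit()):
        return f"malformed indices {indices!r}"
    start, end = int(start_text), int(end_text)
    if not 0 <= start < end <= len(context):
        return f"span {indices} outside context of length {len(context)}"
    return None

# Usage on a made-up context.
context = "A neat idea."
print(check_span(context, "2:6"), context[2:6])
print(check_span(context, ""))
print(check_span(context, "2:60"))
```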

The number of excluded nodes

It is my understanding that the exclude_nodes.py script removes from the graph the nodes that either have more than 50% 0 judgments or have no valid judgments at all.
The nodes here are usages (sentences containing target words).

Based on this understanding, for each target word I would expect the number of excluded nodes plus the number of preserved nodes to be equal to the number of the original nodes (number of usages). However, this is not the case. With my data, after the processing is over, I am looking at the files in the stats subdirectory and observe cases like this:

Excluded nodes (from excluded_nodes.csv): 15
Preserved nodes (from stats_grouping.csv): 10

However, there were 22 unique sentences for this word in the original data. The graphs and clusters indeed take 10 sentences into account, so 12 sentences were discarded, not 15. I observe this for more than one target word. In extreme cases, the number of excluded nodes is actually higher than the total number of sentences for a word.

I have not yet looked deeply into the code, so maybe I am just misunderstanding something?
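A sketch of a consistency check for one target word, comparing the original usage identifiers with the excluded and preserved node lists. It separates possible explanations that are only hypotheses here: repeated rows in excluded_nodes.csv, excluded identifiers that are not usages of this word, and nodes listed as both excluded and preserved. The function name and the input lists are hypothetical; reading the actual CSV columns is left out.

```python
def check_node_counts(usage_ids, excluded_ids, preserved_ids):
    """Compare excluded/preserved node lists against the original usages of one word."""
    usages = set(usage_ids)
    excluded = list(excluded_ids)
    unique_excluded = set(excluded)
    preserved = set(preserved_ids)
    return {
        "usages": len(usages),
        "excluded_rows": len(excluded),
        "excluded_unique": len(unique_excluded),
        "excluded_not_in_usages": sorted(unique_excluded - usages),
        "excluded_and_preserved": sorted(unique_excluded & preserved),
        "balanced": len(unique_excluded & usages) + len(preserved) == len(usages),
    }

# Usage with made-up identifiers: 4 excluded rows, but only 2 real excluded usages.
report = check_node_counts(
    usage_ids=[f"u{i}" for i in range(1, 6)],
    excluded_ids=["u1", "u1", "u2", "x9"],
    preserved_ids=["u3", "u4", "u5"],
)
print(report)
```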

mlrose requirement sklearn

The latest pip version of mlrose depends on the sklearn package instead of scikit-learn, and the sklearn package is now deprecated: https://github.com/scikit-learn/sklearn-pypi-package

However, the developers have modified their GitHub repository (if not the pip package): https://github.com/gkhayes/mlrose

This means that to fix the issue, you can change the requirements.txt and replace mlrose==1.3.0 with git+https://github.com/gkhayes/mlrose (git has to be installed in the environment for this to work).

Another option is to check if there is a newer version of mlrose packaged by someone else. I didn't do this.
