facebookresearch / vizseq
An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)
Home Page: https://arxiv.org/abs/1909.05424
License: MIT License
Hi there,
I've installed vizseq as described in the README and then tried to import Tokenizer13a. Unfortunately, I got an
ImportError: cannot import name 'Tokenizer13a'
error.
git clone https://github.com/facebookresearch/vizseq.git
cd vizseq
pip install --editable ./
> Successfully installed sacrebleu-2.0.0 vizseq-0.1.15
** Minimal Code/Config snippet to reproduce **
git clone https://github.com/facebookresearch/vizseq.git
cd vizseq
pip install --editable ./
python
>>> from sacrebleu.tokenizers import Tokenizer13a
** Stack trace/error message **
>>> from sacrebleu.tokenizers import Tokenizer13a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Tokenizer13a'
In the official scorer example from https://facebookresearch.github.io/vizseq/docs/getting_started/scorer_example/, the second block confuses me.
Corpus-level BLEU: 67.945
Sentence-level BLEU: [75.984, 61.479]
Group BLEU: {'Test Group 2': 75.984, 'Test Group 1': 75.984}
I can see two generated sentences with corresponding reference sentences in the first block.
ref = [['This is a sample #1 reference.', 'This is a sample #2 reference.']]
hypo = ['This is a sample #1 prediction.', 'This is a sample #2 model prediction.']
tags = [['Test Group 1', 'Test Group 2']]
scores = scorer.score(hypo, ref, tags=tags)
print(f'Corpus-level BLEU: {scores.corpus_score}')
print(f'Sentence-level BLEU: {scores.sent_scores}')
print(f'Group BLEU: {scores.group_scores}')
The first sample belongs to Test Group 1 and the second sample belongs to Test Group 2. If I'm not misunderstanding the use of the tags, then according to the sentence-level BLEU scores, the Group BLEU should be {'Test Group 2': 61.479, 'Test Group 1': 75.984}.
But the execution result is Group BLEU: {'Test Group 2': 75.984, 'Test Group 1': 75.984}
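For reference, a minimal sketch in plain Python of the aggregation I would expect (this is not the vizseq internals; the flat one-tag-per-sample layout is a simplification for illustration):

```python
# Average the sentence-level scores of the samples carrying each tag.
# sent_scores mirrors the example output above; the flat tag list is an
# assumption made only to keep this sketch self-contained.
from statistics import mean

sent_scores = [75.984, 61.479]
tags = ['Test Group 1', 'Test Group 2']  # sample i carries tags[i]

group_scores = {
    t: mean(s for s, cur in zip(sent_scores, tags) if cur == t)
    for t in sorted(set(tags))
}
print(group_scores)
# {'Test Group 1': 75.984, 'Test Group 2': 61.479}
```

Under this reading, each group's score is just the mean over its own samples, which is why I expected 61.479 for Test Group 2.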
When running the following for scores like ROUGE and others:
vizseq.view_scores(ref, hypo, ["metric that's not bleu"], tags=tag)
I am getting:
~/.local/lib/python3.6/site-packages/vizseq/scorers/__init__.py in _score_multiprocess_averaged(self, hypothesis, references, tags, sent_score_func)
170 for t in tag_set:
171 indices = [i for i, cur in enumerate(tags) if t in cur]
--> 172 group_scores[t] = np.mean([sent_scores[i] for i in indices])
173
174 return VizSeqScore.make(
~/.local/lib/python3.6/site-packages/vizseq/scorers/__init__.py in <listcomp>(.0)
170 for t in tag_set:
171 indices = [i for i, cur in enumerate(tags) if t in cur]
--> 172 group_scores[t] = np.mean([sent_scores[i] for i in indices])
173
174 return VizSeqScore.make(
TypeError: 'NoneType' object is not subscriptable
Any idea why that's going on? The text data I'm using has at least 3 tokens per sentence.
This works fine when running view_examples.
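For what it's worth, a minimal sketch of the failure mode as I understand it (a hypothetical simplification, not the actual vizseq source): the scorer apparently produces sent_scores = None, and indexing into it during group aggregation raises the TypeError; a guard would avoid the crash:

```python
from statistics import mean

def aggregate_group_scores(sent_scores, tags):
    """Simplified stand-in for vizseq's group aggregation (illustration only)."""
    tag_set = sorted({t for cur in tags for t in cur})
    if sent_scores is None:
        # Without this guard, sent_scores[i] below raises
        # "TypeError: 'NoneType' object is not subscriptable".
        return None
    return {
        t: mean(sent_scores[i] for i, cur in enumerate(tags) if t in cur)
        for t in tag_set
    }

tags = [['Test Group 1'], ['Test Group 2']]
print(aggregate_group_scores(None, tags))         # None, no crash
print(aggregate_group_scores([75.0, 61.0], tags))
# {'Test Group 1': 75.0, 'Test Group 2': 61.0}
```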
Bert Score (https://github.com/Tiiiger/bert_score) new version (0.3.1) supports an OOP implementation. The current vizseq uses the functional implementation, which could be updated to the OOP implementation.
Currently, using Bert Score in a validation loop causes re-loading the model again and again. This can be avoided with the OOP implementation.
I see two solutions: (i) create a separate scorer named bert_score_oop; (ii) in the current implementation of bert_score, add an argument for whether to use the OOP implementation or not.
Are you willing to open a pull request?
Yes, I can send a pull request
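To illustrate the point, a minimal sketch of the load-once pattern the OOP API enables. BertScorerStub is a hypothetical stand-in for bert_score's BERTScorer class, used only so the example is self-contained; as far as I understand, the real class similarly loads the model once in its constructor:

```python
class BertScorerStub:
    """Hypothetical stand-in for bert_score.BERTScorer (illustration only)."""
    load_count = 0  # track how often the expensive model load runs

    def __init__(self):
        # In the OOP API, the (expensive) model load happens once, here.
        BertScorerStub.load_count += 1

    def score(self, hypotheses, references):
        # Dummy per-sentence scores; the real class returns P/R/F1.
        return [0.0] * len(hypotheses)

# Validation loop: one construction, many scoring calls.
scorer = BertScorerStub()
for _ in range(5):
    scorer.score(['a hypothesis'], ['a reference'])

print(BertScorerStub.load_count)  # 1 (vs. one load per call with the functional API)
```

With the functional entry point, each scoring call pays the model-loading cost again, which is exactly the overhead this pattern avoids.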
vizseq.scorers.bleu.BLEUScorer does not use Tokenizer13a by default. When I look at the code, it looks like it should be used by default. The sacrebleu library uses Tokenizer13a by default as well.
Minimal Code/Config snippet to reproduce
import vizseq
scorer = vizseq.scorers.bleu.BLEUScorer()
print(scorer.score(["This is really nice."], [["That's really nice."]]))
# corpus_score = 31.947
scorer = vizseq.scorers.bleu.BLEUScorer(extra_args={'tokenizer': '13a'})
print(scorer.score(["This is really nice."], [["That's really nice."]]))
# corpus_score = 39.764
Stack trace/error message
The problem is here. The variable tokenizer is set to the string none. When calling the method get_default_args (here), the default value 13a for the parameter tokenize is not used, because the string none is passed.
vizseq.scorers.bleu.BLEUScorer should use Tokenizer13a by default.
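A minimal sketch of the bug pattern as I read it (hypothetical function names, not the actual vizseq/sacrebleu code): explicitly passing the string 'none' overrides the downstream '13a' default, so the default never applies:

```python
def corpus_bleu(hypotheses, references, tokenize='13a'):
    # Stand-in for sacrebleu's corpus BLEU: the '13a' default only
    # applies when the caller omits the argument entirely.
    return tokenize

def current_scorer():
    tokenizer = 'none'  # hard-coded upstream in the scorer
    return corpus_bleu([], [], tokenize=tokenizer)

def fixed_scorer():
    return corpus_bleu([], [])  # let the downstream default apply

print(current_scorer())  # 'none' -> untokenized BLEU
print(fixed_scorer())    # '13a'  -> Tokenizer13a, as expected
```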
When trying to run the webapp with the example data, I have this error :
AttributeError: 'VizSeqLogger' object has no attribute 'set_console_mode'
Follow the README instructions: download the example data and run:
python -m vizseq.server --port 9001 --data-root ./examples/data
Stack trace/error message
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/me/workspace/vizseq/vizseq/server.py", line 14, in <module>
logger.set_console_mode(enable=True)
AttributeError: 'VizSeqLogger' object has no attribute 'set_console_mode'
The code runs normally.
[jw@cn05 ~]$ pip3 install vizseq
Defaulting to user installation because normal site-packages is not writeable
Collecting vizseq
Using cached vizseq-0.1.15-py3-none-any.whl (81 kB)
Collecting nltk>=3.5
Using cached nltk-3.5-py3-none-any.whl
Collecting sacrebleu>=1.4.13
Using cached sacrebleu-1.5.0-py3-none-any.whl (65 kB)
Collecting langid
Using cached langid-1.1.6.tar.gz (1.9 MB)
Requirement already satisfied: tqdm in ./.local/lib/python3.9/site-packages (from vizseq) (4.31.1)
Collecting google-cloud-translate
Using cached google_cloud_translate-3.0.2-py2.py3-none-any.whl (93 kB)
Collecting torch
Using cached torch-0.1.2.post2.tar.gz (128 kB)
Requirement already satisfied: numpy in ./.local/lib/python3.9/site-packages (from vizseq) (1.19.5)
Requirement already satisfied: jinja2 in ./.local/lib/python3.9/site-packages (from vizseq) (2.10.3)
Collecting soundfile
Using cached SoundFile-0.10.3.post1-py2.py3-none-any.whl (21 kB)
Requirement already satisfied: py-rouge in ./.local/lib/python3.9/site-packages (from vizseq) (1.1)
Requirement already satisfied: matplotlib in ./.local/lib/python3.9/site-packages (from vizseq) (3.3.2)
Requirement already satisfied: tornado in ./.local/lib/python3.9/site-packages (from vizseq) (6.1)
Requirement already satisfied: IPython in ./.local/lib/python3.9/site-packages (from vizseq) (7.18.1)
Collecting bert-score
Using cached bert_score-0.3.7-py3-none-any.whl (53 kB)
Requirement already satisfied: pandas in ./.local/lib/python3.9/site-packages (from vizseq) (1.1.4)
Collecting laserembeddings
Using cached laserembeddings-1.1.1-py3-none-any.whl (13 kB)
Requirement already satisfied: click in ./.local/lib/python3.9/site-packages (from nltk>=3.5->vizseq) (7.1.2)
Requirement already satisfied: regex in ./.local/lib/python3.9/site-packages (from nltk>=3.5->vizseq) (2020.11.13)
Requirement already satisfied: joblib in ./.local/lib/python3.9/site-packages (from nltk>=3.5->vizseq) (0.17.0)
Collecting portalocker
Using cached portalocker-2.2.1-py2.py3-none-any.whl (15 kB)
Collecting transformers>=3.0.0
Using cached transformers-4.3.3-py3-none-any.whl (1.9 MB)
Collecting bert-score
Using cached bert_score-0.3.6-py3-none-any.whl (53 kB)
Using cached bert_score-0.3.5-py3-none-any.whl (52 kB)
Using cached bert_score-0.3.4-py3-none-any.whl (52 kB)
Using cached bert_score-0.3.3-py3-none-any.whl (52 kB)
Using cached bert_score-0.3.2-py3-none-any.whl (52 kB)
Using cached bert_score-0.3.1-py3-none-any.whl (51 kB)
Using cached bert_score-0.3.0-py3-none-any.whl (48 kB)
Using cached bert_score-0.2.3-py3-none-any.whl (15 kB)
Using cached bert_score-0.2.2-py3-none-any.whl (14 kB)
Using cached bert_score-0.1.2-py3-none-any.whl (9.4 kB)
Using cached bert_score-0.1.1-py3-none-any.whl (9.4 kB)
Using cached bert_score-0.1.0-py3-none-any.whl (7.3 kB)
INFO: pip is looking at multiple versions of to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of sacrebleu to determine which version is compatible with other requirements. This could take a while.
Collecting sacrebleu>=1.4.13
Using cached sacrebleu-1.4.14-py3-none-any.whl (64 kB)
Using cached sacrebleu-1.4.13-py3-none-any.whl (43 kB)
INFO: pip is looking at multiple versions of nltk to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of vizseq to determine which version is compatible with other requirements. This could take a while.
Collecting vizseq
Using cached vizseq-0.1.14-py3-none-any.whl (81 kB)
Using cached vizseq-0.1.13-py3-none-any.whl (81 kB)
Using cached vizseq-0.1.12-py3-none-any.whl (81 kB)
Using cached vizseq-0.1.11-py3-none-any.whl (81 kB)
Using cached vizseq-0.1.10-py3-none-any.whl (80 kB)
Using cached vizseq-0.1.9-py3-none-any.whl (78 kB)
Requirement already satisfied: nltk in ./.local/lib/python3.9/site-packages (from vizseq) (3.4.5)
Collecting sacrebleu==1.4.7
Using cached sacrebleu-1.4.7-py3-none-any.whl (59 kB)
Requirement already satisfied: typing in ./.local/lib/python3.9/site-packages (from sacrebleu==1.4.7->vizseq) (3.7.4.3)
Collecting mecab-python3
Using cached mecab-python3-1.0.3.tar.gz (77 kB)
INFO: pip is looking at multiple versions of to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of sacrebleu to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install vizseq and vizseq==0.1.9 because these package versions have conflicting dependencies.
The conflict is caused by:
vizseq 0.1.9 depends on torch
bert-score 0.3.7 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.6 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.5 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.4 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.3 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.2 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.1 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.3.0 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.2.3 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.2.2 depends on torch>=1.0.0
vizseq 0.1.9 depends on torch
bert-score 0.1.2 depends on torch>=0.4.1
vizseq 0.1.9 depends on torch
bert-score 0.1.1 depends on torch>=0.4.1
vizseq 0.1.9 depends on torch
bert-score 0.1.0 depends on torch>=0.4.1
To fix this you could try to:
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
[jw@cn05 ~]$
Hi :)
The tool returns a BLEU score for the machine translation and runs great in general, but I am not sure whether the BLEU score represents the sentence level or the corpus level. I haven't been able to gather anything conclusive from the sacreBLEU implementation, so I am hoping you can help me with this :)
Best regards,
Tobias
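In case it helps frame the question, a minimal sketch of the distinction (simplified to unigram precision, ignoring clipping and the brevity penalty, so not sacrebleu's actual computation): corpus-level BLEU pools n-gram counts over all sentences before computing precision, whereas sentence-level BLEU is computed per sentence, and averaging sentence scores generally differs from the corpus score:

```python
from statistics import mean

def unigram_precision(hyp, ref):
    # Fraction of hypothesis tokens that appear in the reference.
    return sum(1 for t in hyp if t in ref) / len(hyp)

hyps = [['a', 'b'], ['c', 'd', 'e', 'f']]
refs = [['a', 'x'], ['c', 'd', 'e', 'y']]

# Sentence level: one score per pair, then macro-averaged.
sent = [unigram_precision(h, r) for h, r in zip(hyps, refs)]
print(mean(sent))  # (0.5 + 0.75) / 2 = 0.625

# Corpus level: pool the counts first, then divide once.
matches = sum(sum(1 for t in h if t in r) for h, r in zip(hyps, refs))
length = sum(len(h) for h in hyps)
print(matches / length)  # (1 + 3) / 6 = 0.666...
```

The two numbers differ because corpus pooling weights each sentence by its length rather than equally.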
Hi
This repo is looking really cool and I would like to use the features it offers for my thesis :)
What are the plans/timeline for implementing Windows 10 support?
Best regards,
Tobias
When having multiple references (attached muti_refs.zip), I cannot configure the BLEU metric (though I can configure GLEU). I get a 500 internal server error.
This does not happen with a single reference (see attached single_ref.zip).
Executing vizseq.view_stats breaks the layout of the Jupyter notebook. The menu at the top obscures a majority of the screen and a blank area of ~60px appears at the top of the page.
** Minimal Code/Config snippet to reproduce **
jupyter notebook
When the first cell containing vizseq.view_stats finishes, the layout changes and appears broken.
The display of tables and graphs by vizseq does not affect the layout of the Jupyter notebook.
Cause: The bootstrap.min.css and an inline stylesheet loaded by vizseq break the layout. The inline stylesheet is:
body {
padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
}
The inline stylesheet is responsible for the blank bar at the top while Bootstrap breaks the menu's formatting.
To test this disable both stylesheets in the stylesheet editor included in the developer tools of a browser.
I just followed the installation steps and got this error.
** Minimal Code/Config snippet to reproduce **
(base) diegomoussallem@Diegos-MBP examples % python -m vizseq.server --port 9001 --data-root examples/data
** Stack trace/error message **
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/opt/anaconda3/lib/python3.7/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/Users/diegomoussallem/Desktop/vizseq/vizseq/__init__.py", line 15, in <module>
from vizseq.ipynb import *
File "/Users/diegomoussallem/Desktop/vizseq/vizseq/ipynb/__init__.py", line 8, in <module>
from .core import (view_examples, view_n_grams, view_stats, view_scores,
File "/Users/diegomoussallem/Desktop/vizseq/vizseq/ipynb/core.py", line 15, in <module>
from vizseq._data import (VizSeqDataSources, PathOrPathsOrDictOfStrList,
File "/Users/diegomoussallem/Desktop/vizseq/vizseq/_data/__init__.py", line 14, in <module>
from .config_manager import VizSeqTaskConfigManager, VizSeqGlobalConfigManager
File "/Users/diegomoussallem/Desktop/vizseq/vizseq/_data/config_manager.py", line 13, in <module>
from .tokenizers import VizSeqTokenization
File "/Users/diegomoussallem/Desktop/vizseq/vizseq/_data/tokenizers.py", line 10, in <module>
from sacrebleu import tokenize_13a, tokenize_v14_international, tokenize_zh
ImportError: cannot import name 'tokenize_13a' from 'sacrebleu' (/opt/anaconda3/lib/python3.7/site-packages/sacrebleu/__init__.py)
I tried to apply BertScore on my data, but received this error :
TypeError: score() got an unexpected keyword argument 'bert'
In configuration, select BertScore
as metric.
Refresh the page
Stack trace/error message
Traceback (most recent call last):
File "/home/me/.venv/presum/lib/python3.6/site-packages/tornado/web.py", line 1590, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/home/me/workspace/vizseq/vizseq/server.py", line 103, in get
pd = wv.get_page_data()
File "/home/me/workspace/vizseq/vizseq/_view/web_view.py", line 158, in get_page_data
sorting_metric=self.sorting_metric, need_lang_tags=True
File "/home/me/workspace/vizseq/vizseq/_view/data_view.py", line 132, in get
for s in metrics
File "/home/me/workspace/vizseq/vizseq/_view/data_view.py", line 132, in <dictcomp>
for s in metrics
File "/home/me/workspace/vizseq/vizseq/_view/data_view.py", line 130, in <dictcomp>
for m, hh in cur_hypo.items()
File "/home/me/workspace/vizseq/vizseq/scorers/bert_score.py", line 28, in score
no_idf=True, verbose=self.verbose
TypeError: score() got an unexpected keyword argument 'bert'
Able to see BertScore.
When trying to run the webapp with the example data, I have this error :
Uncaught exception GET
Follow the README instructions: download the example data and run:
python -m vizseq.server --port 9001 --data-root ./examples/data
The server starts fine, but when accessing the webapp at localhost:9001, I can only see 500: Internal Server Error.
Stack trace/error message
INFO - 11/04/19 10:36:39 - 0:00:00 - Application Started
You can navigate to http://localhost:9001
ERROR - 11/04/19 10:36:42 - 0:00:03 - Uncaught exception GET / (192.168.0.30)
HTTPServerRequest(protocol='http', host='192.168.0.231:9001', method='GET', uri='/', version='HTTP/1.1', remote_ip='192.168.0.30')
ERROR - 11/04/19 10:36:42 - 0:00:03 - 500 GET / (192.168.0.30) 1.18ms
The webapp runs normally.
Audio segments from speech data in example speech translation task (IWSLT17 dev) are not correctly associated with reference data.
Only the first TED talk's audio segments are correctly aligned to the reference. Playing the audio segments related to any other talk (from # 3 / 10 / 887 ( 153 / 887 ) onwards on page 16 of the task, using the defaults) results in segments of the first TED talk's audio being played rather than the segments specified in the task directory speech_translation_iwslt17_dev/src_0.zip/source.txt
Get the example speech task data (IWSLT17 dev):
$ bash get_example_data.sh speech_translation_iwslt17_dev
Start the server and navigate to:
http://127.0.0.1:5000/view?t=speech_translation_iwslt17_dev&m=&q=&p_sz=10&p_no=16&s=0&s_metric=
Play the audio segments: first two on this page will be correctly associated with reference text, from # 3 / 10 / 887 ( 153 / 887 ) onwards they are not.