fast-bleu's Introduction

fast-bleu Package

This is a fast, multithreaded C++ implementation of NLTK BLEU with a Python wrapper, for computing BLEU and SelfBLEU scores against a fixed reference set. It can return (Self)BLEU for several maximum n-gram orders simultaneously and efficiently (e.g. BLEU-2, BLEU-3, etc.).

Installation

The installation requires a C++11 compiler. The requirements.txt file lists the Python packages required to run the test_cases.py file.

Linux and WSL

Install the latest stable release from PyPI:

pip install --user fast-bleu

MacOS

macOS uses clang as its default compiler, and that clang does not support OpenMP. One workaround is to first install gcc with brew install gcc. This adds version-suffixed gcc binaries (for example, gcc-10 and g++-10).

The installation command accepts options for changing the default compiler, so you can install the latest stable PyPI release with the following command:

pip install --user fast-bleu --install-option="--CC=<path-to-gcc>" --install-option="--CXX=<path-to-g++>"

Windows

Not tested yet!

Sample Usage

Here is an example to compute BLEU-2, BLEU-3, SelfBLEU-2 and SelfBLEU-3:

>>> from fast_bleu import BLEU, SelfBLEU
>>> ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']

>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']

>>> list_of_references = [ref1, ref2, ref3]
>>> hypotheses = [hyp1, hyp2]
>>> weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}

>>> bleu = BLEU(list_of_references, weights)
>>> bleu.get_score(hypotheses)
{'bigram': [0.7453559924999299, 0.0191380231127159], 'trigram': [0.6240726901657495, 0.013720869575946234]}

which means:

  • BLEU-2 for hyp1 is 0.7453559924999299

  • BLEU-2 for hyp2 is 0.0191380231127159

  • BLEU-3 for hyp1 is 0.6240726901657495

  • BLEU-3 for hyp2 is 0.013720869575946234

>>> self_bleu = SelfBLEU(list_of_references, weights)
>>> self_bleu.get_score()
{'bigram': [0.25819888974716115, 0.3615507630310936, 0.37080992435478316],
        'trigram': [0.07808966062765045, 0.20140620205719248, 0.21415334758254043]}

which means:

  • SelfBLEU-2 for ref1 is 0.25819888974716115

  • SelfBLEU-2 for ref2 is 0.3615507630310936

  • SelfBLEU-2 for ref3 is 0.37080992435478316

  • SelfBLEU-3 for ref1 is 0.07808966062765045

  • SelfBLEU-3 for ref2 is 0.20140620205719248

  • SelfBLEU-3 for ref3 is 0.21415334758254043
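The SelfBLEU-2 numbers above are consistent with reading SelfBLEU as the BLEU score of each sentence measured against all the other sentences in the same set. The sketch below works under that assumption; it is plain Python rather than the package's C++ code, the names ngram_counts, bleu and self_bleu are hypothetical, and smoothing is omitted, so only the bigram case is exercised.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Counter of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(references, hypothesis, max_n):
    """Unsmoothed BLEU with uniform weights over orders 1..max_n."""
    log_score = 0.0
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        refs = [ngram_counts(r, n) for r in references]
        # Clip each hypothesis n-gram by its highest count in any single reference.
        matches = sum(min(c, max(rc[g] for rc in refs)) for g, c in hyp.items())
        if matches == 0:
            return 0.0
        log_score += math.log(matches / sum(hyp.values())) / max_n
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_score)

def self_bleu(sentences, max_n):
    """Score every sentence against the remaining ones (needs >= 2 sentences)."""
    return [bleu(sentences[:i] + sentences[i + 1:], sentence, max_n)
            for i, sentence in enumerate(sentences)]

ref1 = "It is a guide to action that ensures that the military will forever heed Party commands".split()
ref2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party".split()
ref3 = "It is the practical guide for the army always to heed the directions of the party".split()

scores = self_bleu([ref1, ref2, ref3], 2)
print(scores)
```

With the three references from the example, the printed list should match the 'bigram' entry above up to floating-point rounding. The trigram entry involves sentences with no trigram match, where the result depends on the smoothing method.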

Caution: Each token of the reference set is converted to a string during computation.

For further details, refer to the documentation in the source code.
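One practical consequence of this conversion is that tokens which are different Python objects can end up identical. The sketch below uses the built-in str() as a stand-in for whatever conversion fast-bleu applies internally (an assumption), and the helper name stringify is hypothetical.

```python
def stringify(lines_of_tokens):
    """Convert every token to its string form (stand-in for the real conversion)."""
    return [[str(token) for token in line] for line in lines_of_tokens]

# Integer ids and their string spellings are different tokens in Python...
refs = [[1, 2, 3], ['1', '2', '3']]
converted = stringify(refs)

# ...but become indistinguishable once converted to strings.
print(converted[0] == converted[1])  # True
```

If token identity matters (for example, when mixing integer ids with text), it may be safer to convert tokens to unambiguous strings yourself before building the reference set.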

Citation

Please cite our paper if it helps with your research.

@inproceedings{alihosseini-etal-2019-jointly,
    title = {Jointly Measuring Diversity and Quality in Text Generation Models},
    author = {Alihosseini, Danial  and
      Montahaei, Ehsan  and
      Soleymani Baghshah, Mahdieh},
    booktitle = {Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation},
    month = {jun},
    year = {2019},
    address = {Minneapolis, Minnesota},
    publisher = {Association for Computational Linguistics},
    url = {https://www.aclweb.org/anthology/W19-2311},
    doi = {10.18653/v1/W19-2311},
    pages = {90--98},
}

fast-bleu's People

Contributors

danial-alh, dependabot[bot], iams4n


fast-bleu's Issues

memory leak

There is a serious memory leak: the __del__ method of BLEU and SelfBLEU checks for its attributes and functions under the wrong names, so the check never succeeds.
The original code is:
if hasattr(self, '__instance') and hasattr(self, '__del_instance ') (note the extra trailing space in the second name)
and it should be:
if hasattr(self, '_BLEU__instance') and hasattr(self, '_BLEU__del_instance')
or
if hasattr(self, '_SelfBLEU__instance') and hasattr(self, '_SelfBLEU__del_instance')
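The reason the original check fails is Python's private name mangling: inside a class body, an attribute written as self.__name is stored as _ClassName__name, while a string literal passed to hasattr is not mangled. A small self-contained illustration, using a made-up class Wrapper rather than the real BLEU class:

```python
class Wrapper:
    def __init__(self):
        # Written as __instance, but actually stored as _Wrapper__instance.
        self.__instance = object()

    def buggy_check(self):
        # The string is looked up literally, so the mangled attribute is not found.
        return hasattr(self, '__instance')

    def fixed_check(self):
        # Spelling out the mangled name finds the attribute.
        return hasattr(self, '_Wrapper__instance')

w = Wrapper()
print(w.buggy_check(), w.fixed_check())  # False True
print(sorted(vars(w)))                   # ['_Wrapper__instance']
```

When the check in a destructor always evaluates to False, the cleanup branch is never taken, which matches the leak described in this issue.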

RuntimeError: ffi_prep_cif_var failed

When I ran test_cases.py on Linux, I got this error.

tokenized!
Traceback (most recent call last):
File "test_cases.py", line 96, in <module>
compare(nltk_org_bleu, cpp_bleu)
File "test_cases.py", line 65, in compare
cpp_result, cpp_time = get_execution_time(cpp_func)
File "test_cases.py", line 59, in get_execution_time
result = np.array(func(ref_tokens, test_tokens))
File "test_cases.py", line 31, in cpp_bleu
bleu = BLEU(refs, w, verbose=True)
File "/wutong/fast-bleu/fast_bleu/python_wrapper.py", line 85, in __init__
self.__instance = self.__get_instance(
RuntimeError: ffi_prep_cif_var failed

Could you tell me how to solve it? Thanks!

Fail to compile on MacOS

I tried to build the package with both gcc and g++ and got the same error:

    for (pair<string, int> const &p : counts)
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
fast_bleu/cpp_sources/sources/nltk.cpp:102:28: error: variable-sized object may
      not be initialized
    long long p_numerators[max_n] = {0};   // Key = ngram order, and val...
                           ^~~~~
fast_bleu/cpp_sources/sources/nltk.cpp:103:30: error: variable-sized object may
      not be initialized
    long long p_denominators[max_n] = {0}; // Key = ngram order, and val...
                             ^~~~~

2 warnings and 2 errors generated.

Bug: Fails to calculate BLEU when the reference corpus is large

This is the code that causes the problem:

from fast_bleu import BLEU

bleu = BLEU(reference_corpus, weights)

Here reference_corpus is a list of 70000 sentences, each of length 200.

The error message is as follows:

Fatal Python error: Segmentation fault

Thread 0x00007fa775d6a700 (most recent call first):
  File "/home/tangtianyi/anaconda3/lib/python3.7/threading.py", line 300 in wait
  File "/home/tangtianyi/anaconda3/lib/python3.7/threading.py", line 552 in wait
  File "/home/tangtianyi/anaconda3/lib/python3.7/site-packages/tqdm/_monitor.py", line 59 in run
  File "/home/tangtianyi/anaconda3/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/home/tangtianyi/anaconda3/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007fa809c26700 (most recent call first):
  File "/home/tangtianyi/anaconda3/lib/python3.7/site-packages/fast_bleu/__python_wrapper.py", line 86 in __init__

Thank you for taking the time to solve the issues.

OSError: symbol not found in flat namespace '_GOMP_loop_nonmonotonic_guided_next'

Hi Danial,

I followed your instructions for setup on MacOS Monterey but ran into the following error while running the fast-bleu example code:

from fast_bleu import BLEU, SelfBLEU
ref1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
         'ensures', 'that', 'the', 'military', 'will', 'forever',
         'heed', 'Party', 'commands']
ref2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
         'guarantees', 'the', 'military', 'forces', 'always',
         'being', 'under', 'the', 'command', 'of', 'the', 'Party']
ref3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
         'army', 'always', 'to', 'heed', 'the', 'directions',
         'of', 'the', 'party']

hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
        'ensures', 'that', 'the', 'military', 'always',
        'obeys', 'the', 'commands', 'of', 'the', 'party']
hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
        'interested', 'in', 'world', 'history']

list_of_references = [ref1, ref2, ref3]
hypotheses = [hyp1, hyp2]
weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}

bleu = BLEU(list_of_references, weights)
bleu.get_score(hypotheses)

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Input In [1], in <cell line: 22>()
     19 hypotheses = [hyp1, hyp2]
     20 weights = {'bigram': (1/2., 1/2.), 'trigram': (1/3., 1/3., 1/3.)}
---> 22 bleu = BLEU(list_of_references, weights)
     23 bleu.get_score(hypotheses)

File ~/.local/lib/python3.9/site-packages/fast_bleu/__python_wrapper__.py:80, in BLEU.__init__(self, lines_of_tokens, weights, smoothing_func, auto_reweight, verbose)
     77 assert smoothing_func in [0, 1], 'Smoothing function only supports 0 or 1 type.'
     78 assert not (False in [abs(1. - sum(w)) < 1e-15 for w in self.__weights]
     79             ), 'All weights must sum to one.'
---> 80 self.__init_cdll()
     81 lines_of_tokens = _encode_listoflist_str(lines_of_tokens)
     83 faulthandler_enabled = faulthandler.is_enabled()

File ~/.local/lib/python3.9/site-packages/fast_bleu/__python_wrapper__.py:91, in BLEU.__init_cdll(self)
     90 def __init_cdll(self):
---> 91     self.__lib = _load_cdll()
     92     self.__get_instance = self.__lib.get_bleu_instance
     93     self.__get_score = self.__lib.get_bleu_score

File ~/.local/lib/python3.9/site-packages/fast_bleu/__python_wrapper__.py:12, in _load_cdll()
     10 import os
     11 curr_path = os.path.dirname(__file__) + '/'
---> 12 return ctypes.CDLL(curr_path + '__fast_bleu_module.so',
     13                                use_errno=True)

File ~/opt/anaconda3/lib/python3.9/ctypes/__init__.py:382, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    379 self._FuncPtr = _FuncPtr
    381 if handle is None:
--> 382     self._handle = _dlopen(self._name, mode)
    383 else:
    384     self._handle = handle

OSError: dlopen(/Users/jb/.local/lib/python3.9/site-packages/fast_bleu/__fast_bleu_module.so, 0x0006): symbol not found in flat namespace '_GOMP_loop_nonmonotonic_guided_next'

--

I installed up-to-date xcode dev tools as well as a new installation of libomp. A previous fast-bleu installation attempt yielded a different error (RuntimeError: ffi_prep_cif_var failed), but running a fresh installation of homebrew, gcc, conda, and python 3.9.12 with required dependencies produced the new error above.

What can I do to get fast-bleu running? sacrebleu is too slow for a project I'm working on.

Thanks for your help!

clang error while installing on Mac

I'm getting the following error when trying to install on my Mac environment.

➜  fast-bleu git:(master) pip3 install -e .
Obtaining file:///Users/vinaydamodaran/Personal/fast-bleu
Installing collected packages: fast-bleu
  Running setup.py develop for fast-bleu
    ERROR: Command errored out with exit status 1:
     command: /usr/local/opt/python/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/Users/vinaydamodaran/Personal/fast-bleu/setup.py'"'"'; __file__='"'"'/Users/vinaydamodaran/Personal/fast-bleu/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
         cwd: /Users/vinaydamodaran/Personal/fast-bleu/
    Complete output (20 lines):
    running develop
    running egg_info
    creating fast_bleu.egg-info
    writing fast_bleu.egg-info/PKG-INFO
    writing dependency_links to fast_bleu.egg-info/dependency_links.txt
    writing top-level names to fast_bleu.egg-info/top_level.txt
    writing manifest file 'fast_bleu.egg-info/SOURCES.txt'
    reading manifest file 'fast_bleu.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'fast_bleu.egg-info/SOURCES.txt'
    running build_ext
    building 'fast_bleu.__fast_bleu_module' extension
    creating build
    creating build/temp.macosx-10.15-x86_64-3.7
    creating build/temp.macosx-10.15-x86_64-3.7/fast_bleu
    creating build/temp.macosx-10.15-x86_64-3.7/fast_bleu/cpp_sources
    creating build/temp.macosx-10.15-x86_64-3.7/fast_bleu/cpp_sources/sources
    clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -Ifast_bleu/cpp_sources/headers/ -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c fast_bleu/cpp_sources/sources/bleu.cpp -o build/temp.macosx-10.15-x86_64-3.7/fast_bleu/cpp_sources/sources/bleu.o -fopenmp -std=c++11
    clang: error: unsupported option '-fopenmp'
    error: command 'clang' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/opt/python/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/Users/vinaydamodaran/Personal/fast-bleu/setup.py'"'"'; __file__='"'"'/Users/vinaydamodaran/Personal/fast-bleu/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

Mac Version: macOS Catalina 10.15.6

MacOS workaround instructions out of date.

Hello,

I am having a difficult time building this package.

This command: pip install --user fast-bleu --install-option="--CC=<path-to-gcc>" --install-option="--CXX=<path-to-g++>"
no longer works on recent versions of pip. I have even attempted to build the package with python setup.py build, and attempting to pass in:

CC=/usr/local/Cellar/gcc@10/10.5.0/bin/gcc-10
CXX=/usr/local/Cellar/gcc@10/10.5.0/bin/g++-10
LDFLAGS="-L/usr/local/opt/llvm/lib"
CPPFLAGS="-I/usr/local/opt/llvm/include"

I tried xcode-select --install to obtain the stdlib headers. I've installed gcc@10 and tried gcc@12 as well. I also installed llvm and libomp from brew. I'm not sure what else to try at this point. I know it's probably something broken in my environment, but I cannot figure it out. Any help would be fantastic.

Thanks in advance.
