
asr-evaluation's Introduction

asr_evaluation


Python module for evaluating ASR hypotheses (i.e. computing word error rate and word recognition rate).

This module depends on the editdistance project, for computing edit distances between arbitrary sequences.

The formatting of this program's output is loosely modeled on the align.c program commonly used within the Sphinx ASR community. The evaluation may run a bit faster if neither instances nor confusions are printed.

Please let me know if you have any comments, questions, or problems.

Output

The program outputs three standard measurements (an illustrative sketch of how they are computed follows this list):

  • Word error rate (WER)
  • Word recognition rate (the number of matched words in the alignment divided by the number of words in the reference).
  • Sentence error rate (SER) (the number of incorrect sentences divided by the total number of sentences).
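
The sketch below illustrates how these three figures can be derived from a word-level alignment. It is not the module's internal code (the module relies on its editdistance dependency for the alignment), and the helper names are made up for the example.

def align_counts(ref, hyp):
    """Return (edit_errors, matched_words) for two word lists, using a standard
    Levenshtein DP plus a backtrace over one optimal alignment."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub)    # match or substitution
    # Backtrace to count matched words along one optimal alignment.
    i, j, matches = n, m, 0
    while i > 0 or j > 0:
        sub = 0 if (i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]) else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub:
            matches += 1 - sub
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return d[n][m], matches

def evaluate(pairs):
    """pairs: iterable of (reference_words, hypothesis_words) tuples."""
    errors = matches = ref_tokens = bad_sentences = sentences = 0
    for ref, hyp in pairs:
        e, m = align_counts(ref, hyp)
        errors += e
        matches += m
        ref_tokens += len(ref)
        sentences += 1
        bad_sentences += 1 if e else 0
    wer = errors / ref_tokens          # word error rate
    wrr = matches / ref_tokens         # word recognition rate
    ser = bad_sentences / sentences    # sentence error rate
    return wer, wrr, ser

For example, evaluate([("i have a dog".split(), "i have dog".split())]) returns (0.25, 0.75, 1.0): one deletion out of four reference words, three matched words, and one erroneous sentence out of one.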

Installing & uninstalling

The easiest way to install is using pip:

pip install asr-evaluation

Alternatively you can clone this git repo and install using distutils:

git clone git@github.com:belambert/asr-evaluation.git
cd asr-evaluation
python setup.py install

To uninstall with pip:

pip uninstall asr-evaluation

Command line usage

For command line usage, see:

    wer --help

It should display something like this:

usage: wer [-h] [-i | -r] [--head-ids] [-id] [-c] [-p] [-m count] [-a] [-e]
           ref hyp

Evaluate an ASR transcript against a reference transcript.

positional arguments:
  ref                   Reference transcript filename
  hyp                   ASR hypothesis filename

optional arguments:
  -h, --help            show this help message and exit
  -i, --print-instances
                        Print all individual sentences and their errors.
  -r, --print-errors    Print all individual sentences that contain errors.
  --head-ids            Hypothesis and reference files have ids in the first
                        token? (Kaldi format)
  -id, --tail-ids, --has-ids
                        Hypothesis and reference files have ids in the last
                        token? (Sphinx format)
  -c, --confusions      Print tables of which words were confused.
  -p, --print-wer-vs-length
                        Print table of average WER grouped by reference
                        sentence length.
  -m count, --min-word-count count
                        Minimum word count to show a word in confusions.
  -a, --case-insensitive
                        Down-case the text before running the evaluation.
  -e, --remove-empty-refs
                        Skip over any examples where the reference is empty.
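
For example, to print every sentence with its errors while down-casing the text first, the documented -i and -a flags can be combined (the file names below are placeholders):

wer -i -a ref.txt hyp.txt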

Contributing and code of conduct

For contributions, it's best to use GitHub issues and pull requests. Proper testing and documentation are suggested.

Reasonable conduct is expected, in particular as specified by the Contributor Covenant.

asr-evaluation's People

Contributors

belambert, davidnemeskey, selaselah, shingo22, sourcery-ai[bot]


asr-evaluation's Issues

SER and WRR return different values between the pip and src versions

This is a really nice project! I noticed that WRR and SER return wrong values. I installed asr-evaluation:

  • asr-evaluation==2.0.2 (pip install)

I have a simple example for reproducing the problem. I also cloned this repository and tested it directly; the returned values look correct.

Is my usage wrong, or is this a bug?

  • hyp.txt
i have dog
did you pen
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
  • ref.txt
i have a dog
do you have a pen
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
wer ref.txt hyp.txt
Sentence count: 13
WER:    14.286% (         4 /         28)
WRR:    96.429% (        27 /         28)
SER:   100.000% (        13 /         13)
cd asr_evaluation
python __main__.py ../../ref.txt ../../hyp.txt                                                                         
Sentence count: 13
WER:    12.903% (         4 /         31)
WRR:    87.097% (        27 /         31)
SER:    15.385% (         2 /         13)

Python API

Is there a Python API that can be called from within Python?
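
Pending a documented Python API, one workaround is to invoke the wer console script from Python. The sketch below assumes only the command-line interface shown above; the file names are placeholders.

import subprocess

# Run the `wer` console script on two placeholder files and capture its report.
result = subprocess.run(
    ["wer", "ref.txt", "hyp.txt"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # the Sentence count / WER / WRR / SER lines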

SER wrong for simple example

It's probably me but with the following two input files:

ref.txt

the crazy frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
the frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
the crazy frog jumps over the lazy dog extended

and hyp.txt

the crazy frog jumps over the lazy dog
the frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
the craz frog jumps over the lazy dog
the crazy frog jumps over the lazy dog

Running the wer command:

wer ref.txt hyp.txt

Gives me the result:

Sentence count: 5
WER:    10.000% (         4 /         40)
WRR:    92.500% (        37 /         40)
SER:    80.000% (         4 /          5)

For some reason it would seem that only 4 sentences in hyp.txt are recognized?

Environment

$ pip show asr-evaluation
Name: asr-evaluation
Version: 2.0.2
Summary: Evaluating ASR (automatic speech recognition) hypotheses, i.e. computing word error rate.
Home-page: UNKNOWN
Author: Ben Lambert
Author-email: [email protected]
License: LICENSE.txt
Location: /home/sfalk/miniconda3/envs/t2t/lib/python3.5/site-packages
Requires: termcolor, edit-distance
Required-by: 

wer -p raises IndexError

When trying to print the table of average WER grouped by reference sentence length, I get the following error:

Traceback (most recent call last):
  File "/home/david/miniconda3/bin/wer", line 11, in <module>
    load_entry_point('asr-evaluation', 'console_scripts', 'wer')()
  File "/home/david/asr-evaluation/asr_evaluation/__main__.py", line 59, in main
    other_main(args)
  File "/home/david/asr-evaluation/asr_evaluation/asr_evaluation.py", line 82, in main
    print_wer_vs_length()
  File "/home/david/asr-evaluation/asr_evaluation/asr_evaluation.py", line 371, in print_wer_vs_length
    avg_wers = list(map(lambda x: (x[0], mean(x[1])), values))
  File "/home/david/asr-evaluation/asr_evaluation/asr_evaluation.py", line 371, in <lambda>
    avg_wers = list(map(lambda x: (x[0], mean(x[1])), values))
IndexError: list index out of range

Running Python 3.7.7.

Paper Citations

Can you name a paper on which this ASR evaluation is based?

thank you very much

issues about calculating WER and WRR using python2.7

@belambert
Very useful package; however, there might be a small mistake when calculating the WER and WRR under Python 2.7:
I think 'error_count' and 'match_count' should be converted to type 'float' when using Python 2.7.

asr_evaluation/asr_evaluation.py (lines 65-66)

print('WRR: {0:f} % ({1:10d} / {2:10d})'.format(100 * match_count / ref_token_count, match_count, ref_token_count))
print('WER: {0:f} % ({1:10d} / {2:10d})'.format(100 * error_count / ref_token_count, error_count, ref_token_count))
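
A sketch of the fix suggested above: under Python 2.7, true division has to be forced, either with a __future__ import or with an explicit float cast. The counter variables come from the surrounding function in asr_evaluation.py.

# Option 1: put this at the top of asr_evaluation.py so that / performs true division.
from __future__ import division

# Option 2: cast one operand explicitly before dividing.
print('WRR: {0:f} % ({1:10d} / {2:10d})'.format(
    100 * float(match_count) / ref_token_count, match_count, ref_token_count))
print('WER: {0:f} % ({1:10d} / {2:10d})'.format(
    100 * float(error_count) / ref_token_count, error_count, ref_token_count))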

Anyway, it's a good package!

Sphinx format problem when using -id argument

In Sphinx format the hypothesis file has the following form:

hypothesis_text (file_id score)

while transcription lines have the form:

transcription_text (file_id)

So when I run the wer command the following error occurs:

$ wer transcriptions hypothesis -id
Reference and hypothesis IDs do not match! ref="(data_005)" hyp="-7716)"
File lines in hyp file should match those in the ref file.

I think this occurs because the score field has not been taken into account: the file id of the transcription is compared to the score rather than to the file id of the hypothesis.
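
Until the score field is handled, one workaround is to strip it from the hypothesis file before scoring. The sketch below assumes the formats quoted above (the score is the second token inside the trailing parentheses); hypothesis.noscore is a made-up output name.

import re

# Rewrite a Sphinx-format hypothesis file so that the trailing "(file_id score)"
# becomes "(file_id)", matching the reference transcription format.
with open("hypothesis") as fin, open("hypothesis.noscore", "w") as fout:
    for line in fin:
        fout.write(re.sub(r"\((\S+)\s+\S+\)\s*$", r"(\1)", line.rstrip()) + "\n")

The rewritten file can then be scored as before: wer transcriptions hypothesis.noscore -id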
