
Apertium APy

Build Status Coverage Status PyPI PyPI - Python Version

Apertium APy, Apertium API in Python, is a web server exposing Apertium functions including text, document, and webpage translation, as well as morphological analysis and generation. More information is available on the Apertium Wiki.

Requirements

  • Python 3.6+
  • Tornado 4.5.3 - 6.0.4 (python3-tornado on Debian/Ubuntu)

Additional functionality is provided by installing the following packages:

  • apertium-streamparser enables spell checking
  • requests enables suggestion handling
  • chromium_compact_language_detector enables improved language detection (cld2)
  • chardet enables website character encoding detection
  • commentjson allows keeping API keys in commented JSON
  • lxml enables pair preferences

Precise versions are available in requirements.txt and setup.py.

Installation

Before you install, you can try out a live version of APy at apertium.org.

APy is available through PyPI:

$ pip install apertium-apy

On Ubuntu/Debian, it is also available through apt:

$ wget -qO- https://apertium.projectjj.com/apt/install-nightly.sh | bash
$ apt-get install apertium-apy

Finally, GitHub Container Registry hosts an image of the provided Dockerfile with entry point apertium-apy exposing port 2737:

$ docker pull ghcr.io/apertium/apy

Usage

Installation through apt or pip adds an apertium-apy executable:

$ apertium-apy --help
usage: apertium-apy [-h] [-s NONPAIRS_PATH] [-l LANG_NAMES] [-F FASTTEXT_MODEL]
                  [-f MISSING_FREQS] [-p PORT] [-c SSL_CERT] [-k SSL_KEY]
                  [-t TIMEOUT] [-j [NUM_PROCESSES]] [-d] [-P LOG_PATH]
                  [-i MAX_PIPES_PER_PAIR] [-n MIN_PIPES_PER_PAIR]
                  [-u MAX_USERS_PER_PIPE] [-m MAX_IDLE_SECS]
                  [-r RESTART_PIPE_AFTER] [-v VERBOSITY] [-V] [-S]
                  [-M UNKNOWN_MEMORY_LIMIT] [-T STAT_PERIOD_MAX_AGE]
                  [-wp WIKI_PASSWORD] [-wu WIKI_USERNAME] [-b]
                  [-rs RECAPTCHA_SECRET] [-md MAX_DOC_PIPES] [-C CONFIG]
                  [-ak API_KEYS]
                  pairs_path

Apertium APY -- API server for machine translation and language analysis

positional arguments:
  pairs_path            path to Apertium installed pairs (all modes files in
                        this path are included)

options:
  -h, --help            show this help message and exit
  -s NONPAIRS_PATH, --nonpairs-path NONPAIRS_PATH
                        path to Apertium tree (only non-translator debug modes
                        are included from this path)
  -l LANG_NAMES, --lang-names LANG_NAMES
                        path to localised language names sqlite database
                        (default = langNames.db)
  -F FASTTEXT_MODEL, --fasttext-model FASTTEXT_MODEL
                        path to fastText language identification model (e.g.
                        lid.release.ftz)
  -f MISSING_FREQS, --missing-freqs MISSING_FREQS
                        path to missing word frequency sqlite database
                        (default = None)
  -p PORT, --port PORT  port to run server on (default = 2737)
  -c SSL_CERT, --ssl-cert SSL_CERT
                        path to SSL Certificate
  -k SSL_KEY, --ssl-key SSL_KEY
                        path to SSL Key File
  -t TIMEOUT, --timeout TIMEOUT
                        timeout for requests (default = 10)
  -j [NUM_PROCESSES], --num-processes [NUM_PROCESSES]
                        number of processes to run (default = 1; use 0 to run
                        one http server per core, where each http server runs
                        all available language pairs)
  -d, --daemon          daemon mode: redirects stdout and stderr to files
                        apertium-apy.log and apertium-apy.err; use with --log-
                        path
  -P LOG_PATH, --log-path LOG_PATH
                        path to log output files to in daemon mode; defaults
                        to local directory
  -i MAX_PIPES_PER_PAIR, --max-pipes-per-pair MAX_PIPES_PER_PAIR
                        how many pipelines we can spin up per language pair
                        (default = 1)
  -n MIN_PIPES_PER_PAIR, --min-pipes-per-pair MIN_PIPES_PER_PAIR
                        when shutting down pipelines, keep at least this many
                        open per language pair (default = 0)
  -u MAX_USERS_PER_PIPE, --max-users-per-pipe MAX_USERS_PER_PIPE
                        how many concurrent requests per pipeline before we
                        consider spinning up a new one (default = 5)
  -m MAX_IDLE_SECS, --max-idle-secs MAX_IDLE_SECS
                        if specified, shut down pipelines that have not been
                        used in this many seconds
  -r RESTART_PIPE_AFTER, --restart-pipe-after RESTART_PIPE_AFTER
                        restart a pipeline if it has had this many requests
                        (default = 1000)
  -v VERBOSITY, --verbosity VERBOSITY
                        logging verbosity
  -V, --version         show APY version
  -S, --scalemt-logs    generates ScaleMT-like logs; use with --log-path;
                        disables
  -M UNKNOWN_MEMORY_LIMIT, --unknown-memory-limit UNKNOWN_MEMORY_LIMIT
                        keeps unknown words in memory until a limit is
                        reached; use with --missing-freqs (default = 1000)
  -T STAT_PERIOD_MAX_AGE, --stat-period-max-age STAT_PERIOD_MAX_AGE
                        How many seconds back to keep track request timing
                        stats (default = 3600)
  -wp WIKI_PASSWORD, --wiki-password WIKI_PASSWORD
                        Apertium Wiki account password for SuggestionHandler
  -wu WIKI_USERNAME, --wiki-username WIKI_USERNAME
                        Apertium Wiki account username for SuggestionHandler
  -b, --bypass-token    ReCAPTCHA bypass token
  -rs RECAPTCHA_SECRET, --recaptcha-secret RECAPTCHA_SECRET
                        ReCAPTCHA secret for suggestion validation
  -md MAX_DOC_PIPES, --max-doc-pipes MAX_DOC_PIPES
                        how many concurrent document translation pipelines we
                        allow (default = 3)
  -C CONFIG, --config CONFIG
                        Configuration file to load options from
  -ak API_KEYS, --api-keys API_KEYS
                        Configuration file to load API keys

Contributing

APy uses GitHub Actions for continuous integration. Locally, use make test to run the same checks it does. After installing Pipenv, run pipenv install --dev to install the development requirements, e.g. linters.

Contributors

akosiaris, androbin, ayushjainrksh, bentley, danielmartinez, dependabot[bot], ftyers, hectoralos, jonorthwash, jpjpjpopop, kartikm, marcriera, ryanachi, shardulc, share-with-me, simonnarang, sushain97, svineet, tinodidriksen, unhammer, wei2912, wolfgangth, xavivars


apertium-apy's Issues

Chained translations

As discussed in Apertium's IRC channel, it would be nice to have chained translations, for example Sardinian -> Italian -> Spanish. Some code needs to compute the path the source text must go through to achieve the desired translation, plus a flag or similar to enable chaining.
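One plausible shape for the path computation, sketched as a breadth-first search over the installed pairs; the function name and data layout here are illustrative, not APy's actual internals:

```python
from collections import deque


def find_chain(pairs, src, tgt):
    """Shortest translation chain over installed pairs, e.g.
    srd -> ita -> spa. Returns a list of language codes, or None
    if no chain exists."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, []).append(b)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == tgt:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

BFS gives the shortest chain, which matters because each extra hop compounds translation errors.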

non-deterministic output on perWord endpoint

With a single request, we're getting up to three different responses.

This is the request we're using: http://beta.apertium.org/apy/perWord?lang=eng-fra&modes=biltrans&q=hi

Depending on APy's mood, these are the outputs we get:

Correct output:

[{"biltrans": ["salut<ij>"], "input": "hi"}]

Empty output:

[{"biltrans": [], "input": "hi"}]

Error output:

{"message": "Internal Server Error", "code": 500, "status": "error", "explanation": "Internal Server Error"}

Noticed by @avyayv.

langnames calls should be non-blocking

If getLocalizedLanguages reads from a damaged disk/db, it can block other requests. It should use the same non-blocking yield pattern that translation does.
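A minimal sketch of moving the blocking read off the event loop, here with asyncio's run_in_executor (APy itself uses Tornado coroutines, and the table/column names below are invented for illustration):

```python
import asyncio
import sqlite3


def _read_names(db_path, locale):
    # Blocking sqlite read; runs in a worker thread so a slow or
    # damaged disk cannot stall the event loop.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            'SELECT code, name FROM languageNames WHERE locale = ?',
            (locale,)).fetchall()


async def get_localized_languages(db_path, locale):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _read_names, db_path, locale)
```

Other requests keep being served while the thread waits on the disk.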

/list?q=pairs return data differs from rest of the modes

The return data of /list?q=pairs differs in shape from that of the rest of the modes. It produces:

$ curl 'http://localhost:2737/list?q=pairs'

{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"}, 
 {"sourceLanguage": "tat", "targetLanguage": "kaz"}, 
 {"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null}

and the rest are like:

$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", 
 "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}

Should we just remove the pairs mode from /list and have only /listPairs?
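For comparison, flattening the /list?q=pairs payload into something closer to the other modes' shape is only a couple of lines (a sketch; the key format is chosen arbitrarily here):

```python
def pair_codes(response):
    # Flatten [{'sourceLanguage': 'kaz', 'targetLanguage': 'tat'}, ...]
    # into ['kaz-tat', ...], closer to the keys the other /list modes use.
    return ['%s-%s' % (p['sourceLanguage'], p['targetLanguage'])
            for p in response['responseData']]
```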

support unicode directly instead of using \u codes

Currently APy escapes Unicode characters with \u codes, e.g. it outputs '{"vblex": ["c\u00f3rrer"], "n": ["cursa"]}' rather than '{"vblex": ["córrer"], "n": ["cursa"]}'.

It would be more convenient if it output Unicode directly.
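Assuming the responses go through json.dumps, this is a one-flag change:

```python
import json

entry = {'vblex': ['córrer'], 'n': ['cursa']}
print(json.dumps(entry))                      # default: ASCII-only, \u00f3 escapes
print(json.dumps(entry, ensure_ascii=False))  # literal UTF-8 'córrer'
```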

`Too many open files` when individual pipelines are restarted many times

INFO:root:A pipe for pair sme-sme_spell has handled 200 requests, scheduling restart
INFO:root:sme-sme_spell not in pipelines of this process
INFO:root:Starting up a new pipeline for sme-sme_spell …
ERROR:tornado.application:Future <tornado.concurrent.Future object at 0x102670ef0> exception was never retrieved: Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/wwserver1/divvun/apertium-apy/servlet.py", line 995, in get
    reformat=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/wwserver1/divvun/apertium-apy/servlet.py", line 503, in translateAndRespond
    translated = yield pipeline.translate(toTranslate, nosplit, deformat, reformat)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/wwserver1/divvun/apertium-apy/translation.py", line 80, in translate
    for part in all_split]
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 828, in callback
    result_list.append(f.result())
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "/Users/wwserver1/divvun/apertium-apy/translation.py", line 285, in translateNULFlush
    proc_deformat = Popen(deformat, stdin=PIPE, stdout=PIPE)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1185, in _execute_child
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

This is on a server where language data is updated nightly, and pipes are restarted every 200 requests / on 3600 idle secs.

Maybe APy isn't correctly closing files when restarting pipelines?
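One thing worth auditing is whether every Popen in the restart path is fully reaped. A leak-free pattern looks roughly like this (a sketch, not APy's actual code):

```python
from subprocess import Popen, PIPE


def run_stage(cmd, data):
    # communicate() writes data, reads output, and closes the child's
    # pipe fds; wait() reaps the process. Skipping either step leaks
    # file descriptors on every pipeline restart, eventually hitting
    # EMFILE ('Too many open files').
    proc = Popen(cmd, stdin=PIPE, stdout=PIPE)
    try:
        out, _ = proc.communicate(data)
        return out
    finally:
        proc.wait()
```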

Installer

APy should be able to install itself. It already has a Makefile, but there is no install target. Alternatively, add a setup.py.

Some way to run a single command to have APy put itself into $DESTDIR/$prefix in a fully finalized state. Basically, do what make + https://github.com/TinoDidriksen/apertium-packaging/blob/master/trunk/apertium-apy/debian/apertium-apy.install does.

The result should be that I can delete my install file entirely and just use the files APy puts into $DESTDIR/$prefix.

Analyze and Generate methods fail on APy URL server

I tried 2 requests through the APy sandbox. Both work on localhost, while both fail with a 400 Bad Request error when the server is https://www.apertium.org/apy. Is it because APy is running some other version of the code on the server?

Fails on APy URL

screenshot from 2017-08-31 21-00-31

screenshot from 2017-08-31 21-20-11

Works on localhost
screenshot from 2017-08-31 21-27-29

License

I didn't see any mention that apertium-apy is GPLv3+. Does this mean we're stuck with GPLv3 forever? Maybe the ~12 people who have contributed could be asked individually, but that is usually a difficult job.

Debian package doesn't bring streamparser module

Hi all,

I was trying to install apertium-apy from the nightly Debian repo at apertium.projectjj.com. It installs, but when you try to start the service it shows:

Feb 06 11:31:26 softaragones python3[8195]: Traceback (most recent call last):
Feb 06 11:31:26 softaragones python3[8195]: File "servlet.py", line 58, in
Feb 06 11:31:26 softaragones python3[8195]: from streamparser.streamparser import parse, known
Feb 06 11:31:26 softaragones python3[8195]: ImportError: No module named 'streamparser'
Feb 06 11:31:26 softaragones systemd[1]: apertium-apy.service: main process exited, code=exited, status=1/FAILURE
Feb 06 11:31:26 softaragones systemd[1]: Unit apertium-apy.service entered failed state.
Feb 06 11:31:26 softaragones systemd[1]: apertium-apy.service holdoff time over, scheduling restart.
Feb 06 11:31:26 softaragones systemd[1]: Stopping Translation server and API for Apertium...
Feb 06 11:31:26 softaragones systemd[1]: Starting Translation server and API for Apertium...
Feb 06 11:31:26 softaragones systemd[1]: apertium-apy.service start request repeated too quickly, refusing to start.
Feb 06 11:31:26 softaragones systemd[1]: Failed to start Translation server and API for Apertium.
Feb 06 11:31:26 softaragones systemd[1]: Unit apertium-apy.service entered failed state.

In order to work around that, I downloaded streamparser from https://github.com/goavki/streamparser/tree/cf67427283a5316f3a23dbc193e75603c5cbe34c and put it in the /usr/share/apertium-apy/ directory. After that, the service starts and runs OK, so I suppose this module is missing from the Debian package and should be included.

Bye!

unknown word database mangles entities

sqlite3 -header /home/apy/apertium-apy/missing.db <<EOF
.mode insert
select * from missingFreqs;
EOF

gives me e.g.

INSERT INTO table(pair,token,frequency) VALUES('sme-nob','redakt&oslash',24);
INSERT INTO table(pair,token,frequency) VALUES('sme-nob','Nollii&raquo',3);

– whenever we see an html-entity, it's cut off at the ; so we don't know what the rest of the word is.
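A hedged fix: decode HTML entities before the token is stored, e.g. with the stdlib's html.unescape (exactly where in APy's pipeline this should happen is an open question):

```python
import html


def decode_token(token):
    # 'redakt&oslash;r' -> 'redaktør': unescaping before storage keeps
    # the whole word instead of having it cut off at the entity's ';'.
    return html.unescape(token)
```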

"Stream is closed" 500 error

apy_1         | [E 171227 22:37:04 web:1590] Uncaught exception GET /translateChain?q=house&markUnknown=no&langpairs=eng%7Crus&callback=_jqjsp&_1514414224113= (76.30.93.54)
apy_1         |     HTTPServerRequest(protocol='http', host='beta.apertium.org:2738', method='GET', uri='/translateChain?q=house&markUnknown=no&langpairs=eng%7Crus&callback=_jqjsp&_1514414224113=', version='HTTP/1.1', remote_ip='76.30.93.54', headers={'Accept': '*/*', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36', 'Connection': 'keep-alive', 'Dnt': '1', 'Referer': 'http://beta.apertium.org/index.eng.html?dir=eng-rus&q=house', 'Host': 'beta.apertium.org:2738', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'en-US,en;q=0.9'})
apy_1         |     Traceback (most recent call last):
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1511, in _execute
apy_1         |         result = yield result
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/servlet.py", line 578, in get
apy_1         |         nosplit=False, deformat=deformat, reformat=reformat)
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/servlet.py", line 543, in translateAndRespond
apy_1         |         translated = yield translation.coreduce(toTranslate, [p.translate for p in pipelines], nosplit, deformat, reformat)
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/translation.py", line 258, in coreduce
apy_1         |         result = yield funcs[0](init, *args)
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/translation.py", line 74, in translate
apy_1         |         for part in all_split]
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 828, in callback
apy_1         |         result_list.append(f.result())
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1069, in run
apy_1         |         yielded = self.gen.send(value)
apy_1         |       File "/root/apertium-apy/translation.py", line 279, in translateNULFlush
apy_1         |         proc_in.stdin.write(bytes('\0', "utf-8"))
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 387, in write
apy_1         |         self._check_closed()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 925, in _check_closed
apy_1         |         raise StreamClosedError(real_error=self.error)
apy_1         |     tornado.iostream.StreamClosedError: Stream is closed

Support multiple q's

We used to (?) support multiple q's per query (or maybe scalemt did). People are expecting us to, anyway:

Hello,

today I discovered that apertium is sending wrong response for batch interface (several q=).
For this:
https://www.apertium.org/apy/translate?format=html&markUnknown=no&langpair=en%7Ces&q=seconds+with&q=queries.+Memory+Usage%3A&q=8.12+MB

I get:
{"responseStatus": 200, "responseDetails": null, "responseData": {"translatedText": "8.12 MB"}}

ONLY THE LAST PART IS TRANSLATED!!! And according to this information:
http://wiki.apertium.org/wiki/Apertium_scalable_service
each part should be translated.

Please fix it or let me know whether the batch interface has changed. Anyway, please send me some info about the situation :)

Cheers
Michał Podbielski
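All the q values are present in the query string; with the stdlib they can be recovered like this (in Tornado, the equivalent is RequestHandler.get_arguments('q'), plural, whereas get_argument returns only the last value):

```python
from urllib.parse import parse_qs

# parse_qs collects repeated parameters into a list instead of
# discarding all but the last one.
qs = parse_qs('langpair=en|es&q=seconds with&q=queries.&q=8.12 MB')
print(qs['q'])  # all three q's, in order
```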

Need of toro.py in the source code

Could we just delete toro.py from the apertium-apy source code and add instructions to the wiki to install it with pip or via the operating system's package manager? Or is it possible to move completely to Tornado's locks, so that we wouldn't need toro at all?

French--Catalan not working

From @TinoDidriksen:

Need the APy people to look at this.

APy yields 500 Internal Server Error:
https://www.apertium.org/apy/translate?q=250%20ans%20d%E2%80%99histoire,%20un%20mod%C3%A8le%20p%C3%A9dagogique%20unique,%20une%20ouverture%20sur%20le%20monde%20avec%20de%20nombreux%20partenariats%20avec%20les%20meilleures%20universit%C3%A9s,%20des%20institutions%20culturelles%20de%20premier%20plan%20et%20les%20entreprises%20les%20plus%20innovantes.&langpair=fra|cat

But this works:
echo "250 ans d'histoire, un modèle pédagogique unique, une ouverture sur le monde avec de nombreux partenariats avec les meilleures universités, des institutions culturelles de premier plan et les entreprises les plus innovantes." | sh ~apertium/tarballs-build/share/apertium/modes/fra-cat.mode -g
250 anys d'història, un model pedagògic únic, una obertura sobre el món amb moltes cooperacions amb les millors universitats, de les institucions culturals de primer plànol i les empreses les més *innovantes.

APy log: http://codepad.org/cjtbFI8g

Debian package

It would be great if we could have a deb package for installing apertium apy.

bool sent to getPipeline instead of pair somewhere

[W 160403 06:25:57 web:1908] 400 POST /translate (127.0.0.1) 0.59ms
[E 160403 06:25:57 concurrent:319] Future <tornado.concurrent.Future object at 0x7fd4a331b438> exception was never retrieved: Traceback (most recent call last):
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 230, in wrapper
        yielded = next(result)
      File "/home/apertium/apertium-apy/servlet.py", line 385, in get
        pipeline = self.getPipeline(pair)
      File "/home/apertium/apertium-apy/servlet.py", line 324, in getPipeline
        (l1, l2) = pair
    TypeError: 'bool' object is not iterable

ChangeLog

The Wikimedia Foundation is using apertium-apy in a production environment (thanks for all the help!). It would be great to have a ChangeLog or summary of what has changed between two releases.

gradual type checking with mypy

It'd be nice to have some type checking. I just made a silly mistake when renaming a variable (sending the module named html into re.sub instead of the html-contents variable …), which a good type checker would've caught right away. Python can't do very good type checking, but at least some type checking is possible with mypy, and mypy --check-untyped-defs would've caught this.

We'd have to do any type annotations in comments instead of the nice inline PEP484 syntax, since we want to support older Python versions.

We might have to enable it only for newer Python versions in Travis.

We might have to include some stubs for modules (servlet.py:55: error: Cannot find module named 'cld2full'), or tell mypy to ignore some things (translation.py:10: error: Name 'locks' already defined because we fallback to toro.locks in a try-except) or fill in for some bugs in mypy (translation.py:12: error: Module 'select' has no attribute 'PIPE_BUF').
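The comment-style annotations look like this (split_pair is a made-up helper, shown only for the syntax):

```python
from typing import Tuple  # noqa: F401 (referenced only in the type comment)


def split_pair(code):
    # type: (str) -> Tuple[str, str]
    # PEP 484 comment syntax: checked by mypy, ignored at runtime,
    # and valid on Python versions that predate inline annotations.
    src, tgt = code.split('-', 1)
    return src, tgt
```

mypy --check-untyped-defs would then flag e.g. passing a module object where a str is expected.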

cannot translate

I'm new to apertium-apy. I installed it this way on Ubuntu Server Xenial (just for English and Spanish):

wget http://apertium.projectjj.com/apt/install-nightly.sh
sudo bash install-nightly.sh
sudo apt-get install apertium-apy
sudo apt-get install apertium-en-es
sudo systemctl start apertium-apy.service
sudo systemctl enable apertium-apy.service

The service is running, and with this
curl http://localhost:2737/listPairs
I get
{"responseStatus": 200, "responseDetails": null, "responseData": [{"targetLanguage": "eng_US", "sourceLanguage": "spa"}, {"targetLanguage": "eng", "sourceLanguage": "spa"}, {"targetLanguage": "spa", "sourceLanguage": "eng"}]}

But I cannot translate anything, for example,
curl 'http://localhost:2737/translate?langpair=eng|spa&q=this+is+a+test'
{"message": "Internal Server Error", "status": "error", "explanation": "Internal Server Error", "code": 500}

What am I doing wrong? I'm trying to use it in combination with the ContentTranslation MediaWiki extension, and I need to be sure it works from the command line. Thanks in advance.

web pages: better response on 404's or bad SSL

e.g. "https://www.avvir.no" gives Feb 01 15:34:44 gtweb.uit.no python3[14767]: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645) in the log, but simply "translation not available" in html-tools.

  • better JSON response from apy
  • have html-tools give that response instead of "translation not available"

Default locale to None

In ListLanguageNamesHandler in servlet.py, should we change localeArg = self.get_argument('locale') to localeArg = self.get_argument('locale', default=None)? Without defaulting to None, I don't see why the code elif 'Accept-Language' in self.request.headers: is useful (to use the header field, the request still needs a "locale" parameter, just with an empty value).
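The intended resolution order, sketched outside Tornado (the function and the header parsing are illustrative; in Tornado, get_argument without a default raises MissingArgumentError for a missing parameter):

```python
def pick_locale(arg_value, headers):
    # Resolution order: explicit ?locale= argument, then the
    # Accept-Language header, then None.
    if arg_value:
        return arg_value
    accept = headers.get('Accept-Language')
    if accept:
        # take the first (highest-priority) language tag
        return accept.split(',')[0].split(';')[0].strip()
    return None
```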

Tests and coverage

I think only some of the endpoints are currently tested.

  • all the endpoints should be tested
  • tests should use standard libraries (in Python)
  • coverage stats should be produced
  • Travis should enforce coverage
  • should test non-blocking nature of the endpoints

keep track of all flushing-processes started in apy?

(moved from https://sourceforge.net/p/apertium/tickets/78/ )

Currently, we only keep track of the pids of the "ends" of flushing pipelines (inpipe, outpipe), not the whole group.

If part of a language pair pipeline crashes on startup, the rest of the pipeline may simply hang (and leave the request hanging). Of course this might mean there's a serious bug in the core tools (or language data was compiled for the wrong version), but it'd be nice if APY were more robust here.

One possible way of mitigating this might be to store pids of all started procs in the FlushingPipeline object, and then in the cleanup function go through pipelines_holding and stop all procs of any pipeline which hasn't been used in over a minute.
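That mitigation could look roughly like this (class and field names are invented for the sketch):

```python
import os
import signal
import time


class TrackedPipeline:
    def __init__(self, procs):
        # remember *every* stage's pid, not just the pipeline's ends
        self.pids = [p.pid for p in procs]
        self.last_used = time.time()


def reap_stale(pipelines, max_idle=60):
    # Cleanup pass: terminate all processes of any pipeline that has
    # been idle longer than max_idle seconds.
    for pipe in list(pipelines):
        if time.time() - pipe.last_used > max_idle:
            for pid in pipe.pids:
                try:
                    os.kill(pid, signal.SIGTERM)
                except ProcessLookupError:
                    pass  # stage already gone
            pipelines.remove(pipe)
```

Running reap_stale periodically would also catch pipelines left half-dead by a crash on startup.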

Unused port variable in servlet.py

servlet.py has this function:

def setupHandler(
    port, pairs_path, nonpairs_path, langNames, missingFreqsPath, timeout,
    max_pipes_per_pair, min_pipes_per_pair, max_users_per_pipe, max_idle_secs,
    restart_pipe_after, max_doc_pipes, verbosity=0, scaleMtLogs=False, memory=1000
):

and there is the port parameter. It is unused, so it could be removed.

log (full) failed requests to file

We might want to log all failing requests, e.g. 500's and such, to a file. The GET requests just go to output by tornado (collected by journald), but not POST.

Does flushing sometimes get "out of sync"?

Failing sanity tests run about one hour apart give:

exit code 1: .cat-fra: expected 'pour', got 'chaud' (for input: per)
exit code 1: ...................cat-fra: expected 'pour', got 'Éternels' (for input: per)
exit code 1: ............................................................cat-fra: expected 'pour', got '*carcater' (for input: per)
exit code 1:
exit code 1: ........................................................cat-fra: expected 'pour', got 'boulangerie' (for input: per)
exit code 1:
exit code 1: ..........................................cat-fra: expected 'pour', got 'relâcher' (for input: per)
exit code 1: .......................cat-fra: expected 'pour', got '*Cotilló' (for input: per)
exit code 1: ............cat-fra: expected 'pour', got 'rainure' (for input: per)
exit code 1: .................................................cat-fra: expected 'pour', got 'approuvez' (for input: per)

as if the output of sending "per" into APy's cat-fra came from an earlier request's input.

(That seems quite bad.)

a little cache for web page translation?

Some normal pages take up to 30s to translate with the giellatekno pairs.

Maybe we should try having a little { pair : { url : translationOutput } } cache? Just for the Slashdot effect, where one page gets requested again and again.

Seeing as caching is hard, we could keep it simple and just empty the cache when sys.getsizeof > max_url_cache. But sys.getsizeof doesn't recursively follow the size of the object; also, news sites etc. change, so I added a stupid-simple "empty if older than two hours" rule for the whole cache object.

TODO:
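The whole-cache expiry rule described above can be sketched in a few lines (the class and its names are invented here):

```python
import time


class PageCache:
    """Minimal { pair: { url: output } } cache, emptied wholesale once
    it is older than max_age seconds -- the 'stupid simple' rule."""

    def __init__(self, max_age=2 * 3600):
        self.max_age = max_age
        self.born = time.time()
        self.store = {}

    def get(self, pair, url):
        if time.time() - self.born > self.max_age:
            self.store.clear()       # drop everything, restart the clock
            self.born = time.time()
        return self.store.get(pair, {}).get(url)

    def put(self, pair, url, output):
        self.store.setdefault(pair, {})[url] = output
```

Wholesale expiry trades hit rate for simplicity: no per-entry timestamps, no recursive size accounting, and stale news pages can live at most max_age seconds.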
