
Apertium APy

Build Status Coverage Status PyPI PyPI - Python Version

Apertium APy, Apertium API in Python, is a web server exposing Apertium functions including text, document, and webpage translation, as well as morphological analysis and generation. More information is available on the Apertium Wiki.

Requirements

  • Python 3.6+
  • Tornado 4.5.3 - 6.0.4 (python3-tornado on Debian/Ubuntu)

Additional functionality is provided by installing the following packages:

  • apertium-streamparser enables spell checking
  • requests enables suggestion handling
  • chromium_compact_language_detector enables improved language detection (cld2)
  • chardet enables website character encoding detection
  • commentjson allows keeping API keys in commented JSON
  • lxml enables pair preferences

Precise versions are available in requirements.txt and setup.py.

Installation

Before you install, you can try out a live version of APy at apertium.org.

APy is available through PyPI:

$ pip install apertium-apy

On Ubuntu/Debian, it is also available through apt:

$ wget -qO- https://apertium.projectjj.com/apt/install-nightly.sh | bash
$ apt-get install apertium-apy

Finally, GitHub Container Registry hosts an image of the provided Dockerfile with entry point apertium-apy exposing port 2737:

$ docker pull ghcr.io/apertium/apy

Usage

Installation through apt or pip adds an apertium-apy executable:

$ apertium-apy --help
usage: apertium-apy [-h] [-s NONPAIRS_PATH] [-l LANG_NAMES] [-F FASTTEXT_MODEL]
                  [-f MISSING_FREQS] [-p PORT] [-c SSL_CERT] [-k SSL_KEY]
                  [-t TIMEOUT] [-j [NUM_PROCESSES]] [-d] [-P LOG_PATH]
                  [-i MAX_PIPES_PER_PAIR] [-n MIN_PIPES_PER_PAIR]
                  [-u MAX_USERS_PER_PIPE] [-m MAX_IDLE_SECS]
                  [-r RESTART_PIPE_AFTER] [-v VERBOSITY] [-V] [-S]
                  [-M UNKNOWN_MEMORY_LIMIT] [-T STAT_PERIOD_MAX_AGE]
                  [-wp WIKI_PASSWORD] [-wu WIKI_USERNAME] [-b]
                  [-rs RECAPTCHA_SECRET] [-md MAX_DOC_PIPES] [-C CONFIG]
                  [-ak API_KEYS]
                  pairs_path

Apertium APY -- API server for machine translation and language analysis

positional arguments:
  pairs_path            path to Apertium installed pairs (all modes files in
                        this path are included)

options:
  -h, --help            show this help message and exit
  -s NONPAIRS_PATH, --nonpairs-path NONPAIRS_PATH
                        path to Apertium tree (only non-translator debug modes
                        are included from this path)
  -l LANG_NAMES, --lang-names LANG_NAMES
                        path to localised language names sqlite database
                        (default = langNames.db)
  -F FASTTEXT_MODEL, --fasttext-model FASTTEXT_MODEL
                        path to fastText language identification model (e.g.
                        lid.release.ftz)
  -f MISSING_FREQS, --missing-freqs MISSING_FREQS
                        path to missing word frequency sqlite database
                        (default = None)
  -p PORT, --port PORT  port to run server on (default = 2737)
  -c SSL_CERT, --ssl-cert SSL_CERT
                        path to SSL Certificate
  -k SSL_KEY, --ssl-key SSL_KEY
                        path to SSL Key File
  -t TIMEOUT, --timeout TIMEOUT
                        timeout for requests (default = 10)
  -j [NUM_PROCESSES], --num-processes [NUM_PROCESSES]
                        number of processes to run (default = 1; use 0 to run
                        one http server per core, where each http server runs
                        all available language pairs)
  -d, --daemon          daemon mode: redirects stdout and stderr to files
                        apertium-apy.log and apertium-apy.err; use with --log-
                        path
  -P LOG_PATH, --log-path LOG_PATH
                        path to log output files to in daemon mode; defaults
                        to local directory
  -i MAX_PIPES_PER_PAIR, --max-pipes-per-pair MAX_PIPES_PER_PAIR
                        how many pipelines we can spin up per language pair
                        (default = 1)
  -n MIN_PIPES_PER_PAIR, --min-pipes-per-pair MIN_PIPES_PER_PAIR
                        when shutting down pipelines, keep at least this many
                        open per language pair (default = 0)
  -u MAX_USERS_PER_PIPE, --max-users-per-pipe MAX_USERS_PER_PIPE
                        how many concurrent requests per pipeline before we
                        consider spinning up a new one (default = 5)
  -m MAX_IDLE_SECS, --max-idle-secs MAX_IDLE_SECS
                        if specified, shut down pipelines that have not been
                        used in this many seconds
  -r RESTART_PIPE_AFTER, --restart-pipe-after RESTART_PIPE_AFTER
                        restart a pipeline if it has had this many requests
                        (default = 1000)
  -v VERBOSITY, --verbosity VERBOSITY
                        logging verbosity
  -V, --version         show APY version
  -S, --scalemt-logs    generates ScaleMT-like logs; use with --log-path;
                        disables
  -M UNKNOWN_MEMORY_LIMIT, --unknown-memory-limit UNKNOWN_MEMORY_LIMIT
                        keeps unknown words in memory until a limit is
                        reached; use with --missing-freqs (default = 1000)
  -T STAT_PERIOD_MAX_AGE, --stat-period-max-age STAT_PERIOD_MAX_AGE
                        How many seconds back to keep track request timing
                        stats (default = 3600)
  -wp WIKI_PASSWORD, --wiki-password WIKI_PASSWORD
                        Apertium Wiki account password for SuggestionHandler
  -wu WIKI_USERNAME, --wiki-username WIKI_USERNAME
                        Apertium Wiki account username for SuggestionHandler
  -b, --bypass-token    ReCAPTCHA bypass token
  -rs RECAPTCHA_SECRET, --recaptcha-secret RECAPTCHA_SECRET
                        ReCAPTCHA secret for suggestion validation
  -md MAX_DOC_PIPES, --max-doc-pipes MAX_DOC_PIPES
                        how many concurrent document translation pipelines we
                        allow (default = 3)
  -C CONFIG, --config CONFIG
                        Configuration file to load options from
  -ak API_KEYS, --api-keys API_KEYS
                        Configuration file to load API keys

Contributing

APy uses GitHub Actions for continuous integration. Locally, use make test to run the same checks it does. After installing Pipenv, run pipenv install --dev to install the development requirements, e.g. linters.

Contributors

akosiaris, androbin, ayushjainrksh, bentley, danielmartinez, dependabot[bot], ftyers, hectoralos, jonorthwash, jpjpjpopop, kartikm, marcriera, ryanachi, shardulc, share-with-me, simonnarang, sushain97, svineet, tinodidriksen, unhammer, wei2912, wolfgangth, xavivars


apertium-apy's Issues

Chained translations

As discussed in Apertium's IRC channel, it would be nice to have chained translations, for example Sardinian -> Italian -> Spanish. Some code needs to compute the path the source text must go through to achieve the desired translation, plus a flag or similar to enable chaining.
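One plausible shape for the path computation, sketched as a breadth-first search over the installed pairs; the function name and data layout here are illustrative, not APy's actual internals:

```python
from collections import deque


def find_chain(pairs, src, tgt):
    """Shortest translation chain over installed pairs, e.g.
    srd -> ita -> spa. Returns a list of language codes, or None
    if no chain exists."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, []).append(b)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == tgt:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

BFS gives the shortest chain, which matters because each extra hop compounds translation errors.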

non-deterministic output on perWord endpoint

With a single request, we're getting up to three different responses.

This is the request we're using: http://beta.apertium.org/apy/perWord?lang=eng-fra&modes=biltrans&q=hi

Depending on APy's mood, these are the outputs we get:

Correct output:

[{"biltrans": ["salut<ij>"], "input": "hi"}]

Empty output:

[{"biltrans": [], "input": "hi"}]

Error output:

{"message": "Internal Server Error", "code": 500, "status": "error", "explanation": "Internal Server Error"}

Noticed by @avyayv.

langnames calls should be non-blocking

If getLocalizedLanguages reads from a damaged disk/db, it can block other requests. It should use the same non-blocking yield pattern that translation does.
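A minimal sketch of moving the blocking read off the event loop, here with asyncio's run_in_executor (APy itself uses Tornado coroutines, and the table/column names below are invented for illustration):

```python
import asyncio
import sqlite3


def _read_names(db_path, locale):
    # Blocking sqlite read; runs in a worker thread so a slow or
    # damaged disk cannot stall the event loop.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            'SELECT code, name FROM languageNames WHERE locale = ?',
            (locale,)).fetchall()


async def get_localized_languages(db_path, locale):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _read_names, db_path, locale)
```

Other requests keep being served while the thread waits on the disk.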

/list?q=pairs return data differs from rest of the modes

The return data of /list?q=pairs differs in shape from that of the rest of the modes. It produces:

$ curl 'http://localhost:2737/list?q=pairs'

{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"}, 
 {"sourceLanguage": "tat", "targetLanguage": "kaz"}, 
 {"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null}

and the rest are like:

$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", 
 "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}

Should we just remove the pairs mode from /list and have only /listPairs?
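For comparison, flattening the /list?q=pairs payload into something closer to the other modes' shape is only a couple of lines (a sketch; the key format is chosen arbitrarily here):

```python
def pair_codes(response):
    # Flatten [{'sourceLanguage': 'kaz', 'targetLanguage': 'tat'}, ...]
    # into ['kaz-tat', ...], closer to the keys the other /list modes use.
    return ['%s-%s' % (p['sourceLanguage'], p['targetLanguage'])
            for p in response['responseData']]
```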

support unicode directly instead of using \u codes

Currently APy escapes Unicode characters with \u codes, e.g. it outputs '{"vblex": ["c\u00f3rrer"], "n": ["cursa"]}' rather than '{"vblex": ["córrer"], "n": ["cursa"]}'.

It would be more convenient if it output Unicode directly.
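Assuming the responses go through json.dumps, this is a one-flag change:

```python
import json

entry = {'vblex': ['córrer'], 'n': ['cursa']}
print(json.dumps(entry))                      # default: ASCII-only, \u00f3 escapes
print(json.dumps(entry, ensure_ascii=False))  # literal UTF-8 'córrer'
```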

`Too many open files` when individual pipelines are restarted many times

INFO:root:A pipe for pair sme-sme_spell has handled 200 requests, scheduling restart
INFO:root:sme-sme_spell not in pipelines of this process
INFO:root:Starting up a new pipeline for sme-sme_spell …
ERROR:tornado.application:Future <tornado.concurrent.Future object at 0x102670ef0> exception was never retrieved: Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/wwserver1/divvun/apertium-apy/servlet.py", line 995, in get
    reformat=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/wwserver1/divvun/apertium-apy/servlet.py", line 503, in translateAndRespond
    translated = yield pipeline.translate(toTranslate, nosplit, deformat, reformat)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/wwserver1/divvun/apertium-apy/translation.py", line 80, in translate
    for part in all_split]
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 828, in callback
    result_list.append(f.result())
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tornado/gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "/Users/wwserver1/divvun/apertium-apy/translation.py", line 285, in translateNULFlush
    proc_deformat = Popen(deformat, stdin=PIPE, stdout=PIPE)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1185, in _execute_child
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

This is on a server where language data is updated nightly, and pipes are restarted every 200 requests / on 3600 idle secs.

Maybe APy isn't correctly closing files when restarting pipelines?
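One thing worth auditing is whether every Popen in the restart path is fully reaped. A leak-free pattern looks roughly like this (a sketch, not APy's actual code):

```python
from subprocess import Popen, PIPE


def run_stage(cmd, data):
    # communicate() writes data, reads output, and closes the child's
    # pipe fds; wait() reaps the process. Skipping either step leaks
    # file descriptors on every pipeline restart, eventually hitting
    # EMFILE ('Too many open files').
    proc = Popen(cmd, stdin=PIPE, stdout=PIPE)
    try:
        out, _ = proc.communicate(data)
        return out
    finally:
        proc.wait()
```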

Installer

APy should be able to install itself. It already has a Makefile, but there is no install target. Alternatively, add a setup.py.

Some way to run a single command to have APy put itself into $DESTDIR/$prefix in a fully finalized state. Basically, do what make + https://github.com/TinoDidriksen/apertium-packaging/blob/master/trunk/apertium-apy/debian/apertium-apy.install does.

The result should be that I can delete my install file entirely and just use the files APy puts into $DESTDIR/$prefix.

Analyze and Generate methods fail on APy URL server

I tried 2 requests through the APy sandbox. Both work on localhost, while both fail with a 400 Bad Request error when the server is https://www.apertium.org/apy. Is it because APy is running some other version of the code on the server?

Fails on APy URL

screenshot from 2017-08-31 21-00-31

screenshot from 2017-08-31 21-20-11

Works on localhost
screenshot from 2017-08-31 21-27-29

License

I didn't see any mention that apertium-apy is GPLv3+. Does this mean we're stuck with GPLv3 forever? Maybe the ~12 people who have contributed could be asked individually, but that is usually a difficult job.

Debian package doesn't bring streamparser module

Hi all,

I was trying to install apertium-apy from the nightly Debian repo at apertium.projectjj.com. It installs, but when you try to start the service it shows:

Feb 06 11:31:26 softaragones python3[8195]: Traceback (most recent call last):
Feb 06 11:31:26 softaragones python3[8195]: File "servlet.py", line 58, in
Feb 06 11:31:26 softaragones python3[8195]: from streamparser.streamparser import parse, known
Feb 06 11:31:26 softaragones python3[8195]: ImportError: No module named 'streamparser'
Feb 06 11:31:26 softaragones systemd[1]: apertium-apy.service: main process exited, code=exited, status=1/FAILURE
Feb 06 11:31:26 softaragones systemd[1]: Unit apertium-apy.service entered failed state.
Feb 06 11:31:26 softaragones systemd[1]: apertium-apy.service holdoff time over, scheduling restart.
Feb 06 11:31:26 softaragones systemd[1]: Stopping Translation server and API for Apertium...
Feb 06 11:31:26 softaragones systemd[1]: Starting Translation server and API for Apertium...
Feb 06 11:31:26 softaragones systemd[1]: apertium-apy.service start request repeated too quickly, refusing to start.
Feb 06 11:31:26 softaragones systemd[1]: Failed to start Translation server and API for Apertium.
Feb 06 11:31:26 softaragones systemd[1]: Unit apertium-apy.service entered failed state.

In order to work around that, I downloaded streamparser from https://github.com/goavki/streamparser/tree/cf67427283a5316f3a23dbc193e75603c5cbe34c and put it in the /usr/share/apertium-apy/ directory. After that, the service starts and runs OK, so I suppose this module is missing from the Debian package and should be included.

Bye!

unknown word database mangles entities

sqlite3 -header /home/apy/apertium-apy/missing.db <<EOF
.mode insert
select * from missingFreqs;
EOF

gives me e.g.

INSERT INTO table(pair,token,frequency) VALUES('sme-nob','redakt&oslash',24);
INSERT INTO table(pair,token,frequency) VALUES('sme-nob','Nollii&raquo',3);

– whenever we see an html-entity, it's cut off at the ; so we don't know what the rest of the word is.
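A hedged fix: decode HTML entities before the token is stored, e.g. with the stdlib's html.unescape (exactly where in APy's pipeline this should happen is an open question):

```python
import html


def decode_token(token):
    # 'redakt&oslash;r' -> 'redaktør': unescaping before storage keeps
    # the whole word instead of having it cut off at the entity's ';'.
    return html.unescape(token)
```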

"Stream is closed" 500 error

apy_1         | [E 171227 22:37:04 web:1590] Uncaught exception GET /translateChain?q=house&markUnknown=no&langpairs=eng%7Crus&callback=_jqjsp&_1514414224113= (76.30.93.54)
apy_1         |     HTTPServerRequest(protocol='http', host='beta.apertium.org:2738', method='GET', uri='/translateChain?q=house&markUnknown=no&langpairs=eng%7Crus&callback=_jqjsp&_1514414224113=', version='HTTP/1.1', remote_ip='76.30.93.54', headers={'Accept': '*/*', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36', 'Connection': 'keep-alive', 'Dnt': '1', 'Referer': 'http://beta.apertium.org/index.eng.html?dir=eng-rus&q=house', 'Host': 'beta.apertium.org:2738', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'en-US,en;q=0.9'})
apy_1         |     Traceback (most recent call last):
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1511, in _execute
apy_1         |         result = yield result
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/servlet.py", line 578, in get
apy_1         |         nosplit=False, deformat=deformat, reformat=reformat)
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/servlet.py", line 543, in translateAndRespond
apy_1         |         translated = yield translation.coreduce(toTranslate, [p.translate for p in pipelines], nosplit, deformat, reformat)
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/translation.py", line 258, in coreduce
apy_1         |         result = yield funcs[0](init, *args)
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1063, in run
apy_1         |         yielded = self.gen.throw(*exc_info)
apy_1         |       File "/root/apertium-apy/translation.py", line 74, in translate
apy_1         |         for part in all_split]
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1055, in run
apy_1         |         value = future.result()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 828, in callback
apy_1         |         result_list.append(f.result())
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/concurrent.py", line 238, in result
apy_1         |         raise_exc_info(self._exc_info)
apy_1         |       File "<string>", line 4, in raise_exc_info
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1069, in run
apy_1         |         yielded = self.gen.send(value)
apy_1         |       File "/root/apertium-apy/translation.py", line 279, in translateNULFlush
apy_1         |         proc_in.stdin.write(bytes('\0', "utf-8"))
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 387, in write
apy_1         |         self._check_closed()
apy_1         |       File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 925, in _check_closed
apy_1         |         raise StreamClosedError(real_error=self.error)
apy_1         |     tornado.iostream.StreamClosedError: Stream is closed

Support multiple q's

We used to (?) support multiple q's per query (or maybe scalemt did). People are expecting us to, anyway:

Hello,

today I discovered that apertium is sending wrong response for batch interface (several q=).
For this:
https://www.apertium.org/apy/translate?format=html&markUnknown=no&langpair=en%7Ces&q=seconds+with&q=queries.+Memory+Usage%3A&q=8.12+MB

I get:
{"responseStatus": 200, "responseDetails": null, "responseData": {"translatedText": "8.12 MB"}}

ONLY THE LAST PART IS TRANSLATED!!! And according to this information:
http://wiki.apertium.org/wiki/Apertium_scalable_service
each part should be translated.

Please fix it or let me know whether the batch interface has changed. Anyway, please send me some info about the situation :)

Cheers
Michał Podbielski
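All the q values are present in the query string; with the stdlib they can be recovered like this (in Tornado, the equivalent is RequestHandler.get_arguments('q'), plural, whereas get_argument returns only the last value):

```python
from urllib.parse import parse_qs

# parse_qs collects repeated parameters into a list instead of
# discarding all but the last one.
qs = parse_qs('langpair=en|es&q=seconds with&q=queries.&q=8.12 MB')
print(qs['q'])  # all three q's, in order
```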

Need of toro.py in the source code

Could we just delete toro.py from the apertium-apy source code and add instructions to the wiki to install it with pip or via the operating system's package manager? Or is it possible to move completely to Tornado's locks, so that we wouldn't need toro at all?

French--Catalan not working

From @TinoDidriksen:

Need the APy people to look at this.

APy yields 500 Internal Server Error:
https://www.apertium.org/apy/translate?q=250%20ans%20d%E2%80%99histoire,%20un%20mod%C3%A8le%20p%C3%A9dagogique%20unique,%20une%20ouverture%20sur%20le%20monde%20avec%20de%20nombreux%20partenariats%20avec%20les%20meilleures%20universit%C3%A9s,%20des%20institutions%20culturelles%20de%20premier%20plan%20et%20les%20entreprises%20les%20plus%20innovantes.&langpair=fra|cat

But this works:
echo "250 ans d'histoire, un modèle pédagogique unique, une ouverture sur le monde avec de nombreux partenariats avec les meilleures universités, des institutions culturelles de premier plan et les entreprises les plus innovantes." | sh ~apertium/tarballs-build/share/apertium/modes/fra-cat.mode -g
250 anys d'història, un model pedagògic únic, una obertura sobre el món amb moltes cooperacions amb les millors universitats, de les institucions culturals de primer plànol i les empreses les més *innovantes.

APy log: http://codepad.org/cjtbFI8g

Debian package

It would be great if we could have a deb package for installing apertium apy.

bool sent to getPipeline instead of pair somewhere

[W 160403 06:25:57 web:1908] 400 POST /translate (127.0.0.1) 0.59ms
[E 160403 06:25:57 concurrent:319] Future <tornado.concurrent.Future object at 0x7fd4a331b438> exception was never retrieved: Traceback (most recent call last):
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 230, in wrapper
        yielded = next(result)
      File "/home/apertium/apertium-apy/servlet.py", line 385, in get
        pipeline = self.getPipeline(pair)
      File "/home/apertium/apertium-apy/servlet.py", line 324, in getPipeline
        (l1, l2) = pair
    TypeError: 'bool' object is not iterable

ChangeLog

The Wikimedia Foundation is using apertium-apy in a production environment (thanks for all the help!). It would be great to have a ChangeLog or summary of what has changed between two releases.

gradual type checking with mypy

It'd be nice to have some type checking. I just made a silly mistake when renaming a variable (sending the module named html into re.sub instead of the html-contents variable …), which a good type checker would've caught right away. Python can't do very good type checking, but at least some type checking is possible with mypy, and mypy --check-untyped-defs would've caught this.

We'd have to do any type annotations in comments instead of the nice inline PEP484 syntax, since we want to support older Python versions.

We might have to enable it only for newer Python versions in Travis.

We might have to include some stubs for modules (servlet.py:55: error: Cannot find module named 'cld2full'), or tell mypy to ignore some things (translation.py:10: error: Name 'locks' already defined because we fallback to toro.locks in a try-except) or fill in for some bugs in mypy (translation.py:12: error: Module 'select' has no attribute 'PIPE_BUF').
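The comment-style annotations look like this (split_pair is a made-up helper, shown only for the syntax):

```python
from typing import Tuple  # noqa: F401 (referenced only in the type comment)


def split_pair(code):
    # type: (str) -> Tuple[str, str]
    # PEP 484 comment syntax: checked by mypy, ignored at runtime,
    # and valid on Python versions that predate inline annotations.
    src, tgt = code.split('-', 1)
    return src, tgt
```

mypy --check-untyped-defs would then flag e.g. passing a module object where a str is expected.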

cannot translate

I'm new to apertium-apy. I installed it this way on Ubuntu Server Xenial (just for English and Spanish):

wget http://apertium.projectjj.com/apt/install-nightly.sh
sudo bash install-nightly.sh
sudo apt-get install apertium-apy
sudo apt-get install apertium-en-es
sudo systemctl start apertium-apy.service
sudo systemctl enable apertium-apy.service

The service is running, and with this
curl http://localhost:2737/listPairs
I get
{"responseStatus": 200, "responseDetails": null, "responseData": [{"targetLanguage": "eng_US", "sourceLanguage": "spa"}, {"targetLanguage": "eng", "sourceLanguage": "spa"}, {"targetLanguage": "spa", "sourceLanguage": "eng"}]}

But I cannot translate anything, for example,
curl 'http://localhost:2737/translate?langpair=eng|spa&q=this+is+a+test'
{"message": "Internal Server Error", "status": "error", "explanation": "Internal Server Error", "code": 500}

What am I doing wrong? I'm trying to use it in combination with the ContentTranslation MediaWiki extension, and I need to be sure it works from the command line. Thanks in advance.

web pages: better response on 404's or bad SSL

e.g. "https://www.avvir.no" gives Feb 01 15:34:44 gtweb.uit.no python3[14767]: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645) in the log, but simply "translation not available" in html-tools.

  • better JSON response from apy
  • have html-tools give that response instead of "translation not available"

Default locale to None

In ListLanguageNamesHandler in servlet.py, should we change localeArg = self.get_argument('locale') to localeArg = self.get_argument('locale', default=None)? Without defaulting to None, I don't see why the code elif 'Accept-Language' in self.request.headers: is useful (to use the header field, the request still needs a "locale" parameter, just with an empty value).
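The intended resolution order, sketched outside Tornado (the function and the header parsing are illustrative; in Tornado, get_argument without a default raises MissingArgumentError for a missing parameter):

```python
def pick_locale(arg_value, headers):
    # Resolution order: explicit ?locale= argument, then the
    # Accept-Language header, then None.
    if arg_value:
        return arg_value
    accept = headers.get('Accept-Language')
    if accept:
        # take the first (highest-priority) language tag
        return accept.split(',')[0].split(';')[0].strip()
    return None
```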

Tests and coverage

I think only some of the endpoints are currently tested.

  • all the endpoints should be tested
  • tests should use standard libraries (in Python)
  • coverage stats should be produced
  • Travis should enforce coverage
  • should test non-blocking nature of the endpoints

keep track of all flushing-processes started in apy?

(moved from https://sourceforge.net/p/apertium/tickets/78/ )

Currently, we only keep track of the pids of the "ends" of flushing pipelines (inpipe, outpipe), not the whole group.

If part of a language pair pipeline crashes on startup, the rest of the pipeline may simply hang (and leave the request hanging). Of course this might mean there's a serious bug in the core tools (or language data was compiled for the wrong version), but it'd be nice if APY were more robust here.

One possible way of mitigating this might be to store pids of all started procs in the FlushingPipeline object, and then in the cleanup function go through pipelines_holding and stop all procs of any pipeline which hasn't been used in over a minute.
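That mitigation could look roughly like this (class and field names are invented for the sketch):

```python
import os
import signal
import time


class TrackedPipeline:
    def __init__(self, procs):
        # remember *every* stage's pid, not just the pipeline's ends
        self.pids = [p.pid for p in procs]
        self.last_used = time.time()


def reap_stale(pipelines, max_idle=60):
    # Cleanup pass: terminate all processes of any pipeline that has
    # been idle longer than max_idle seconds.
    for pipe in list(pipelines):
        if time.time() - pipe.last_used > max_idle:
            for pid in pipe.pids:
                try:
                    os.kill(pid, signal.SIGTERM)
                except ProcessLookupError:
                    pass  # stage already gone
            pipelines.remove(pipe)
```

Running reap_stale periodically would also catch pipelines left half-dead by a crash on startup.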

Unused port variable in servlet.py

servlet.py has this function:

def setupHandler(
    port, pairs_path, nonpairs_path, langNames, missingFreqsPath, timeout,
    max_pipes_per_pair, min_pipes_per_pair, max_users_per_pipe, max_idle_secs,
    restart_pipe_after, max_doc_pipes, verbosity=0, scaleMtLogs=False, memory=1000
):

and there is the port parameter. It is unused, so it could be removed.

log (full) failed requests to file

We might want to log all failing requests, e.g. 500's and such, to a file. The GET requests just go to output by tornado (collected by journald), but not POST.

Does flushing sometimes get "out of sync"?

Failing sanity tests run about one hour apart give:

exit code 1: .cat-fra: expected 'pour', got 'chaud' (for input: per)
exit code 1: ...................cat-fra: expected 'pour', got 'Éternels' (for input: per)
exit code 1: ............................................................cat-fra: expected 'pour', got '*carcater' (for input: per)
exit code 1:
exit code 1: ........................................................cat-fra: expected 'pour', got 'boulangerie' (for input: per)
exit code 1:
exit code 1: ..........................................cat-fra: expected 'pour', got 'relâcher' (for input: per)
exit code 1: .......................cat-fra: expected 'pour', got '*Cotilló' (for input: per)
exit code 1: ............cat-fra: expected 'pour', got 'rainure' (for input: per)
exit code 1: .................................................cat-fra: expected 'pour', got 'approuvez' (for input: per)

as if the output of sending "per" into APy's cat-fra came from an earlier request's input.

(That seems quite bad.)

a little cache for web page translation?

Some normal pages take up to 30s to translate with the giellatekno pairs.

Maybe we should try having a little { pair : { url : translationOutput } } cache? Just for the Slashdot effect, where one page gets requested again and again.

Seeing as caching is hard, we could keep it simple and just empty the cache when sys.getsizeof > max_url_cache. But sys.getsizeof doesn't recursively follow the size of the object; also, news sites etc. change, so I added a stupid-simple "empty if older than two hours" rule for the whole cache object.

TODO:
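The whole-cache expiry rule described above can be sketched in a few lines (the class and its names are invented here):

```python
import time


class PageCache:
    """Minimal { pair: { url: output } } cache, emptied wholesale once
    it is older than max_age seconds -- the 'stupid simple' rule."""

    def __init__(self, max_age=2 * 3600):
        self.max_age = max_age
        self.born = time.time()
        self.store = {}

    def get(self, pair, url):
        if time.time() - self.born > self.max_age:
            self.store.clear()       # drop everything, restart the clock
            self.born = time.time()
        return self.store.get(pair, {}).get(url)

    def put(self, pair, url, output):
        self.store.setdefault(pair, {})[url] = output
```

Wholesale expiry trades hit rate for simplicity: no per-entry timestamps, no recursive size accounting, and stale news pages can live at most max_age seconds.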
