
apertium-stats-service's Introduction

Apertium Stats Service


Stateful Rust web service that enables the efficient concurrent compilation and distribution of statistics regarding Apertium packages via a RESTful API.

Usage

See api.html for the Swagger UI representation of the OpenAPI 3.0 spec.
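For example, monodix statistics for a single package can be requested with a call like the one below; the endpoint shape is taken from the request logs quoted in the issues further down, and api.html remains the authoritative reference for paths and parameters.

GET /apertium-ron/monodix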

Running

Build with cargo build and run with cargo run.

Edit .env to set environment parameters including those that control Rocket configuration.
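A minimal illustrative sketch of such a file, assuming the standard Rocket and Diesel environment variables (the values shown are examples, not the project's committed defaults):

ROCKET_PORT=8000
ROCKET_ADDRESS=0.0.0.0
DATABASE_URL=database.sqlite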

Use cargo build --release to create production binaries or use the provided Dockerfile:

docker build -t apertium-stats-service .
docker run -t -p 8000:8000 apertium-stats-service # or 80 for staging/prod

To persist data across restarts, use docker-compose.yml instead:

docker-compose up --build

Development

Install the Rust toolchain via rustup.

Set up a SQLite database with diesel database setup.
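If the Diesel CLI is not already available, it can be installed via cargo; the command below is the standard diesel_cli installation with only the SQLite backend enabled, not project-specific tooling.

cargo install diesel_cli --no-default-features --features sqlite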

Run cargo fmt to format code, cargo clippy to check for lints, and cargo test to run the tests.

apertium-stats-service's People

Contributors

mr-martian, sushain97, tinodidriksen


apertium-stats-service's Issues

Pass through errors to client

The format should be an errors key that contains a mapping from file name to error type and description.

For any type of query, store in memory a mapping from package to a mapping from file name to error. If an error occurs while handling a particular file, set the value; if the file is processed cleanly, unset it.

For an async query that triggers a new stats run, return nothing. For an async query that already has some results, return the currently recorded errors for the relevant files, minus those for files still in progress (since those errors could be stale).

For a sync query, the errors can simply be the ones from that stats run if one is required, and otherwise the currently recorded errors for the relevant files.
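A rough sketch of the in-memory structure this implies is below; the names and the (kind, description) error representation are illustrative assumptions, not the service's actual types.

use std::collections::HashMap;
use std::sync::Mutex;

// package name -> (file name -> (error type, description)); all names here are hypothetical.
type ErrorMap = Mutex<HashMap<String, HashMap<String, (String, String)>>>;

// Set the value when handling a file fails; unset it when the file goes through cleanly.
fn record_file_error(errors: &ErrorMap, package: &str, file: &str, error: Option<(String, String)>) {
    let mut map = errors.lock().unwrap();
    let files = map.entry(package.to_string()).or_default();
    match error {
        Some(err) => { files.insert(file.to_string(), err); }
        None => { files.remove(file); }
    }
}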

Investigate DB is locked errors

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: DatabaseError(__Unknown, "database is locked")', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
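One possible mitigation to investigate, sketched below under the assumption that connections are handed out by Diesel (the helper name is hypothetical): raise SQLite's busy timeout so concurrent writers wait for the lock instead of failing immediately with "database is locked".

use diesel::connection::SimpleConnection;
use diesel::sqlite::SqliteConnection;

// Hypothetical helper, run once per freshly acquired connection:
// ask SQLite to wait up to 5 seconds for a lock before giving up.
fn set_busy_timeout(conn: &mut SqliteConnection) -> diesel::QueryResult<()> {
    conn.batch_execute("PRAGMA busy_timeout = 5000;")
}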

Lexicon sizes underestimated

fran@ipek:~/source/apertium/incubator/apertium-quc/dev$ ./countstems.sh 
3594

And:

03:34 <begiak> INFO:root:Login as StemCounterBot succeeded
03:34 <begiak> INFO:root:Acquired file counts {'vanilla stems': ('1,215', ('https://raw.githubusercontent.com/apertium/apertium-quc/f22b3f13520897fe46e612f8b6d9a8808feffd22/apertium-quc.quc.lexc', 'francis.tyers', 'f22b3f'), 'https://raw.githubusercontent.com/apertium/apertium-quc/master/apertium-quc.quc.lexc'), 'stems': ('1,215',
03:34 <begiak> ('https://raw.githubusercontent.com/apertium/apertium-quc/f22b3f13520897fe46e612f8b6d9a8808feffd22/apertium-quc.quc.lexc', 'francis.tyers', 'f22b3f'), 'https://raw.githubusercontent.com/apertium/apertium-quc/master/apertium-quc.quc.lexc')}
03:34 <begiak> INFO:root:Update of page Apertium-quc/stats succeeded (http://wiki.apertium.org/wiki/Apertium-quc/stats)

Include git SHA in file info

Can be acquired with https://help.github.com/articles/support-for-subversion-clients/#finding-the-git-commit-sha-for-a-subversion-commit.

This would be another property:

sha:
    type: string
    example: 9f09e5e37aadb005fbd420e79803e506a8202f73

added to

properties:
  last_author:
    type: string
    example: jim.o.regan
  size:
    type: integer
    example: 13
  last_changed:
    type: string
    format: date-time
  path:
    type: string
    example: apertium-pl-dsb.pl-dsb.dix
  revision:
    type: integer
    example: 13
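On the service side this would mean one more field on the per-file record. A minimal sketch is below, mirroring the File value visible in the logs further down; the struct name and field types are guesses, not the service's actual definitions.

// Hypothetical shape of the per-file record with the proposed field added.
struct FileInfo {
    path: String,                // e.g. apertium-pl-dsb.pl-dsb.dix
    size: u64,
    revision: u32,
    last_author: String,
    last_changed: String,        // date-time; the concrete type is a guess
    sha: Option<String>,         // e.g. 9f09e5e37aadb005fbd420e79803e506a8202f73
}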

Docker image builds and runs dev

Continuing from apertium/apertium-init#51, since the issue tracker is not a discussion board. Also, please just come on IRC again...

It seems that the RUN cargo build --release step is unused. When the image is run, it builds 353 packages and runs the dev version, not the release version. This is not OK. The image should be ready to go as-is; we can't wait several minutes for it to start.

Adding RUN cargo build at the end works around this, but it's still running dev; shouldn't it run release?

This is all running on Torro now. You have access to the Torro docs to see where and how it's run.

But this all ran before as well. The service started, requests went to it, you could browse the API live, but it returned nothing useful.

Panics on flood due to DB lock poisoning

After the initial error due to a DB connection timeout, the DB lock is poisoned and everything else just fails. Ideally, we would fail gracefully for that particular task, with logging (#30 would then pass the errors through to the client).

GET /apertium-ron/monodix:
    => Matched: GET /<name>/<kind>
Nov 11 01:54:41.472 INFO Spawning 1 task(s): [Task { created: 2018-11-11T01:54:41.472393209, file: File { path: "apertium-ron.ron.dix", size: 3202000, revision: 72, last_author: "marc.riera.irigoyen", last_changed: 2018-11-03T22:55:10 }, kind: Monodix }]    => Outcome: Success
, recursive    => Response succeeded.
: false, package: apertium-ron
Nov 11 01:54:41.593 DEBG Completed executing task, kind: Monodix, path: apertium-mlt.mlt.dix, recursive: false, package: apertium-mlt
GET /apertium-rus/monodix:
    => Matched: GET /<name>/<kind>
Nov 11 01:54:44.207 DEBG Completed executing task, kind: Monodix, path: apertium-pol.pol.dix, recursive: false, package: apertium-pol
thread 'tokio-runtime-worker-1' panicked at 'database connection: Error(None)', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread 'tokio-runtime-worker-5' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread 'tokio-runtime-worker-7' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread 'tokio-runtime-worker-3' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
GET /apertium-afr/monodix:
    => Matched: GET /<name>/<kind>
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
GET /apertium-afr/monodix:
    => Matched: GET /<name>/<kind>
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
GET /apertium-afr/monodix:
    => Matched: GET /<name>/<kind>
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
GET /apertium-afr/monodix:
    => Matched: GET /<name>/<kind>
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
GET /apertium-afr/monodix:
    => Matched: GET /<name>/<kind>
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "PoisonError { inner: .. }"', libcore/result.rs:945:5
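A hedged sketch of the graceful path: instead of unwrap()ing the lock (which turns a poisoned mutex into yet another panic), recover the guard and let the request proceed. The types and names below are illustrative, not the service's actual state.

use std::collections::HashMap;
use std::sync::{Mutex, PoisonError};

// Illustrative shared state guarded by a mutex.
type TaskMap = Mutex<HashMap<String, Vec<String>>>;

// Recover the guard even if a previous holder panicked: the data may be slightly
// stale, but one failed task no longer takes down every subsequent request.
fn record_task(tasks: &TaskMap, package: &str, path: &str) {
    let mut guard = tasks.lock().unwrap_or_else(PoisonError::into_inner);
    guard
        .entry(package.to_string())
        .or_default()
        .push(path.to_string());
}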

Add configurable timeout support

There should be a default timeout of X seconds per stats job, and it should be configurable up to a maximum. There should also be a way to make this testable without being too flaky.
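A small sketch of the clamping logic, with placeholder values for the default and the maximum (both are assumptions; the issue deliberately leaves X open):

use std::time::Duration;

const DEFAULT_TIMEOUT_SECS: u64 = 60;  // placeholder for "X seconds"
const MAX_TIMEOUT_SECS: u64 = 300;     // placeholder maximum

// Resolve the per-job timeout from an optional client-supplied value,
// falling back to the default and clamping to the configured maximum.
fn job_timeout(requested_secs: Option<u64>) -> Duration {
    Duration::from_secs(requested_secs.unwrap_or(DEFAULT_TIMEOUT_SECS).min(MAX_TIMEOUT_SECS))
}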

Support listing of packages

/packages/{query}

Should match an optional query glob against Apertium packages and return a list of objects containing the following fields, which are selectable via a fields query parameter:

  • name
  • description
  • url
  • topics

and anything else that seems particularly useful from https://developer.github.com/v3/repos/#list-your-repositories

It would also be nice to have information about the last commit (author, date, sha, svn revision number).

The full list should be cached in memory and only retrieved from GitHub at a rate that will never hit the dynamic rate limit of their free tier. Updating should happen on a background thread, and the endpoint should never block. It should also report a timestamp indicating when the list was last updated (and perhaps how far away the next update is, or whether one is already running). A rough sketch of this cache follows below.

Can force update via POST [sync only].
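The sketch; struct names, field types, and the refresh signature are illustrative assumptions, not the service's actual types.

use std::sync::RwLock;
use std::time::SystemTime;

// Fields selectable via the `fields` query parameter.
struct Package {
    name: String,
    description: Option<String>,
    url: String,
    topics: Vec<String>,
}

// Whole-list cache; the /packages endpoint only reads this and never blocks on GitHub.
struct PackageCache {
    packages: Vec<Package>,
    last_updated: SystemTime, // returned to clients so they can judge staleness
}

// Run from a background thread at a rate safely below GitHub's limit;
// a POST could invoke it synchronously to force an update.
fn refresh(cache: &RwLock<PackageCache>, fresh: Vec<Package>) {
    let mut guard = cache.write().unwrap();
    guard.packages = fresh;
    guard.last_updated = SystemTime::now();
}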

[Question] How do I use /calcCoverage

I was told that to use /calcCoverage, I need to have apertium-stats-service installed. The wiki says to use port 2737, while running this as a separate container uses port 8000. I am trying to figure out whether there is a way to install this inside Apertium-APY, or whether the wiki is in error and I need to run this as a separate service, so that I have one service for /translate and another for /calcCoverage, both of which need all the language libraries downloaded.

Add support for lexd

The lexd compiler is getting support for outputting stats, e.g.

$ lexd -x .deps/zab.LR.lexd > /dev/null
Lexicons: 54
Lexicon entries: 338
Patterns: 3
Pattern entries: 35

Stats-service should track these numbers (at least lexicon entries and pattern entries) for language modules that use lexd.
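A small sketch of how those summary lines could be turned into counts; the function name and return shape are illustrative.

// Parse `Label: number` lines from the `lexd -x` summary shown above.
fn parse_lexd_stats(output: &str) -> Vec<(String, u64)> {
    output
        .lines()
        .filter_map(|line| {
            let (label, value) = line.split_once(':')?;
            Some((label.trim().to_string(), value.trim().parse::<u64>().ok()?))
        })
        .collect()
}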
