
Comments (27)

TinoDidriksen avatar TinoDidriksen commented on June 23, 2024 2

Oh right, it also builds branches and PRs. Yeah, that would be worth it.


sushain97 avatar sushain97 commented on June 23, 2024

Any thoughts on Travis vs Circle here being our default CI provider?

Travis has better support for testing matrices (Circle can do it with a bit of extra config boilerplate) but it doesn't have free artifact storage. We'd have to rely on GitHub releases/tags, which doesn't seem like a real problem. Circle also has a number of features I haven't found in Travis, e.g. workspaces and workflows, as well as more out-of-the-box support for Docker, which means the install-dependencies step for language/pair packages could just not exist (just a reference to the Apertium dev docker tag). That means faster builds and smaller configs.

cc @TinoDidriksen @xavivars @unhammer


unhammer avatar unhammer commented on June 23, 2024

What is it we want from CI for apertium-init, apart from running tests (and getting pretty badges)? I've never used circle, but have struggled a bit with travis configs – OTOH, it's typically fairly rare you have to change configs if you just want tests.


sushain97 avatar sushain97 commented on June 23, 2024

@unhammer we have CI for apertium-init already. I think I was unclear: I'm talking about a default CI config for language pairs or language modules.


sushain97 avatar sushain97 commented on June 23, 2024

Travis has a better story around concurrent build job limits for FOSS projects, though.

Travis: 5 (https://twitter.com/travisci/status/651856122559774722)
Circle: 1 (https://circleci.com/pricing/)
Circle: 4 (https://circleci.com/pricing/#faq-section-linux - I read it wrong. 1 is for private.)

That's for the entire org. Travis does usually see much more build contention/queuing in my experience, and I initially thought Circle's limit was a single concurrent build, which would be really troubling for highly parallelized (multi-job) builds and had me leaning back towards Travis. With the corrected limit of 4, back to Circle I go.


TinoDidriksen avatar TinoDidriksen commented on June 23, 2024

Either will do, but neither will matter when none of the languages or pairs have usable regression test suites.


sushain97 avatar sushain97 commented on June 23, 2024

none of the languages or pairs have usable regression test suites.

🤷‍♂️ Can't fix that one; at least it's a step in the right direction. IIRC, you found the build step useful when looking at PRs for lttoolbox. This would at least catch malformed XML and simple things like that in PRs.
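
Such a check could be as small as a well-formedness pass over the XML sources; here is a minimal sketch in Python (the glob patterns are just assumptions about a typical monolingual repo layout):

```python
# xml_check.py -- fail the build if any dictionary XML is malformed.
# Sketch only: the glob patterns are assumptions about the repo layout.
import glob
import sys
import xml.etree.ElementTree as ET

failed = False
for path in sorted(glob.glob("*.dix") + glob.glob("*.metadix")):
    try:
        ET.parse(path)
    except ET.ParseError as err:
        print(f"{path}: {err}", file=sys.stderr)
        failed = True

sys.exit(1 if failed else 0)
```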


flammie avatar flammie commented on June 23, 2024

As a travis-ci user, I've noted a few things:

  • every few months Travis has some internal problems with connectivity, so Apertium travis-ci scripts that use Tino's packages will fail
  • their Python setup is totally unstable, so it'll also start failing at random points, but this is not a huge issue for Apertium languages

I don't know if Circle has similar issues.

For Travis, if you need artifacts, you can easily do scp in post_success; I do that for continuous deployment.

For xmllinting PRs, hound-ci should eventually be sufficient.

In summary, though, I don't have any preference for either solution, just a few things to note.


sushain97 avatar sushain97 commented on June 23, 2024

every few months Travis has some internal problems with connectivity, so Apertium travis-ci scripts that use Tino's packages will fail

Alas, I don't think either solution has 100% uptime and I think it's futile to compare the number of 9s. For a Circle setup though, there would not be any installation step from Tino's packages. The CI config would directly pull down the Docker image (https://github.com/apertium/docker) with apertium-all-dev and related friends already installed and run the tests on it. The only failure point is Docker hub download.

For Travis, if you need artifacts, you can easily do scp in post_success; I do that for continuous deployment.

There isn't exactly an equivalent for Circle but you can do the same thing, just with different words since it provides a more generic framework.

I do it in e.g. https://github.com/sushain97/web2fs-notepad/blob/b4b48eb335370b70e3e85223c83202beeccb49c3/.circleci/config.yml#L69-L77 and https://github.com/apertium/apertium-html-tools/blob/b258f26c116cecbc03479ea0d085841349b4a923/.circleci/config.yml#L33-L34.

Circle offers free artifact storage which is served with the correct content type. Travis requires you to set up your own AWS S3 bucket and serve stuff yourself as well.

For xmllinting PRs, hound-ci should eventually be sufficient.

I like Hound except that sometimes it just comments way too much and makes actually reviewing the code painful.


sushain97 avatar sushain97 commented on June 23, 2024

FWIW, Travis had some very serious stability issues in the last couple of weeks. I think it's correlated to them being purchased by some random IB and a large part of their eng team quitting (see https://news.ycombinator.com/item?id=19218036).

cc @IlnarSelimcan


IlnarSelimcan avatar IlnarSelimcan commented on June 23, 2024

Thanks for sharing this thread with me @sushain97! Indeed, we should collaborate on this and add at least a bare minimum of automated tests by default to any (linguistic) package. I've been rambling about this automated testing stuff for quite a while now. Time to gather up the notes (and scripts) and integrate them into something standard and useful, I guess. I'll try to put some reference text (say, UDHR and Bible translations, to begin with) under [1] or somewhere close and then make a PR with tests in [2] (but in a way which should work for all languages, of course; there is some ad-hoc stuff in there, and test.py should be adjusted slightly and probably be made a Python package).

I believe that

  1. building
  2. checking that coverage hasn't decreased
  3. checking that pairs like _ book<n><pl> books, > onlygenerate<hargle> onlygenerate, < analyseonly<hargle> analyseonly which are in the tests folder still work

is the absolute minimum which must run after each commit to a monolingual repo (a rough sketch of the coverage check follows below).
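
A rough sketch of what item 2 might look like as a Python script, assuming a plain-text (already deformatted) corpus, a compiled analyser, and a coverage value persisted from the previous build; the file names are hypothetical:

```python
# coverage_check.py -- sketch of a "coverage must not decrease" gate (item 2).
# Assumes lt-proc is on PATH; all file names here are hypothetical.
import re
import subprocess
import sys
from pathlib import Path

CORPUS = Path("corpus.txt")           # hypothetical reference corpus (UDHR, Bible, ...)
ANALYSER = "foo.automorf.bin"         # hypothetical compiled analyser
BASELINE = Path("coverage.baseline")  # value stored by the previous build (or cache)

with CORPUS.open(encoding="utf-8") as corpus:
    analysed = subprocess.run(["lt-proc", ANALYSER], stdin=corpus,
                              capture_output=True, text=True, check=True).stdout

units = re.findall(r"\^[^$]*\$", analysed)    # lexical units: ^surface/analyses$
unknown = sum(1 for u in units if "/*" in u)  # unknown words are marked with *
coverage = 100.0 * (len(units) - unknown) / max(len(units), 1)

previous = float(BASELINE.read_text()) if BASELINE.exists() else 0.0
print(f"naive coverage: {coverage:.2f}% (previous: {previous:.2f}%)")
if coverage < previous:
    sys.exit("coverage decreased")
BASELINE.write_text(f"{coverage:.4f}\n")
```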

Travis and Circle

What bothers me about Circle

  • It is not free software. Not even partially.

What bothers me about Travis

  • Speed/stability concerns, which apparently not only you and Flammie have noticed.
  • Libre, but probably not self-hostable in practice anyway.

What bothers me about both

  • Not self-hostable.
  • Can I generate custom badges with them? I want it to store something like WikipediaCoverage: 68%, NumberofStems: 90909 after each build. Not just to brag with that and for analytics, but so that the next commit doesn't have to do some stupid git mangling and calculate the same thing twice for comparison, like I do in [2].
  • Also I want 2 build statuses for each monolingual repo: one saying whether make check for apertium-foo was successful, another (if the first one passed) saying whether make check in all bilingual repos which use apertium-foo has passed. Similarly, a commit to apertium/lttoolbox should trigger build & test of all linguistic data repos. That's probably a longer term goal.
  • Installing core packages (or spinning up some Docker image with core packages installed) after each commit to a repo with linguistic data seems to be unbelievably stupid to me. Kids are skipping classes to protest against politicians' inability to address global warming, and with 500 (and likely to grow to thousands) Apertium repos each downloading a corpus/frequency list and installing core tools after every commit... That's where global warming comes from, right there! I don't want to contribute to it :D

From an engineering perspective, I believe that a proper way to test linguistic data repos is to have a machine which has core tools installed and which, upon receiving a hook/message that a commit has been made to apertium-foo, updates and builds it. Besides, this machine should store all the corpora etc. used for testing. It could store badges, which are some dumb SVG files. You mention that CircleCI supports artifact storing; that's probably relevant here.

I tried that at [3]. Works OK, but of course with just a webhook, a ton of stuff that works "out-of-the-box" on Travis or Circle, like email or IRC messages, will require writing code, which looks stupid. There is also good ol' Jenkins, which, from the description of it, can do what the new age guys like Travis and Circle can and then some. I haven't used it yet though.

You're right in thinking that the biggest downside of this method is that it requires running a server. I had shut down my own server running fitnesse.selimcan.org a few weeks ago, with the intention of moving it from DigitalOcean (since some of its IPs stopped working in Russia), but haven't gotten to it yet. Running a server is indeed a hassle which should probably be avoided, given that free CI is available.

[1] https://github.com/taruen/apertiumpp/tree/master/data4apertium/corpora
[2] apertium/apertium-tat@6dbcb19#diff-c949f93d03f44a4217d7a138f9e2e54a
[3] https://gitlab.com/selimcan/apertium-fitnesse


sushain97 avatar sushain97 commented on June 23, 2024

I think I agree with most of what you've said! Couple things I'm not sure about regarding the mechanics:

Libre, but probably not self-hostable in practice anyway.

Travis is open-source? Not as far as I can tell?

Can I generate custom badges with them?

This is just a matter of generating some SVGs. A simple Python script could generate SVGs and insert them into the artifacts. If there's some URL munging required, a simple Python server could handle redirecting URLs or even generating the SVGs in real time. https://github.com/apertium/apertium-stats-service could even handle it.
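
For instance, a hedged sketch of such a script (the badge layout, labels, and output paths are made up for illustration):

```python
# make_badges.py -- sketch: render flat SVG badges for arbitrary build stats
# and drop them into the artifacts directory. Layout/labels are illustrative.
import os

BADGE = """<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="20">
  <rect width="{lw}" height="20" fill="#555"/>
  <rect x="{lw}" width="{vw}" height="20" fill="#4c1"/>
  <g fill="#fff" font-family="Verdana" font-size="11">
    <text x="6" y="14">{label}</text>
    <text x="{vx}" y="14">{value}</text>
  </g>
</svg>"""

def make_badge(label: str, value: str) -> str:
    lw = 10 + 7 * len(label)   # crude text-width estimate
    vw = 10 + 7 * len(value)
    return BADGE.format(label=label, value=value, w=lw + vw, lw=lw, vw=vw, vx=lw + 6)

if __name__ == "__main__":
    stats = {"WikipediaCoverage": "68%", "NumberOfStems": "90909"}  # would be computed
    os.makedirs("badges", exist_ok=True)
    for name, value in stats.items():
        with open(f"badges/{name}.svg", "w", encoding="utf-8") as f:
            f.write(make_badge(name, value))
```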

Also I want 2 build statuses for each monolingual repo

Should be possible just with a different script for the monolingual repos?

Similarly, a commit to apertium/lttoolbox should trigger build & test of all linguistic data repos. That's probably a longer term goal.

This is almost certainly possible through Travis/Circle APIs but yeah, it's definitely an apertium/lttoolbox-specific script.
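
Roughly, such a script could hit CircleCI's pipeline-trigger endpoint for each dependent repo; a sketch, assuming the v2 API and the requests library (the repo list, token variable, and branch are placeholders):

```python
# trigger_downstream.py -- sketch: after a commit to a core tool (e.g. lttoolbox),
# kick off builds of dependent linguistic repos via CircleCI's v2 API.
# The repo list, token env var, and branch are placeholders.
import os
import requests

DOWNSTREAM = ["apertium/apertium-tat", "apertium/apertium-eng"]  # hypothetical
token = os.environ["CIRCLE_API_TOKEN"]

for repo in DOWNSTREAM:
    resp = requests.post(
        f"https://circleci.com/api/v2/project/gh/{repo}/pipeline",
        headers={"Circle-Token": token},
        json={"branch": "master"},
    )
    resp.raise_for_status()
    print(f"triggered {repo}: pipeline #{resp.json().get('number')}")
```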

Installing core packages (or spinning up some Docker image with core packages installed) after each commit to a repo with linguistic data seems to be unbelievably stupid to me.

Spinning up a Docker image with core packages installed is equivalent to booting a VM. Can't really get any less resource intensive than that for a CI service.

Apertium repos each downloading a corpus/frequency list

This part can be solved with a caching layer. Both Travis and Circle support that.

There is also good ol' Jenkins, which, from the description of it, can do what the new age guys like Travis and Circle can and then some. I haven't used it yet though.

Having used Jenkins, I would not be interested in supporting it :) IMO, it's better than a custom solution though. It would scale somewhat better because Travis/Circle have limits on free concurrent builds but I'm not sure we hit them anyway.

I think that unless we have a workload that requires a custom solution, we shouldn't use one for a couple reasons:

  1. requires us to have machines
  2. requires a non-trivial amount of custom code & maintenance (e.g. getting a basic hook => build flow working is nice, but then what about when you want to be able to cancel jobs, see job progress, have a nice UI, etc.)
  3. will not scale (i.e. multiple machines) without even more non-trivial code
  4. we'll never hit feature/integration parity with something like Circle/Travis (or even Jenkins)

I don't think there's anything in our workload that can't be solved by starting with a Docker image that already has core tools installed and then caching any downloads that are required. Of course, there's overhead in having things virtualized but practically speaking, the performance overhead is negligible and its carbon output overhead in perpetuity is probably far less than the cost of a single steak dinner :) It's basically just the cost of pulling down some bytes from cache.


IlnarSelimcan avatar IlnarSelimcan commented on June 23, 2024

Thanks for clearing this up @sushain97 .

Travis is open-source? Not as far as I can tell?

Well, should be, they say it is: https://blog.travis-ci.com/2019-01-23-travis-ci-joins-idera-inc

Either way, I was inclined to avoid anything custom or Jenkins-based semi-custom, and the arguments you gave are quite convincing. That option is pretty much ruled out now, I think.

I'll definitely play around with CircleCI and try to make it do those 3 bare-minimum things the .travis.yml in apertium-tat is currently (half-assedly) doing.

I think I've seen you have both a Circle and a Travis YAML config in some of the repos. That might be a way to go, at least in the beginning, until we see that one is clearly better for our needs than the other. Or it can remain an option to apertium-init as --ci=[circle/travis] or something. No reason to decide upfront and not to test on both sides of the ocean for a while, if it isn't that environmentally unfriendly, as you write... nvm: apparently travis-ci.org and travis-ci.com both run on Amazon US servers :)


sushain97 avatar sushain97 commented on June 23, 2024

I think I've seen you have both a Circle and a Travis YAML config in some of the repos.

Hmm, we do? That seems odd... I think I transitioned from one to another at some point in some repos though.

Or it can remain an option to apertium-init as --ci=[circle/travis] or something.

This would be cool.

apparently travis-ci.org and travis-ci.com both run on Amazon US servers

I think you meant CircleCI vs Travis here, since for Travis they're combining both the .org and .com (the former was reserved for open source and the latter for paying customers until recently).


mr-martian avatar mr-martian commented on June 23, 2024

We now also have GitHub Actions to consider, which does have a straightforward way to generate custom badges (see https://docs.github.com/en/actions/managing-workflow-runs/adding-a-workflow-status-badge).


flammie avatar flammie commented on June 23, 2024

The GitHub Actions badges are neat, but nowadays shields.io also provides pretty arbitrary badges for free.


mr-martian avatar mr-martian commented on June 23, 2024

I was looking at those, and my guess would be that maybe the GitHub Action could calculate the appropriate numbers, use shields.io to generate the appropriate image, and then store it for use in the README.


flammie avatar flammie commented on June 23, 2024

I was looking at those, and my guess would be that maybe the GitHub Action could calculate the appropriate numbers, use shields.io to generate the appropriate image, and then store it for use in the README.

Yeah, that's on my todo list. The way I see it, you have to generate/sed the markdown snippet in the README in the GitHub Action, which means the action should probably be able to commit and push without looping back. I haven't found a simple enough example (in the Actions marketplace or wherever), so I haven't gone down that route yet.
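
As a sketch of the compute-then-sed step (the marker comments, stat values, and file paths are made up; shields.io static badges use a label-message-colour URL scheme with dashes doubled):

```python
# update_readme_badge.py -- sketch of the "calculate numbers, build a shields.io
# badge, rewrite the README snippet" idea; markers and paths are hypothetical.
import re

def shields_url(label: str, message: str, colour: str = "blue") -> str:
    esc = lambda s: s.replace("-", "--").replace("_", "__").replace(" ", "_")
    return f"https://img.shields.io/badge/{esc(label)}-{esc(message)}-{colour}"

stems = 90909   # would really be counted from the compiled dictionary
snippet = f"![stems]({shields_url('stems', str(stems), 'brightgreen')})"

with open("README.md", encoding="utf-8") as f:
    readme = f.read()
# Replace whatever sits between two marker comments with the fresh badge.
readme = re.sub(r"(<!-- stems-badge -->).*?(<!-- /stems-badge -->)",
                lambda m: m.group(1) + snippet + m.group(2), readme, flags=re.S)
with open("README.md", "w", encoding="utf-8") as f:
    f.write(readme)
```

The push-without-looping-back part would still need something like a [skip ci] commit message or the side-branch approach mentioned in the next comment.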


mr-martian avatar mr-martian commented on June 23, 2024

Upon closer inspection, the link I posted is for a rather more limited feature than I thought.

https://github.com/marketplace/actions/bring-your-own-badge is closer, but commits the updated results to a side branch - not sure what to think of that solution


TinoDidriksen avatar TinoDidriksen commented on June 23, 2024

If https://github.com/apertium/apertium-stats-service worked, it could rather easily provide various badges. APy could also. Or the nightly builder. Or a GitHub Action could publish numbers to apertium.org and generate badges that way.

Lots of methods to get badges and store them in ways that don't involve committing to the repos. We just need to implement one of them. Stats-service was it, when it worked...


sushain97 avatar sushain97 commented on June 23, 2024

Stats-service was it, when it worked...

What broke? I can potentially try and go fix it.


TinoDidriksen avatar TinoDidriksen commented on June 23, 2024

Stats-service was it, when it worked...

What broke? I can potentially try and go fix it.

Don't know. It builds and runs, but does nothing. And only after adjusting the Rust version a few months forward, because as-is it wouldn't even build.

But just 2 hours ago I concluded on IRC that stats-service is superfluous - these kinds of stats can be done better by the build system, so I'll do that. That will also let us generate all the badges we want and host them ourselves.


sushain97 avatar sushain97 commented on June 23, 2024

Don't know. It builds and runs, but does nothing. And only after adjusting the Rust version a few months forward, because as-is it wouldn't even build.

Hm, okay. I'll go poke at it.

stats-service is superfluous - these kinds of stats can be done better by the build system, so I'll do that.

stats-service stores historical stats as well. Does the build system persist stats at build time somewhere?


TinoDidriksen avatar TinoDidriksen commented on June 23, 2024

stats-service stores historical stats as well. Does the build system persist stats at build time somewhere?

I was going to store them as timestamp + commit hash for that purpose.
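
Something like the following sketch, where each build appends a line keyed by timestamp and commit hash (the file name and fields are just an illustration of the idea, not the actual build-system implementation):

```python
# persist_stats.py -- sketch of keeping historical stats per build,
# keyed by timestamp + commit hash; file name and fields are illustrative.
import csv
import subprocess
from datetime import datetime, timezone

def current_commit() -> str:
    return subprocess.run(["git", "rev-parse", "HEAD"],
                          capture_output=True, text=True, check=True).stdout.strip()

def record(stats: dict, path: str = "stats.tsv") -> None:
    row = [datetime.now(timezone.utc).isoformat(), current_commit()]
    row += [f"{key}={value}" for key, value in sorted(stats.items())]
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerow(row)

record({"stems": 90909, "coverage": 68.0})  # values would come from the build
```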


xavivars avatar xavivars commented on June 23, 2024

I was going to give my two cents here, but saw the conversation is already going. My first point was going to be this

Lots of methods to get badges and store them in ways that don't involve committing to the repos. We just need to implement one of them. Stats-service was it, when it worked...

I can't agree more. Repo metadata shouldn't require committing data to the repos.

stats-service is superfluous - these kinds of stats can be done better by the build system, so I'll do that.

stats-service stores historical stats as well. Does the build system persist stats at build time somewhere?

I personally very much like the idea of having all this information available via a REST API.

  • Whether that API is a standalone service or built into something else (Apertium's official APy?), I don't care too much.
  • Whether that service is the source of truth for the data, meaning it calculates it, or just exposes data that another system (the build system) calculates, I also don't have a strong preference.

But I want to reinforce the idea that having data available via APIs is desirable.


sushain97 avatar sushain97 commented on June 23, 2024

FWIW, I plan to look into why stats-service isn't behaving properly this weekend. I imagine it'll be a relatively trivial fix.


sushain97 avatar sushain97 commented on June 23, 2024

@TinoDidriksen

I just went ahead and bumped the Rust version all the way to today's nightly: apertium/apertium-stats-service@c31f54c. The tests are more or less passing aside from some bitrot due to them depending on actual language modules rather than fixtures.

I also verified that the docker instructions @ https://github.com/apertium/apertium-stats-service#running still work. They required one tweak, binding on 0.0.0.0 rather than localhost (apertium/apertium-stats-service@7afc2d7). Perhaps that's the issue?

Logs:

$ docker build -t apertium-stats-service .
[+] Building 0.7s (15/15) FINISHED
 => [internal] load build definition from Dockerfile                                         0.0s
 => => transferring dockerfile: 37B                                                          0.0s
 => [internal] load .dockerignore                                                            0.0s
 => => transferring context: 34B                                                             0.0s
 => [internal] load metadata for docker.io/apertium/base:latest                              0.6s
 => [ 1/10] FROM docker.io/apertium/base@sha256:974e722bc0da399944f79f197bc77548220d5b1ab60  0.0s
 => [internal] load build context                                                            0.0s
 => => transferring context: 15.15kB                                                         0.0s
 => CACHED [ 2/10] RUN apt-get -qq update &&     apt-get -qq install --no-install-recommend  0.0s
 => CACHED [ 3/10] RUN curl -s https://sh.rustup.rs | sh -s -- -y --default-toolchain night  0.0s
 => CACHED [ 4/10] WORKDIR /src                                                              0.0s
 => CACHED [ 5/10] COPY Cargo.toml Cargo.lock ./                                             0.0s
 => CACHED [ 6/10] RUN mkdir src && echo 'fn main() {}' > src/main.rs && cargo build --rele  0.0s
 => CACHED [ 7/10] COPY . .                                                                  0.0s
 => CACHED [ 8/10] RUN cargo build --release                                                 0.0s
 => CACHED [ 9/10] RUN cargo install diesel_cli --version 1.2.0 --no-default-features --fea  0.0s
 => CACHED [10/10] RUN diesel database setup                                                 0.0s
 => exporting to image                                                                       0.0s
 => => exporting layers                                                                      0.0s
 => => writing image sha256:a9209f9c9d67a78e2b095da95470cfca47112c46e5e562f73fedeae61ebdf00  0.0s
 => => naming to docker.io/library/apertium-stats-service                                    0.0s
$ docker run -t -p 8000:8000 apertium-stats-service
<snip a ton of compile output>
   Compiling apertium-stats-service v0.1.0 (/src)
    Finished dev [unoptimized + debuginfo] target(s) in 3m 25s
     Running `target/debug/apertium-stats-service`
Mar 05 08:12:16.118 DEBG Fetching repos, after: None
Mar 05 08:12:20.351 DEBG Fetched 85 packages
Mar 05 08:12:20.352 DEBG Fetching repos, after: Y3Vyc29yOnYyOpHOB2kB3Q==
Mar 05 08:12:24.176 DEBG Fetched 100 packages
Mar 05 08:12:24.177 DEBG Fetching repos, after: Y3Vyc29yOnYyOpHOB2kHrA==
Mar 05 08:12:28.053 DEBG Fetched 100 packages
Mar 05 08:12:28.054 DEBG Fetching repos, after: Y3Vyc29yOnYyOpHOB2kK5Q==
Mar 05 08:12:31.360 DEBG Fetched 100 packages
Mar 05 08:12:31.360 DEBG Fetching repos, after: Y3Vyc29yOnYyOpHOB2kTRw==
Mar 05 08:12:35.730 DEBG Fetched 61 packages
Mar 05 08:12:35.731 DEBG Fetching repos, after: Y3Vyc29yOnYyOpHOCkIdTg==
Mar 05 08:12:39.074 DEBG Fetched 15 packages
Mar 05 08:12:39.074 INFO Completed package list update, next_update_min: PT4.299189786S, cost_remaining: 4994, total_cost: 6, length: 461
Mar 05 08:12:39.075 DEBG Next package update in 10s
🔧 Configured for development.
    => address: localhost
    => port: 8000
    => log: normal
    => workers: 8
    => secret key: generated
    => limits: forms = 32KiB
    => keep-alive: 5s
    => read timeout: 5s
    => write timeout: 5s
    => tls: disabled
🛰  Mounting /:
    => GET / (index)
    => GET /openapi.yaml (openapi_yaml)
    => GET /<name>?<params..> (get_stats)
    => GET /<name>/<kind>?<params..> (get_specific_stats)
    => POST /<name>?<params..> (calculate_stats)
    => POST /<name>/<kind>?<params..> (calculate_specific_stats)
    => GET /packages (get_all_packages)
    => GET /packages/<query> (get_specific_packages)
    => POST /packages (update_all_packages)
    => POST /packages/<query> (update_specific_packages)
🛰  Mounting /cors:
    => GET /cors/<status>
📡 Fairings:
    => 1 request: CORS
    => 1 response: CORS
🚀 Rocket has launched from http://0.0.0.0:8000
$ curl http://localhost:8000
USAGE

GET /apertium-<code1>(-<code2>)
retrieves statistics for the specified package

GET /apertium-<code1>(-<code2>)/<kind>
retrieves <kind> statistics for the specified package

POST /apertium-<code1>(-<code2>)
calculates statistics for the specified package

POST /apertium-<code1>(-<code2>)/<kind>
calculates <kind> statistics for the specified package

GET /packages/<?query>
lists packages with names including the optional query

POST /packages/<?query>
updates package cache and lists specified packages

See /openapi.yaml for full specification.⏎

If I remember correctly, the projectjj version had a docker-compose.yml with .env setup for prod and the db.sqlite mounted in.

If updating works and we still want it, I can switch it to build with GH actions and produce a proper static binary release. The docker version is kinda annoying since it takes a bit to compile in-line.

