Comments (12)

mtelvers commented on June 8, 2024

How about removing the arm64 build, as we only deploy the x86_64 version? Both builds happen in parallel, so it wouldn't be any quicker, but both builds must succeed before the pipeline proceeds to the next stage, so there would be one fewer dependency. It should save a bit of carbon too!

from infrastructure.

avsm commented on June 8, 2024

The failure here is due to a build error, not due to an infrastructure error. If anything, we need a bit more infrastructure to alert us when the opam.ocaml.org pushes fail (a Matrix channel would be ideal).

See deploy.ci.ocaml.org: after clicking on the logs tab and looking through the table, I spotted https://deploy.ci.ocaml.org/job/2023-06-06/160459-ocluster-build-dcd1d2, which in turn shows that opam2web is failing:

 checking for OCaml findlib package unix... found
#37 3.289 checking for OCaml findlib package bigarray... found
#37 3.291 checking for OCaml findlib package re 1.9.0 or later... found 1.10.4
#37 3.297 checking for OCaml findlib package base64 3.1.0 or later... no
#37 3.298 checking for OCaml findlib package cmdliner... found
#37 3.300 checking for OCaml findlib package ocamlgraph... not found
#37 3.302 checking for OCaml findlib package cudf 0.7 or later... no
#37 3.303 checking for OCaml findlib package dose3.common 6.1 or later... no
#37 3.305 checking for OCaml findlib package dose3.algo 6.1 or later... no
#37 3.307 checking for OCaml findlib package opam-file-format 2.1.4 or later... no
#37 3.309 checking for OCaml findlib package spdx_licenses... not found
#37 3.311 checking for OCaml findlib package opam-0install-cudf 0.4 or later... no
#37 3.312 checking for OCaml findlib package jsonm... not found
#37 3.314 checking for OCaml findlib package uutf... found
#37 3.315 checking for OCaml findlib package sha... not found
#37 3.316 checking for OCaml findlib package swhid_core... not found
#37 3.318 checking for OCaml findlib package mccs 1.1+9 or later... no
#37 3.320 
#37 3.320 configure: error: Dependencies missing. Use --with-vendored-deps or --disable-checks
#37 ERROR: executor failed running [/bin/sh -c opam exec -- ./configure --without-mccs && opam exec -- make lib-ext && opam exec -- make]: exit code: 1

I've not root caused this further yet... /cc @mtelvers @tmcgilchrist

avsm commented on June 8, 2024

It looks like the deployer is building all branches of opam2web for some reason. That seems like it could be tightened down to just the live and staging branches.

Regarding this question by @hannesm:

Another question is whether you have monitoring of the service opam.ocaml.org (about the key things: online, replies to HTTP requests, serves an up-to-date archive), and if yes, is that online and available somewhere? (I suggest setting up a "status.opam.ocaml.org" with some information, and maybe post-mortems about the issues that happened in recent months.)

... the tracking issue is #31
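
A minimal sketch of the "serves an up-to-date archive" part of such monitoring, assuming the archive's last update time and the current time are already available as Unix epoch seconds. The function names and the threshold are hypothetical and not part of any existing tooling:

```shell
#!/bin/sh
# Hypothetical freshness check; names and thresholds are illustrative only.

# is_fresh LAST_UPDATE_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Succeeds (exit status 0) when the archive was updated within MAX_AGE_SECONDS.
is_fresh() {
    [ $(( $2 - $1 )) -le "$3" ]
}

# status_line LAST_UPDATE_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Prints a one-line summary suitable for a status page or a chat notification.
status_line() {
    age_hours=$(( ($2 - $1) / 3600 ))
    if is_fresh "$1" "$2" "$3"; then
        echo "OK: archive updated ${age_hours}h ago"
    else
        echo "STALE: archive updated ${age_hours}h ago"
    fi
}
```

With a one-day threshold, `status_line "$last" "$now" 86400` reports whether the archive is more than 24 hours old. How `$last` is obtained (for instance from an HTTP `Last-Modified` header) is left open here.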

tmcgilchrist commented on June 8, 2024

It looks like the deployer is building all branches of opam2web for some reason. That seems like it could be tightened down to just the live and staging branches.

That is happening by design, to check that any PRs are deployable before merging to live or staging. We could remove that behaviour, but I would advise against it. There are many stale branches on opam2web; could they be cleaned up to just what is required?

The failure here is due to a build error, not due to an infrastructure error. If anything, we need a bit more infrastructure to alert us when the opam.ocaml.org pushes fail (a Matrix channel would be ideal).

Tracking issue is ocurrent/ocurrent-deployer#111. Is there a Matrix channel / server available for posting messages to? Current plan was to post to the Slack channel for opam-maintainers. Usually the issue is the large size of the docker image created and the build timing out or getting rate limited by docker hub.

The longer term fix for the opam2web size issue is to move the documentation into the new ocaml.org website and have opam2web just build the index file.

avsm commented on June 8, 2024

Thanks @tmcgilchrist, the deployability checks do indeed make sense. I think the real blocker to debugging what's going on is the lack of historical build information, which I've posted up at ocurrent/ocurrent-deployer#190. Without that, there's not much point having the web interface for the deployer, as it's always only showing the current (and long-running) build.

I've set up a simple Matrix room on #ocaml-infra:recoil.org which we can use for notifications. Once it's working, we can alias to another homeserver (for redundancy) and then add it to the OCaml space.

hannesm commented on June 8, 2024

First of all, thanks for fixing the update process (in case you did something, at least there was an update of the opam repository on opam.ocaml.org).

Second, I'll close this issue. I have the feeling that you're very convinced that the system and its complexity are necessary for any commit to the opam-repository, and that shoveling huge docker images across the Internet for deployment is deeply necessary. My approach would be radically different: I'd try to find the minimal thing which needs to be done for an update (including building opam2web binaries and packaging them), with the grand goal of saving resources (computation / network). But since you're convinced of the technology and stack in use, I won't argue against it.

avsm commented on June 8, 2024

@hannesm, removing the Docker Hub from the equation is entirely in scope, especially if it saves resources and energy (which it will). It's a matter of time and of a smooth transition of the infrastructure, and OCurrent can easily wrap any dataflow. I'd welcome a simpler future infrastructure than the existing one.

hannesm commented on June 8, 2024

Again, it is 2 days behind. While scrolling through "https://deploy.ci.ocaml.org/?repo=ocaml-opam/opam2web&", I can find two "jobs" (please excuse if you have other terminology) -- one being "ocurrent/opam.ocaml.org: live", the other "ocurrent/opam.ocaml.org: staging".

Somehow, one gets "live", the other "live-staging" branch of opam2web -- which both point to similar commits, and diverge from the master branch (is this intended?).

Now, in their "log output", there's a lot of stuff, but I'm curious that both logs have this line:
Pushing "sha256:ccc1b6aa4f224fd9ee2dc4ce4140863e87d6c743e8e25c5f8b5b2e9612a2982c" to "ocurrentbuilder/staging:live-ocurrent-opam.ocaml.org-linux-x86_64" as user "ocurrentbuilder"
Pushing "sha256:24c3d50de483b79c3e24cd54c981059b4437b971b651bc3be51a5714b7984f90" to "ocurrentbuilder/staging:live-ocurrent-opam.ocaml.org-linux-x86_64" as user "ocurrentbuilder"

To me, as someone who doesn't know anything about docker and docker hub, it looks like they're racing to push to the same remote tag. Is this correct? I haven't had any luck figuring out what these "jobs" are actually supposed to do (apart from the graphical output, which lacks all the details).
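
A small sketch of how this observation could be checked mechanically, assuming the deployer log lines keep the `Pushing "<digest>" to "<tag>" as user "<user>"` shape quoted above. The function name `shared_tags` is hypothetical. It reports every tag that received more than one distinct digest; that shows the tag was overwritten, not that the two pushes actually raced.

```shell
#!/bin/sh
# shared_tags: reads log lines on stdin and prints "<count> <tag>" for every
# tag that was pushed with more than one distinct digest.
shared_tags() {
    awk -F'"' '
        /^Pushing "/ {
            # With a double quote as separator, $2 is the digest and $4 the tag.
            key = $4 SUBSEP $2
            if (!(key in seen)) {
                seen[key] = 1
                count[$4]++
            }
        }
        END {
            for (tag in count)
                if (count[tag] > 1)
                    print count[tag], tag
        }
    '
}
```

Saving both job logs to files and running `cat live.log staging.log | shared_tags` (file names are placeholders) would list the tag shown above if both digests were indeed pushed to it.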

Given the current pace of development of opam2web, might it be possible to restrict these two "jobs" to a single one? I also have a hard time understanding where / what is getting deployed if both push the same tag, and the host in mind is only "opam.ocaml.org" -- is there a "live" and "staging" subdomain? Is it worth it?

Is it possible for you to hand out an executable POSIX shell script that condenses the steps taken when "there is a new commit to opam-repository"? I'd love to take a look at what is involved, to get a clearer picture of the carbon footprint. With "ocurrent" and some docker scripting, I'm sure you can extract that. If not, a (single!) Dockerfile could be helpful as well.

Thanks for reading.

tmcgilchrist commented on June 8, 2024

@hannesm the build instructions are documented at https://github.com/ocaml-opam/opam2web#docker. What ocurrent is doing in this process is running that docker build with the latest git version of the opam repository and ocaml/platform-blog, and then deploying that.

If you want to run it locally use this command:

DOCKER_BUILDKIT=1 \
docker build -t opam2web -f Dockerfile . \
  --build-arg OPAM_GIT_SHA=42b392e634b2f2fc7e027070ccae412e55eba41b \
  --build-arg BLOG_GIT_SHA=356e7d2ea63d5945828b9c5421a007db125f1710
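
As a rough illustration of "running that docker build with the latest git version", the sketch below resolves the current head commit of a remote with `git ls-remote` and prints the corresponding build command without running it. The function names and the repository URLs in the commented example are assumptions, not taken from the deployer. Each build argument gets its own `--build-arg` flag, which is what `docker build` expects.

```shell
#!/bin/sh
# latest_sha REPO_URL: prints the commit SHA that HEAD points to on the remote.
latest_sha() {
    git ls-remote "$1" HEAD | cut -f1
}

# build_command OPAM_SHA BLOG_SHA: prints the docker build invocation
# for the given commits; it does not execute anything.
build_command() {
    printf 'DOCKER_BUILDKIT=1 docker build -t opam2web -f Dockerfile --build-arg OPAM_GIT_SHA=%s --build-arg BLOG_GIT_SHA=%s .\n' "$1" "$2"
}

# Example (needs network access; the URLs below are assumptions):
# opam_sha=$(latest_sha https://github.com/ocaml/opam-repository.git)
# blog_sha=$(latest_sha https://github.com/ocaml/platform-blog.git)
# build_command "$opam_sha" "$blog_sha"
```

Printing the command instead of executing it keeps the sketch usable on a machine without docker, for inspecting what would be built.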

The build generates a large docker image with all the package documentation, which is what takes so much time to build and triggers the timeouts you are seeing. The plan is to move everything to the ocaml.org documentation, after which we can stop building it and just generate the opam index file, which will be much faster. That work is being done under #26. cc @tmattio

In the meantime, the docker layers in that Dockerfile could be optimised to avoid rebuilding and to make better use of cached layers. If you have some time and want to help with that, it would be appreciated.

Finally I've restarted the build and will keep an eye on it today.

hannesm commented on June 8, 2024

Thanks for your pointer. Unfortunately, there's no docker available on my operating system. I'm still confused by the Dockerfile you pointed to (so many FROM lines), and that it calls so much stuff (including the bin/opam-web.sh script which does yet another set of git clone and execute various other things).

So, good luck with that. From your message

with all the package documentation

do you mean the package index, as in https://opam.ocaml.org/packages/awa/, or is there other (API) documentation being built? Certainly I understand that the platform-blog and the opam documentation is put there.

Btw, do you have an idea why in the log output of both deployer jobs the following lines occur (as I mentioned above) - and do both live and staging race for the same tag (do these contain the same data?)?

Pushing "sha256:ccc1b6aa4f224fd9ee2dc4ce4140863e87d6c743e8e25c5f8b5b2e9612a2982c" to "ocurrentbuilder/staging:live-ocurrent-opam.ocaml.org-linux-x86_64" as user "ocurrentbuilder"
Pushing "sha256:24c3d50de483b79c3e24cd54c981059b4437b971b651bc3be51a5714b7984f90" to "ocurrentbuilder/staging:live-ocurrent-opam.ocaml.org-linux-x86_64" as user "ocurrentbuilder"

tmcgilchrist commented on June 8, 2024

So, good luck with that.

Yeah, what can I say, it isn't optimal, and it was only supposed to be in place for a short time while a better solution was being developed.

Do you mean the package index, as in https://opam.ocaml.org/packages/awa/, or is there other (API) documentation being built? Certainly I understand that the platform-blog and the opam documentation is put there.

Yes that is right, so it builds all of https://opam.ocaml.org/packages/* for all packages plus the platform-blog and opam documentation, as per your response. This will be resolved by #26 which shouldn't be far away.

Do both live and staging race for the same tag (do these contain the same data?)?

They will be using different tags so there is no race, but most of the data will be the same. This isn't worth fixing since this whole setup will be replaced soon.

Briefly, the deployment is:

  • live branch is deployed on opam.ocaml.org using the docker image ocurrent/opam.ocaml.org:live to two different machines for redundancy.
  • staging branch is deployed on staging.opam.ocaml.org using the docker image ocurrent/opam.ocaml.org:staging to two different machines for redundancy.
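
The branch-to-target mapping in the list above can be written down as a small lookup, shown here only as an illustration. The function name `deploy_target` is hypothetical, and the deployer itself is not implemented this way as far as this thread shows.

```shell
#!/bin/sh
# deploy_target BRANCH: prints "HOST IMAGE" for a known opam2web branch and
# fails with a message on stderr for any other branch.
deploy_target() {
    case "$1" in
        live)    echo "opam.ocaml.org ocurrent/opam.ocaml.org:live" ;;
        staging) echo "staging.opam.ocaml.org ocurrent/opam.ocaml.org:staging" ;;
        *)       echo "unknown branch: $1" >&2; return 1 ;;
    esac
}
```

For example, `deploy_target staging` prints the staging host and image, while a pull-request branch returns a non-zero status, reflecting that such branches are only built to check deployability.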

The extra docker pushes you're pointing to go to a staging docker hub hosted locally on the machine for caching. Before you point out the obvious waste in pushing images around: the services deployed using docker service .. require that the images are on docker hub (Why? For entirely non-technical reasons from what I can determine). This docker service limitation should also be disappearing soon as part of fixing the IPv6 accessibility of OCaml infrastructure. More to come on that soon.

hannesm commented on June 8, 2024

Thanks for your instructions. Still, I don't have any "docker" executable on my Unix operating system, so I'm out of luck trying to do anything in this regard. I still don't understand the setup and why it is so complex (and which bits are pushed around for what).

In any case, it seems like your solution "wait until ocaml.org hosts the package stuff" is what you're aiming for. I don't have anything to contribute there. For what it is worth, there's still a huge delay from "someone merged a PR" to "it shows up on opam.ocaml.org" (> 20 hours). But the accumulated technical debt in your deployment seems to be superseded (soon, or at least in a planned future) by some other piece of technology (which by luck may result in quicker updates, though the ocaml.org package index and opam.ocaml.org/index.tgz may then be out of sync -- but maybe that is not relevant for those maintaining "ocaml.org").
