Code Monkey home page Code Monkey logo

Comments (9)

ionicsolutions avatar ionicsolutions commented on August 26, 2024 1

Thanks a lot for looking into this! My client mostly tests on Mac OS because many of their devs are using it, so we're not overly concerned with fixing it, as nobody will run actual workloads under Mac OS.

For now, I'll just mark the tests as experimental and continue to observe whether there is a pattern in the failures.

from setup-spark.

vemonet avatar vemonet commented on August 26, 2024

Hi @ionicsolutions , thanks for the clear report. Indeed the action should fail!

The reason why it does not fail: when I built the action I was a bit lazy, and went for the fastest and most reliable solutions I knew: wget ! (it's by default in all linux runners afaik, so why not using it?)

The faulty code is here: https://github.com/vemonet/setup-spark/blob/main/src/setup-spark.ts#L25

The problem is that if the version/file does not exist it does not trigger an error in the JS

2 solutions:

  • Replace wget to use fetch() or axios()
  • Add a check to make sure the file has been downloaded by wget, fail if not

If anyone has an suggestion? I am not sure if using wget from the JS actions is recommended, but from my point of view it just work for downloading files, and it's the fastest out there (not sure if fetch will work out of the box with everything and be as fast). And it seems like speed this is what people like about this action: #8 (not sure if fetch will be as fast)

  • I don't really care if it does not work on Windows runner (if someone care please open an issue, but only a fool would want to run data science processes directly on Windows!)

I don't have much time to look into it right now, but it should not be a really hard change to make if someone want to contribute! Otherwise I'll look into it some day when I will have time for it (or if more people are coming to complain!)

from setup-spark.

ionicsolutions avatar ionicsolutions commented on August 26, 2024

I believe that https://stackoverflow.com/a/43077917/7469434 would be the solution here, I can give it a try.

from setup-spark.

vemonet avatar vemonet commented on August 26, 2024

Hi @ionicsolutions

Thanks for the suggestion! I went for something similar and added an extra file check to make sure the spark binaries are properly downloaded

I also used the opportunity to upgrade some dependencies, and node12 to node16

I updated the v1 release, let me know if it does not work as expected!

from setup-spark.

ionicsolutions avatar ionicsolutions commented on August 26, 2024

Hi @vemonet, thanks for taking care of this.

I updated the v1 release, let me know if it does not work as expected!

What exactly do you mean by "updating the v1 release"? I believe you have to (and should) create a new release that incorporates the fix. Happy to report back on any issues that might surface!

from setup-spark.

vemonet avatar vemonet commented on August 26, 2024

What exactly do you mean by "updating the v1 release"? I believe you have to (and should) create a new release that incorporates the fix.

Relevant remark @ionicsolutions , I agree with you, and that's usually what I do for every project I work on, but for GitHub actions I am a bit confused because GitHub does not seems to recommend using incremental semantic-like versioning

Here I didn't make breaking change to the API, how the action is used, or how it works (it is working the same as before, but better, less chances of failing silently, which was also noted as a problem in other issues). So usually I would have created a release like 1.0.1 or 1.1.0

But:

  • GitHub action does not seems to support semantic versioning (or at least I never seen it in use), so I cannot ask for something like version v1.* or v1+ ("Get me the latest minor patch of version 1")
  • if you look at most existing GitHub actions they use a really strict and simple versioning system: v1, v2, v3... (cf. setup-python, the clone action, etc)

In my opinion it seems to be a design choice, GitHub Actions are not complex softwares, they are just wrappers to run complex softwares in a specific environment. So it is better to go with a simple versioning system (occam's razor principle, only make a system as complex as you need it to be!)

If I am fixing some vulnerability-introducing bugs in the action, then it's better if most users of the action can get the update without additional work from there side. In this case my best option is be to update the v1 release, so everyone using the action will use a sane version the next time it runs in their workflow.

Otherwise the setup-spark users would need to be notified about the fact that I released an update that is fixing some vulnerabilities, but I don't think there is such mechanisms for GitHub Actions, and it is impossible for humans to manually check all the GitHub Actions they use regularly to see if there is an update they need to make to some of them. Also to be honest as a developer I would rather not need to care about the versioning of some github actions I use for testing, if some companies want to use them in critical workloads then they should do their due diligence for a secure supply chain: fork the action and review every new change before running in production

So from my point of view it seems safer, more developer friendly, less stressful for everyone (the action maintainer and the action users) to just update the existing version v1 as long as GitHub Action does not support proper dynamic versioning that enables minor patches to be automatically used

But that's just a personal observation that I made from using GitHub Actions, I am not sure if there is any docs, or has been any discussion about those concerns. I would be interested if you have any interesting pointers or better solutions for this!

from setup-spark.

ionicsolutions avatar ionicsolutions commented on August 26, 2024

Hi @vemonet, I understand your reasoning but personally find it dangerous to silently make changes to scripts or tools others depend on.

For example, it seems that the update you made breaks the Action on MacOS, not sure what exactly is going wrong: https://github.com/anovos/anovos/runs/6356850929?check_suite_focus=true If I didn't know that there was a change made to the Action, it would be difficult for me to debug. (It might also be that Apache's servers are flaky, will investigate and file an issue.)

GitHub itself uses semantic versioning for Actions, e.g., https://github.com/actions/cache and with Dependabot it's also not too difficult to keep track of Actions updates. But I fully get your points!

from setup-spark.

vemonet avatar vemonet commented on August 26, 2024

Thanks for your pointers! Indeed actions/cache uses semantic versioned releases, but you still can uses: actions/cache@v3, which is exactly what I was looking for. I'll check their setup, and will uses proper versioning next time I need to do an upgrade :)

For the MacOS bug, I did not tried macos runner, I will add a test for this

I just tried a fresh MacOS installation (on a MacBook Pro) and it seems like wget is not installed on MacOS by default (maybe it was silently failing before? And the added try/catch allow to now detect this problem easily)

So maybe switching for curl -fsSL instead of wget will fix it

from setup-spark.

vemonet avatar vemonet commented on August 26, 2024

@ionicsolutions Testing with MacOS and curl: https://github.com/vemonet/setup-spark/actions/runs/2306295979

MacOS seems to work in general, just one random fail for spark 3.0.3 (even if previous and later versions all work fine)

Then testing MacOS with wget instead of curl: https://github.com/vemonet/setup-spark/actions/runs/2306331177

The action consistently takes 1 min to complete usually, but 2 jobs have been stuck on Run setup-spark for 15min now: 3.0.3 (the same that failed in the curl test) and also 3.1.2 now

Note that it is (randomly) failing on those commands: wget && tar && rm && ln -s (which are all fully working for every other versions of Spark)

So it's probably related to global instability of the MacOS runner (that's why I did not tried to test for MacOS in the first place... I don't see it as an OS built for doing computing, it's more of an nice interface supporting Unix specs, and it costs a lot)

from setup-spark.

Related Issues (10)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.