Comments (9)
Thanks a lot for looking into this! My client mostly tests on Mac OS because many of their devs are using it, so we're not overly concerned with fixing it, as nobody will run actual workloads under Mac OS.
For now, I'll just mark the tests as experimental and continue to observe whether there is a pattern in the failures.
from setup-spark.
Hi @ionicsolutions, thanks for the clear report. Indeed the action should fail!
The reason why it does not fail: when I built the action I was a bit lazy, and went for the fastest and most reliable solution I knew: `wget`! (It is available by default on all Linux runners as far as I know, so why not use it?)
The faulty code is here: https://github.com/vemonet/setup-spark/blob/main/src/setup-spark.ts#L25
The problem is that if the version/file does not exist, it does not trigger an error in the JS.
2 solutions:
- Replace `wget` with `fetch()` or `axios()`
- Add a check to make sure the file has been downloaded by `wget`, and fail if not
Does anyone have a suggestion? I am not sure if using `wget` from a JS action is recommended, but from my point of view it just works for downloading files, and it's the fastest option out there (not sure if `fetch` would work out of the box with everything and be as fast). And it seems like speed is what people like about this action: #8
- I don't really care if it does not work on Windows runners (if someone cares, please open an issue, but only a fool would want to run data science processes directly on Windows!)
I don't have much time to look into it right now, but it should not be a really hard change to make if someone wants to contribute! Otherwise I'll look into it some day when I have time for it (or if more people come to complain!)
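For reference, the second option (keep `wget`, then verify the result) could look roughly like the sketch below. This is not the code from `src/setup-spark.ts`: the helper names `assertDownloaded` and `downloadWithCheck` are made up for illustration, and it assumes a Node.js runtime with a POSIX shell.

```typescript
import { execSync } from "child_process";
import * as fs from "fs";

// Hypothetical helper: throws unless `filePath` exists and is non-empty.
function assertDownloaded(filePath: string): void {
  if (!fs.existsSync(filePath) || fs.statSync(filePath).size === 0) {
    throw new Error(`Download failed: ${filePath} is missing or empty`);
  }
}

// Hypothetical helper: runs a shell download command, then verifies
// that the expected file really landed on disk.
function downloadWithCheck(command: string, filePath: string): void {
  // execSync throws if the command exits with a non-zero status.
  execSync(command, { stdio: "inherit" });
  assertDownloaded(filePath);
}
```

In an action, the thrown error could then be caught at the top level and passed to `core.setFailed()` from `@actions/core`, so that the step is marked as failed instead of passing silently.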
from setup-spark.
I believe that https://stackoverflow.com/a/43077917/7469434 would be the solution here, I can give it a try.
from setup-spark.
Thanks for the suggestion! I went for something similar and added an extra file check to make sure the Spark binaries are properly downloaded.
I also used the opportunity to upgrade some dependencies, and to move from node12 to node16.
I updated the v1 release, let me know if it does not work as expected!
from setup-spark.
Hi @vemonet, thanks for taking care of this.
> I updated the v1 release, let me know if it does not work as expected!
What exactly do you mean by "updating the v1 release"? I believe you have to (and should) create a new release that incorporates the fix. Happy to report back on any issues that might surface!
from setup-spark.
> What exactly do you mean by "updating the v1 release"? I believe you have to (and should) create a new release that incorporates the fix.
Relevant remark @ionicsolutions, I agree with you, and that's usually what I do for every project I work on, but for GitHub Actions I am a bit confused because GitHub does not seem to recommend using incremental semantic-like versioning.
Here I didn't make a breaking change to the API, to how the action is used, or to how it works (it works the same as before, but better, with less chance of failing silently, which was also noted as a problem in other issues). So usually I would have created a release like 1.0.1 or 1.1.0.
But:
- GitHub Actions does not seem to support semantic versioning (or at least I have never seen it in use), so I cannot ask for something like version `v1.*` or `v1+` ("Get me the latest minor patch of version 1")
- If you look at most existing GitHub Actions, they use a really strict and simple versioning system: v1, v2, v3... (cf. setup-python, the clone action, etc.)
In my opinion it seems to be a design choice: GitHub Actions are not complex software, they are just wrappers to run complex software in a specific environment. So it is better to go with a simple versioning system (Occam's razor principle: only make a system as complex as you need it to be!)
If I am fixing some vulnerability-introducing bugs in the action, then it's better if most users of the action can get the update without additional work on their side. In this case my best option is to update the `v1` release, so everyone using the action will use a sane version the next time it runs in their workflow.
Otherwise the `setup-spark` users would need to be notified that I released an update fixing some vulnerabilities, but I don't think there is such a mechanism for GitHub Actions, and it is impossible for humans to manually check all the GitHub Actions they use regularly to see if there is an update they need to make. Also, to be honest, as a developer I would rather not need to care about the versioning of some GitHub Actions I use for testing. If some companies want to use them in critical workloads, then they should do their due diligence for a secure supply chain: fork the action and review every new change before running it in production.
So from my point of view it seems safer, more developer-friendly, and less stressful for everyone (the action maintainer and the action users) to just update the existing version v1, as long as GitHub Actions does not support proper dynamic versioning that enables minor patches to be used automatically.
But that's just a personal observation that I made from using GitHub Actions. I am not sure if there are any docs, or if there has been any discussion, about those concerns. I would be interested if you have any pointers or better solutions for this!
from setup-spark.
Hi @vemonet, I understand your reasoning but personally find it dangerous to silently make changes to scripts or tools others depend on.
For example, it seems that the update you made breaks the Action on MacOS, not sure what exactly is going wrong: https://github.com/anovos/anovos/runs/6356850929?check_suite_focus=true If I didn't know that there was a change made to the Action, it would be difficult for me to debug. (It might also be that Apache's servers are flaky, will investigate and file an issue.)
GitHub itself uses semantic versioning for Actions, e.g., https://github.com/actions/cache and with Dependabot it's also not too difficult to keep track of Actions updates. But I fully get your points!
from setup-spark.
Thanks for your pointers! Indeed `actions/cache` uses semantic versioned releases, but you can still use `uses: actions/cache@v3`, which is exactly what I was looking for. I'll check their setup, and will use proper versioning next time I need to do an upgrade :)
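As far as I understand, the pattern behind `actions/cache@v3` is that every release gets a full tag such as `v3.0.2`, while a short major tag such as `v3` is moved to the newest release of that major line. The sketch below only illustrates that selection logic. The function names and the tag list are hypothetical, and it does not call Git or the GitHub API.

```typescript
type Version = [number, number, number];

// Parse "v1.2.3" into [1, 2, 3]; returns null for tags that are not full semver.
function parseTag(tag: string): Version | null {
  const m = /^v(\d+)\.(\d+)\.(\d+)$/.exec(tag);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Hypothetical helper: for each major version, find the newest full release
// that the floating major tag (v1, v2, ...) should point at.
function floatingMajorTags(tags: string[]): Record<string, string> {
  const newest: Record<string, { tag: string; version: Version }> = {};
  for (const tag of tags) {
    const version = parseTag(tag);
    if (!version) continue; // skip floating tags like "v1" themselves
    const key = `v${version[0]}`;
    const current = newest[key];
    const isNewer =
      !current ||
      version[1] > current.version[1] ||
      (version[1] === current.version[1] && version[2] > current.version[2]);
    if (isNewer) newest[key] = { tag, version };
  }
  const result: Record<string, string> = {};
  for (const key of Object.keys(newest)) result[key] = newest[key].tag;
  return result;
}
```

With this scheme, users who pin `@v1` keep receiving non-breaking fixes, while users who pin a full tag get a fixed, reviewable version.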
For the MacOS bug, I did not try a MacOS runner; I will add a test for this.
I just tried a fresh MacOS installation (on a MacBook Pro) and it seems like `wget` is not installed on MacOS by default (maybe it was silently failing before? And the added try/catch now makes this problem easy to detect).
So maybe switching to `curl -fsSL` instead of `wget` will fix it.
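One way to avoid depending on a single tool would be to check which downloader is available before building the command. This is only a sketch under assumptions: `hasCommand`, `pickDownloader` and `buildDownloadCommand` are hypothetical names, and the availability check relies on the POSIX `command -v` builtin, so it would not work on Windows runners.

```typescript
import { execSync } from "child_process";

type Downloader = "curl" | "wget";

// Returns true if `name` resolves to a command in the default shell.
function hasCommand(name: string): boolean {
  try {
    execSync(`command -v ${name}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

// Prefer curl and fall back to wget. The lookup function is injectable
// so the choice can be tested without touching the real system.
function pickDownloader(available: (name: string) => boolean = hasCommand): Downloader {
  if (available("curl")) return "curl";
  if (available("wget")) return "wget";
  throw new Error("Neither curl nor wget is installed");
}

// Build the shell command for the chosen tool.
// curl: -f fail on HTTP errors, -sS quiet but show errors, -L follow redirects.
function buildDownloadCommand(tool: Downloader, url: string, dest: string): string {
  return tool === "curl"
    ? `curl -fsSL "${url}" -o "${dest}"`
    : `wget -q "${url}" -O "${dest}"`;
}
```

The `-f` flag matters here: without it, `curl` exits successfully on an HTTP 404 and saves the error page as if it were the archive.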
from setup-spark.
@ionicsolutions Testing with MacOS and `curl`: https://github.com/vemonet/setup-spark/actions/runs/2306295979
MacOS seems to work in general, with just one random failure for Spark `3.0.3` (even though previous and later versions all work fine).
Then testing MacOS with `wget` instead of `curl`: https://github.com/vemonet/setup-spark/actions/runs/2306331177
The action usually takes 1 min to complete, but 2 jobs have been stuck on `Run setup-spark` for 15 min now: `3.0.3` (the same one that failed in the `curl` test) and now also `3.1.2`.
Note that it is (randomly) failing on these commands: `wget && tar && rm && ln -s` (which all work fine for every other version of Spark).
So it's probably related to general instability of the MacOS runner (that's why I did not try to test on MacOS in the first place... I don't see it as an OS built for doing computing, it's more of a nice interface supporting Unix specs, and it costs a lot).
from setup-spark.