
setup-spark's Introduction

Mostly playing with knowledge graphs (RDF, SPARQL, ontologies, SHACL, RML), life sciences data, and web technologies.

👨‍💻 Recent projects

💠 Shapes of you: an index for publicly available semantic resources (ontologies, vocabularies, shapes, queries, mappings) stored in Git repositories (GitHub, GitLab, Gitee). Visit index.semanticscience.org

♻️ FAIR Enough: a service to define and run evaluations of the FAIR principles (Findable, Accessible, Interoperable, Reusable) on online resources. Visit fair-enough.semanticscience.org

🔬 Nanopublications and the Knowledge Collaboratory: An ecosystem to publish and retrieve scientific claims using Translator standards. Visit api.collaboratory.semanticscience.org

🔭 The Data Science Research Infrastructure: An OKD Kubernetes cluster to run Data Science experiments at Maastricht University. Visit dsri.maastrichtuniversity.nl

🧭 into-the-graph: A lightweight web browser for SPARQL endpoints. Visit maastrichtu-ids.github.io/into-the-graph

🔮 Translator OpenPredict: a Translator OpenAPI to compute and serve predictions of biomedical concepts associations. Visit openpredict.semanticscience.org

🧞 Contributions to Open Source


setup-spark's People

Contributors

benjamincitrin, dependabot[bot], joekendal, r-t-m, vemonet


setup-spark's Issues

Action does not fail if file cannot be found

Describe the bug
When you specify a combination of versions that does not exist, the action does not fail.

The URL https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.3.tgz leads to a 404, the correct URL is https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz.

I made this mistake because https://spark.apache.org/downloads.html advertises 3.2.1 as "with Hadoop 3.3 and later".

In any case, the action should fail if Spark cannot be installed instead of silently progressing.
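A minimal sketch of the fail-fast behaviour this asks for, assuming a pre-flight step (or the action itself) can see the HTTP status of the download URL; `check_status` is a hypothetical helper, not part of setup-spark:

```shell
# Hypothetical fail-fast check: abort when the Spark tarball URL does not
# return HTTP 200. In CI the status code could come from:
#   curl -s -o /dev/null -w '%{http_code}' -I "$URL"
check_status() {
  if [ "$1" != "200" ]; then
    echo "Spark tarball not found (HTTP $1): failing instead of continuing" >&2
    return 1
  fi
}

check_status 200 && echo "download URL looks good"
```

With a check like this, the 3.2.1 + hadoop 3.3 combination above would hit the 404 and the step would fail immediately instead of progressing silently.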

Which version of the action are you using?

v1

Environment
GitHub-hosted

Spark Versions
3.2.1 with Hadoop 3.3

To Reproduce

      - uses: vemonet/setup-spark@v1
        with:
          spark-version: 3.2.1
          hadoop-version: 3.3

Run/Repo Url
https://github.com/anovos/anovos/runs/5326819596?check_suite_focus=true


Action sometimes takes too long to complete

Hello,

I don't know whether it's a bug or an issue with the server the file is downloaded from, but I have seen this task take anywhere from seconds to minutes.

Do you know how I can investigate where the time is being spent?

Thank you so much

SPARK_HOME set incorrectly for Spark versions below 3

Describe the bug
I'm aware that the action isn't tested for versions below 3, but wondered if you might be able to advise on how I can adjust the configuration to work.

When calling Spark through pytest (pyspark), we get an error stating that spark-submit is not where it is expected to be:

FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/repo/repo/..//spark/./bin/spark-submit': '/home/runner/work/repo/repo/..//spark/./bin/spark-submit

I expect that this is due to the lower Spark version, but I have been unable to find how the configuration of later versions differs. Any pointers here would be greatly appreciated.
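One possible workaround (a sketch only; the install path is an assumption, not verified for Spark 2.x) is to export SPARK_HOME explicitly after the action runs, so pyspark stops deriving the broken relative path:

```shell
# Assumed workaround: point SPARK_HOME at the symlink the action creates
# ($HOME/spark is an assumption here) and expose its bin/ on PATH so
# pyspark can locate spark-submit directly.
SPARK_HOME="$HOME/spark"
PATH="$SPARK_HOME/bin:$PATH"
export SPARK_HOME PATH
echo "SPARK_HOME=$SPARK_HOME"
```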

Which version of the action are you using?

v1

Environment
GitHub Actions

Spark Versions

      - uses: actions/setup-python@v2
        with:
          python-version: 3.6.8

      - uses: actions/setup-java@v1
        with:
          java-version: '8'

      - uses: vemonet/setup-spark@v1
        with:
          spark-version: '2.4.0'
          hadoop-version: '2.1.1'

Run/Repo Url
See run here

Error downloading the Spark binary

Describe the bug
Error downloading the Spark binary

Which version of the action are you using?

v1

Environment
GitHub-hosted

Spark Versions
3.1.1

To Reproduce

    - uses: vemonet/setup-spark@v1
      with:
        spark-version: '3.1.1'
        hadoop-version: '3.2'

Screenshots
image

Notes
It seems that the requested file is currently not at the specified location.

https://www.apache.org/dyn/closer.lua/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz?as_json returns

"path_info": "spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz",
"preferred": "https://ftp.cixug.es/apache/"

but I think it's wrong; if I go to that location I can't find the file :(

If I go manually to https://www.apache.org/dyn/closer.lua/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz (without the ?as_json query string), a message appears saying: "The requested file or directory is not on the mirrors. The object is in our archive: https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz"

Maybe a parameter in the GitHub Action to enter the URL manually (if the user wants to use it) could solve the problem now, and guard against future changes too.
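The fallback idea can be sketched as pure selection logic, with no network calls (`pick_url` is a hypothetical helper; the mirror and archive URLs are the ones from this report): try the mirror, and fall back to archive.apache.org when the mirror does not have the file.

```shell
# Hypothetical mirror-then-archive selection: given the HTTP status the
# mirror returned, emit the URL that should actually be downloaded.
pick_url() {
  mirror="$1"; archive="$2"; mirror_status="$3"
  if [ "$mirror_status" = "200" ]; then
    echo "$mirror"
  else
    echo "$archive"
  fi
}

pick_url "https://ftp.cixug.es/apache/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz" \
         "https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz" \
         404
```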

Thank you so much

Symbolic link fails when an "old" GitHub runner is used

Describe the bug
When GitHub reuses a runner to execute the action, the Spark installation fails due to a symbolic link error.

Note that when a "new" runner is used, the installation works fine. See how I distinguish between "old" and "new" runners in the To Reproduce section below.

I forked this repository and solved my issue by changing the flag in line 67 of dist/.js from -s to -sf. As per the man page, -f removes an existing link before creating the new one.

ln -sf "${installFolder}/spark-${sparkVersion}-bin-hadoop${hadoopVersion}${scalaBit}" ${installFolder}/spark;

If you're on board, I'd create a PR with this change.
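A minimal reproduction of the behaviour with throwaway files (file targets keep the demo deterministic; note that when the link points at a directory, as in the action, GNU ln additionally needs -n so the stale link is replaced rather than dereferenced):

```shell
# First run on a fresh runner: creating the symlink works.
tmp=$(mktemp -d)
touch "$tmp/spark-3.2.1.tgz" "$tmp/spark-3.3.0.tgz"
ln -s "$tmp/spark-3.2.1.tgz" "$tmp/spark"

# Reused runner: the link already exists, so plain -s fails with "File exists".
ln -s "$tmp/spark-3.3.0.tgz" "$tmp/spark" 2>/dev/null \
  || echo "plain ln -s failed: link already exists"

# Proposed fix: -f replaces the stale link instead of erroring out.
ln -sf "$tmp/spark-3.3.0.tgz" "$tmp/spark"
readlink "$tmp/spark"
```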

Which version of the action are you using?
v1

Environment
Self-hosted, linux

Spark Versions

To Reproduce
Steps to reproduce the behavior:

  1. Create a workflow with a Python installation and include pip3 install pyspark pytest as a command
  2. Once Python and Spark have installed, cancel the workflow and re-run all jobs to increase the chances of the same runner picking up the job. (This can also happen with normal "on push" triggers, but it's more difficult to reproduce.)
  3. If the pip3 command is using cached packages, we know we are using an "old" runner
  4. See error

Run/Repo Url
My fork: https://github.com/johnnylarner/setup-spark

Run vemonet/setup-spark@v1 started failing yesterday

Describe the bug
Run vemonet/setup-spark@v1 started failing yesterday; I tried multiple versions but the result is the same. Would you mind helping me find a resolution? The download of the Spark file is taking too long, which results in
Error: The operation was cancelled
during the GitHub build.
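While the root cause is investigated, one hedged mitigation (not part of setup-spark) is to cap the slow step yourself, e.g. with `timeout-minutes` on the workflow step, or GNU `timeout` around a manual download. `timeout` kills a command once its budget is exceeded and exits with 124, so the job fails fast instead of hanging until cancelled:

```shell
# timeout(1) ends a command that exceeds its time budget; exit code 124
# signals a timeout. Here a 5-second sleep is capped at 1 second.
timeout 1 sleep 5
code=$?
echo "exit code: $code"
```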

Which version of the action are you using?

vemonet/setup-spark@v1

Environment

GitHub-hosted

Spark Versions

      spark-version: '3.1.2'
      hadoop-version: '3.2' 



Spark install error in GitHub Actions

Describe the bug

Error when installing Spark in the GitHub Actions environment

Which version of the action are you using?

  • v1

Environment

  • GitHub Actions environment


Spark Versions
3.0.1

To Reproduce
Install Spark 3.0.1 through the action in the GitHub Actions environment and observe the error.

Screenshots
(full traceback attached as a screenshot, 2021-03-19)

The "set-output" command is deprecated and will be disabled soon.

Describe the bug
The set-output command is deprecated and will be disabled soon.

Which version of the action are you using?
v1

Environment
GitHub-hosted

Spark Versions
All


Additional context
Warning: The set-output command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
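For reference, the replacement mechanism writes key=value lines to the file named by $GITHUB_OUTPUT instead of printing a ::set-output workflow command; for a JavaScript action such as setup-spark, upgrading @actions/core to a version that uses environment files should resolve the warning (the output name below is illustrative):

```shell
# Old, deprecated form (emitted as a workflow command on stdout):
#   echo "::set-output name=spark-version::3.2.1"
# New form: append to the environment file GitHub provides.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"   # set by GitHub on real runners
echo "spark-version=3.2.1" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```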
