Comments (17)

Asmaa-Ali commented on August 22, 2024

@jmabuin Please, I need help to solve this problem.

from sparkbwa.

jmabuin commented on August 22, 2024

Hi @Asmaa-Ali

SparkBWA creates temporary .sam files on the computing nodes (executors) while the program is running. Can you connect to one of your nodes and check whether these temporary files have any content?

These files are stored in the Spark temporary directory or in /tmp/. You can find them with the find command:
find /tmp/ -name "*.sam"

Also, what is the content of your /Data/HumanBase/ directory? Is that content available on all nodes?
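
Editor's note: the check described above (look for temporary .sam files and see whether they have content) can also be scripted. What follows is only a minimal sketch, not part of SparkBWA. The class name EmptySamFinder and the default root of /tmp are assumptions made for illustration. It walks a directory tree, skips entries it cannot read, and reports every .sam file that is 0 bytes long.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; it is not part of SparkBWA.
final class EmptySamFinder {

    // Returns, in sorted order, every regular *.sam file under root that is 0 bytes long.
    static List<Path> findEmptySamFiles(Path root) {
        List<Path> empty = new ArrayList<>();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    boolean isSam = file.getFileName().toString().endsWith(".sam");
                    if (attrs.isRegularFile() && isSam && attrs.size() == 0L) {
                        empty.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    // Skip entries we are not allowed to read (common under /tmp).
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(empty);
        return empty;
    }

    public static void main(String[] args) {
        // Defaults to /tmp; pass the Spark temporary directory as the first argument instead.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp");
        for (Path p : findEmptySamFiles(root)) {
            System.out.println("empty: " + p);
        }
    }
}
```

Run it on an executor node while the job is active. If every temporary .sam file is listed as empty, the aligner output is not reaching the temporary files on that node.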

linhbngo commented on August 22, 2024

I ran into the same issue, and my temporary .sam files inside /tmp are also empty.

jmabuin commented on August 22, 2024

Can you please check whether it works with SparkBWA 0.2?

zargaboy1 commented on August 22, 2024

Hi, I have tried it with SparkBWA 0.2 and I run into the same issue. The job runs without errors, but the SAM files it produces are empty! The content of the /Data/HumanBase/ directory is available on all my worker nodes. Any suggestions, please?

jmabuin commented on August 22, 2024

Do you have permission to write to the tmp folder? Also, which versions of Hadoop and Spark are you using?

zargaboy1 commented on August 22, 2024

Yes, I do have the rights to write to /tmp.
I am using Spark 2.0 and HDFS 2.7.3.
Any ideas are welcome!

jmabuin commented on August 22, 2024

Have you tried to use yarn-cluster instead of yarn-client?

Actually, in newer versions of Spark it should be --master yarn --deploy-mode cluster.
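
Editor's note: the relationship between the older combined master values and the newer flag pair can be written out as a small lookup. This is only an illustrative sketch. The class MasterArgs and its method are hypothetical names, and it covers only the two YARN values mentioned in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it is not part of Spark or SparkBWA.
final class MasterArgs {

    // Maps a legacy combined master value to the equivalent spark-submit arguments.
    // Any other value is passed through unchanged as the --master argument.
    static List<String> toSubmitArgs(String master) {
        switch (master) {
            case "yarn-cluster":
                return Arrays.asList("--master", "yarn", "--deploy-mode", "cluster");
            case "yarn-client":
                return Arrays.asList("--master", "yarn", "--deploy-mode", "client");
            default:
                return Arrays.asList("--master", master);
        }
    }

    public static void main(String[] args) {
        // Prints the arguments to use in place of "--master yarn-cluster".
        System.out.println(String.join(" ", toSubmitArgs("yarn-cluster")));
    }
}
```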

avapirev commented on August 22, 2024

Same issue here. I am also trying to run with the local scheduler (no YARN). Could that be the reason for the missing/empty SAM files? Does all input data need to go through HDFS? It also complains that it can't find the index file, even though I have set up proper permissions on all locations. Thank you.

Update: Evidently, one really needs a running Hadoop cluster so that the code can work on the data in HDFS. Hence the empty SAM files in Spark standalone cluster mode. It would be nice if there were an option for running against a Spark standalone instance. Hadoop can be a real pain under Torque/PBS job schedulers.

xubo245 commented on August 22, 2024

I fixed it. SparkBWA-0.2 can now run on YARN or in standalone mode and writes the SAM file on my local cluster.

avapirev commented on August 22, 2024

@xubo245 Thanks a lot - the fix works. I can confirm that it also runs in Spark standalone mode (no Hadoop FS)

xubo245 commented on August 22, 2024

You are welcome.

tushu1232 commented on August 22, 2024

@xubo245 It is still not working for me in standalone mode. Have you made any other specific changes?

xubo245 commented on August 22, 2024

Yes, I made a temporary change for standalone mode, but it is not the best solution...

com.github.sparkbwa.BwaAlignmentBase#copyResults

    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://Master:9000/");
    FileSystem fs = FileSystem.get(new URI("hdfs://Master:9000/"), conf);

Master should be replaced with your cluster's hostname.
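
Editor's note: since the change above hard-codes the filesystem address in two places, one possible refinement is to read it from configuration and fall back to the hard-coded value. The sketch below uses only the Java standard library so it can be run on its own. The class FsUriConfig and the property name sparkbwa.fs.uri are invented for illustration and are not SparkBWA options. In Hadoop 2.x, fs.default.name is a deprecated alias of fs.defaultFS, so either key would accept the resolved value.

```java
import java.net.URI;

// Hypothetical helper for illustration; it is not part of SparkBWA or Hadoop.
final class FsUriConfig {

    // Fallback that mirrors the hard-coded address from the snippet in this comment.
    static final String DEFAULT_FS = "hdfs://Master:9000/";

    // Returns the configured filesystem URI, or DEFAULT_FS when nothing is configured.
    // Rejects values without a scheme (for example a bare path) so mistakes fail early.
    static URI resolve(String configured) {
        boolean unset = configured == null || configured.trim().isEmpty();
        String value = unset ? DEFAULT_FS : configured.trim();
        URI uri = URI.create(value);
        if (uri.getScheme() == null) {
            throw new IllegalArgumentException("filesystem URI needs a scheme: " + value);
        }
        return uri;
    }

    public static void main(String[] args) {
        // "sparkbwa.fs.uri" is a made-up system property name (assumption).
        URI uri = resolve(System.getProperty("sparkbwa.fs.uri"));
        System.out.println(uri);
        // Inside copyResults the resolved value would replace both string literals:
        //   conf.set("fs.default.name", uri.toString());
        //   FileSystem fs = FileSystem.get(uri, conf);
    }
}
```

With this in place, the address could be supplied at launch time (for example with -Dsparkbwa.fs.uri=hdfs://myhost:9000/ on the driver JVM) instead of editing the source for each cluster.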

tushu1232 commented on August 22, 2024

@xubo245 We are running in a non-HDFS environment that uses GPFS.
How can we make it work with a general filesystem? The filesystem is available on every node, similar to HDFS.

tushu1232 commented on August 22, 2024

#38 Attached the spark run

xubo245 commented on August 22, 2024

You should replace the HDFS code with the GPFS API, but I do not know GPFS...

SparkBWA has a lot of HDFS-specific code...
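
Editor's note: GPFS is normally mounted as an ordinary POSIX path, so one possible direction is to replace an HDFS copy step with plain java.nio file operations. The sketch below rests on two assumptions: that the shared mount appears at the same path on every node, and that the step being replaced simply copies a node-local SAM file to the output location. SharedFsCopy is a hypothetical class, not SparkBWA code, and it has not been tested against SparkBWA.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; it is not part of SparkBWA.
final class SharedFsCopy {

    // Copies a node-local SAM file into a directory on the shared mount.
    // Creates the directory if needed, overwrites an existing file of the
    // same name, and returns the path of the copy.
    static Path copyToShared(Path localSam, Path sharedDir) {
        try {
            Files.createDirectories(sharedDir);
            Path target = sharedDir.resolve(localSam.getFileName());
            return Files.copy(localSam, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: SharedFsCopy <local .sam file> <shared output dir>");
            return;
        }
        System.out.println(copyToShared(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Every place where SparkBWA calls the HDFS API would need a similar substitution, which matches the point above that the codebase contains a lot of HDFS-specific code.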
