
Comments (11)

jorondo1 commented on August 24, 2024

Hi! I am also curious to know if anything changed since this thread was started. Cheers

from bwa.

Stack7 commented on August 24, 2024

Hi! I would be very happy to see any news in this thread! I am dreaming about a threads option! It would be great! Cheers!


lh3 commented on August 24, 2024

No, there is no pull request on multi-threaded indexing. Implementing one may take quite some time but might not dramatically improve the performance, especially when you try to build the index within limited space.

Generally, to build a large index, you may consider using a larger block size (option "-b"), which defaults to 10,000,000. You may increase it to 100,000,000 or even larger, depending on your input. This may save you some time.
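To make the effect of "-b" concrete, here is a small sketch (plain Python; the function name is illustrative, and the batch semantics are as the maintainer describes elsewhere in the thread) of how many batches a reference gets split into at different block sizes:

```python
def num_batches(genome_bases: int, block_size: int) -> int:
    """Number of -b sized batches needed to cover the reference (ceiling division)."""
    return -(-genome_bases // block_size)

# A 3 Gb human genome at the default -b versus a 10x larger one:
print(num_batches(3_000_000_000, 10_000_000))    # 300 batches
print(num_batches(3_000_000_000, 100_000_000))   # 30 batches
```

Fewer, larger batches mean less per-batch overhead, which is where the time saving comes from.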


unode commented on August 24, 2024

@lh3 Thanks, increasing -b does seem to improve speed considerably.

However I don't quite understand the impact of changing this option. At least during indexing, I don't see any significant memory increase even with values as large as 10,000,000,000.

What's the trade-off or otherwise, why isn't the default value larger?


lh3 commented on August 24, 2024

-b specifies how many bases to process in a batch. The memory used by one batch is 8*{-b}. If you have a "reference genome" larger than 200Gb, you won't observe obvious memory increase with -b set to 10G. For a 3Gb human genome, setting -b to 10G will make the peak RAM 8 times as high at the BWT construction phase.
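The rule of thumb above can be written out as a quick calculation (a sketch; the 8-per-base factor and the figures come from the comment, the function name is mine and the unit of bytes is an assumption):

```python
def bwt_batch_peak_bytes(batch_bases: int) -> int:
    """Approximate peak RAM of the BWT construction phase:
    8 bytes per base in one batch (assumption: the 8x factor is per byte)."""
    return 8 * batch_bases

# With the default -b of 10,000,000, one batch peaks around 80 MB:
print(bwt_batch_peak_bytes(10_000_000))  # 80000000

# For a 3 Gb human genome with -b set to 10G, the batch is capped at the
# genome length, so the peak is about 8 x 3 Gb = 24 GB:
print(bwt_batch_peak_bytes(min(10_000_000_000, 3_000_000_000)))  # 24000000000
```

This also explains the observation above: if the reference is far larger than the `-b` value, raising `-b` only grows one term of an already large memory footprint, so the increase is hard to notice.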


unode commented on August 24, 2024

So if I understand correctly, the ideal -b value is around # of bases / 8.
Wouldn't it be possible to have this value adjusted automatically?
From what I gather, there's a first pass that packs the FASTA file. Is the -b value already used at this stage? If not, could this stage be used to calculate the ideal -b value?

On the other hand, if finding the ideal -b during the "pack" phase is impractical, would it be reasonable to have:

  • -b set to "auto" by default
  • if -b is set to "auto" perform a full file scan to calculate the ideal -b.
  • if -b is set to anything but "auto", skip the full file scan and use the given value.
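The proposed "auto" behaviour might look like this (a hypothetical sketch, not bwa code; the bases/8 heuristic and the 10,000,000 default come from this thread, and the floor at the default is my own assumption):

```python
def resolve_block_size(b_option, total_bases: int) -> int:
    """Resolve the -b value: "auto" derives it from the reference length
    (ideal -b ~ total bases / 8, per the discussion); any other value is
    used as given, skipping the calculation."""
    if b_option == "auto":
        # Assumption: never go below the current default of 10,000,000.
        return max(total_bases // 8, 10_000_000)
    return int(b_option)

# A ~160 Gb reference resolves to textLength / 8:
print(resolve_block_size("auto", 160_000_000_000))      # 20000000000
# An explicit value is passed through unchanged:
print(resolve_block_size(500_000_000, 90_000_000_000))  # 500000000
```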


lh3 commented on August 24, 2024

-b is only used when bwa generates "ref.fa.bwt". At that step, bwa index already knows the total length of the reference. -b was added when I wanted to index nt. I have only done that once, so I did not bother to explore the optimal -b in general. Yes, it should be possible to adjust -b automatically, but before that I need to do some experiments to see how speed is affected by -b. Thanks for the suggestion anyway.


unode commented on August 24, 2024

From the tests I've been running, changing the -b value from the default of 10,000,000 to 500,000,000 to index a ~90 Gb FASTA file made the entire process roughly 6 times faster.
I'm now also giving it a try with a value of 20,000,000,000, computed by dividing textLength by 8. If this scales well, I expect a gain of at least 8 times.


lh3 commented on August 24, 2024

Thanks for the data. 6 times is a lot, much larger than my initial guess. I will consider adjusting -b automatically in a future version of bwa.


serge2016 commented on August 24, 2024

Hello! Any news in this thread?


emrahkirdok commented on August 24, 2024

> Thanks for the data. 6 times is a lot, much larger than my initial guess. I will consider adjusting -b automatically in a future version of bwa.

Hi, I hope everyone is OK in this thread. I am working with large FASTA files and I am wondering: is this feature implemented in the current version? Will it be implemented any time soon? Or should I continue optimising -b myself?
Best wishes


