Comments (14)

jelber2 commented on August 23, 2024

Change -t 8 to -t 64, perhaps.

from haslr.

koujiaodahan commented on August 23, 2024

Thanks. I have started the 55-thread script without stopping the 8-thread process. How long do you think it will take to run both scripts?

jelber2 commented on August 23, 2024

Did you change the output directory? I have no idea how long it might take; it depends on the coverage of the long and short reads.

koujiaodahan commented on August 23, 2024

Sure, I set a new output directory.

koujiaodahan commented on August 23, 2024

It has been running Minia for over 24 hours. Is that normal?

jelber2 commented on August 23, 2024

Minia is very fast, but genome size and coverage influence its runtime, and probably so do the choice of k-mer length and other similar settings.

koujiaodahan commented on August 23, 2024

So, are there any recommended parameters for human genome assembly?

jelber2 commented on August 23, 2024

Are you trying out the assembler with someone else's data, or do you have a new human genome that you would like to assemble from your own data? I would think it would have finished by now (~5 days running). Again, you haven't specified the coverage of the Illumina or (I guess) Oxford Nanopore data that you are using. You could also read the paper describing HASLR for more information on the program.
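Since the runtime question hinges on coverage, here is a minimal sketch for estimating depth from a FASTQ file as total sequenced bases divided by genome size. It assumes uncompressed FASTQ with standard 4-line records; the helper name `fastq_depth` and the file names in the comments are hypothetical, and 3 Gbp is only an approximate human genome size.

```shell
#!/bin/sh
# Hypothetical helper: estimate the sequencing depth of a FASTQ file.
#   depth = (sum of read lengths) / genome size
# Assumes uncompressed FASTQ with 4-line records, where line 2 of
# each record is the sequence.
fastq_depth() {
  fastq=$1
  genome_size=$2   # genome size in bases, e.g. 3000000000 for human
  awk -v g="$genome_size" '
    NR % 4 == 2 { bases += length($0) }   # count sequence lines only
    END { printf "%.1f\n", bases / g }
  ' "$fastq"
}

# Example (hypothetical file names):
#   fastq_depth long_reads.fastq 3000000000
# For gzipped input, read from stdin via "-":
#   zcat short_reads.fastq.gz | fastq_depth - 3000000000
```

Running it on both the short-read and the long-read files gives the two coverage numbers asked about above.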

koujiaodahan commented on August 23, 2024

Sorry, I'm trying to assemble a human genome. The coverage of both the short reads and the long reads is 120x.

jelber2 commented on August 23, 2024

I would recommend you try either GraphAligner (https://github.com/maickrau/GraphAligner) or Ratatosk (https://github.com/DecodeGenetics/Ratatosk) to error-correct your Nanopore reads with your Illumina reads, then assemble with Flye (https://github.com/fenderglass/Flye) using the --nano-corr option. Ratatosk even has a faster, reference-based method for correcting the reads (I haven't used it, so I don't know the details). For Flye, I really don't think you need 120x Nanopore coverage, especially if you can correct the reads. See here for running Ratatosk or here for running GraphAligner.

Edit: I guess you could use 120x Nanopore reads for a human assembly (https://github.com/fenderglass/Flye/blob/3ee5b3390a5f88c36d0869d0382c75aba3b1f5cc/README.md#flye-benchmarks), although those data come from CHM13 (a homozygous cell line). Also note the 4000 CPU hours: divide 4000 by the number of available cores to get approximately how many wall-clock hours the assembly would take.
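The CPU-hour estimate above is simple arithmetic. A minimal sketch, assuming the job scales perfectly across cores (real runs usually don't, so read the result as a lower bound); `wall_hours` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: rough wall-clock hours from a CPU-hour figure.
#   wall hours ~= CPU hours / cores
# Assumes perfect parallel scaling, so the result is optimistic.
wall_hours() {
  cpu_hours=$1
  cores=$2
  awk -v c="$cpu_hours" -v n="$cores" 'BEGIN { printf "%.1f\n", c / n }'
}

wall_hours 4000 64   # prints 62.5 (about 2.6 days)
wall_hours 4000 8    # prints 500.0 (about 3 weeks)
# On Linux, the core count can come from nproc:
#   wall_hours 4000 "$(nproc)"
```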

koujiaodahan commented on August 23, 2024

Thanks, jelber2.
So HASLR is not advised? Why?

jelber2 commented on August 23, 2024

In my experience, HASLR generates very good assembly statistics (N50, etc.) from raw long reads and accurate short reads, but the error rate (indels and substitutions) of the final assembly is similar to that of the long reads, not the short reads. One can improve the error rate by correcting the long reads with the short reads and using the corrected long reads as input, but then the assembly statistics suffer. This is based on simulations, of course, which are sometimes useful but can never fully capture the intricacies of real data.

haghshenas commented on August 23, 2024

Hi @koujiaodahan, and thanks for trying HASLR.
I'm surprised that Minia is taking so long to finish. In my experience, on human short-read datasets with about 40x coverage, it takes about 5 hours. Are you sure that the Minia assembly was the step that took a long time?
If so, one solution could be subsampling the short reads to about 40-50x coverage. You can use the fastutils command that comes with HASLR for that purpose. Assuming you have a paired-end dataset, you can do the following:

fastutils interleave -q sr_1.fastq sr_2.fastq | fastutils subsample -q -g 3g -d 40 > sr_40x.fastq
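For reference, the arithmetic behind `-g 3g -d 40` when starting from 120x: depth is total bases divided by genome size, so reaching 40x means keeping about 40 x 3 Gbp = 120 Gbp, i.e. roughly one third of a 120x dataset. This sketch only does that arithmetic; it makes no claim about how `fastutils subsample` chooses reads internally, and `subsample_plan` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: numbers behind subsampling to a target depth.
#   bases to keep    = genome size * target depth
#   fraction to keep = target depth / current depth
subsample_plan() {
  genome_size=$1   # bases, e.g. 3000000000 (matches -g 3g)
  current=$2       # current depth, e.g. 120
  target=$3        # desired depth, e.g. 40 (matches -d 40)
  awk -v g="$genome_size" -v c="$current" -v t="$target" 'BEGIN {
    printf "bases to keep: %.0f\n", g * t
    printf "fraction of reads to keep: %.3f\n", t / c
  }'
}

subsample_plan 3000000000 120 40
# bases to keep: 120000000000
# fraction of reads to keep: 0.333
```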

With regards to the error rate of the final assembly that was raised by @jelber2, if you eventually want to perform polishing for your assembly, our results show that polished HASLR assemblies are as accurate as polished assemblies from other tools.

koujiaodahan commented on August 23, 2024

Yeah, I agree that the coverage is too high, so I downsampled, and then I got an error, which I reported in #20.
I'd also like to know: does your polishing method mean running wtdbg2.pl after running HASLR?
