mahulchak / quickmerge
A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
License: GNU General Public License v3.0
Hello,
I have an error when using quickmerge. The compilation looks like it worked.
~/progs/quickmerge/merger
(master)>make
make: `quickmerge' is up to date.
Upon trying to launch it:
~/progs/quickmerge/merger
(master)>./quickmerge
-bash: ./quickmerge: /software/lib64/ld-linux-x86-64.so.2: bad ELF interpreter: No such file or directory
Is quickmerge a 32-bit application? Is there a way to force compilation against the interpreter that is available to me? (It's a cluster, and I can't easily install libraries on it myself.)
Thanks!
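The 32-bit question can be checked without installing anything by reading the start of the binary. This is a minimal sketch, assuming a standard ELF file where the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit; the helper name `elf_class` is hypothetical and not part of quickmerge.

```python
def elf_class(header: bytes) -> str:
    """Classify a binary from the first bytes of its ELF header."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    # EI_CLASS is the fifth byte of the ELF identification block.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")


# Usage (path is whatever binary you built):
#   with open("./quickmerge", "rb") as handle:
#       print(elf_class(handle.read(5)))
```

This only reports the word size; it does not say which interpreter path was recorded in the binary.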
Hi,
When using the reference assembly to improve the hybrid assembly, e.g., when contig A in the reference assembly can improve contig B in the hybrid assembly, will quickmerge further check whether there is a genomic equivalent of contig A in the hybrid assembly? If not, quickmerge will introduce redundancy.
Best,
Danshu
% quickmerge -V
quickmerge 0.2
% echo $?
0
(to stdout. and clean exit code)
scaffold100000|size4398 C27495846419.0 4398 287
1003 1288 287 1 1 1 0-2400
scaffold100000|size4398 C27567557919.0 4398 322
1800 2026 322 95 2 2 0-1000
scaffold100000|size4398 scaffold13097019.4 4398 3412
2129 2283 167 10 4 4 0-112-1-250
scaffold100000|size4398 scaffold44576520.6 4398 5548
1458 1695 3065 2826 3 3 0-202-3203867 4222 1115 757 7 7 0-14124-71-2332-58-60
scaffold100001|size4398 C2721590907.0 4398 209
4144 4248 140 36 0 0 00
scaffold100001|size4398 scaffold133571412.7 4398 1391
3481 4011 809 1347 28 28 0-216-1-1-1-1-1-1-1166-80
scaffold100001|size4398 scaffold4287048.3 4398 1847
525 792 269 4 7 7 025-1054511-460
scaffold100001|size4398 scaffold86268516.5 4398 1028
1381 1726 281 627 8 8 0-24-1720
scaffold100002|size4398 C27479571713.0 4398 281
584 858 281 7 0 0 00
scaffold100003|size4398 C27185330824.0 4398 203
931 1133 1 203 1 1 00
##############################################333
This is the result in my merged.fasta. What does this mean?
I think there may be a problem here.
Hi,
When I try to run quickmerge using assembly A as reference (Illumina assembly ~746MB) and assembly B (nanopore assembly ~846MB) as query, it works fine.
However, when I switch so that assembly A is the query and assembly B is the reference, it throws an error:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 1047464) > this->size() (which is 0)
Aborted
I changed the file format (before running quickmerge) using the following command in both cases:
merge_wrapper.py nanopore_assembly.fasta illumina_assembly.fasta --no_nucmer --no_delta --clean_only
Could you please give me insight on this issue?
Regards,
Niraj
Hi,
I have an issue after running merge_wrapper.py
4: FINISHING DATA
0 quickmerge
1 -d
2 out.rq.delta
3 -q
4 hybrid_oneline.fa
5 -r
6 self_oneline.fa
7 -hco
8 5.0
9 -c
10 1.5
11 -l
12 0
13 -ml
14 5000
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
Hi,
I'm trying to run quickmerge with assemblies generated with dbg2olc and canu. I ran both the wrapper script and the individual steps as described in the readme. I don't receive any errors. However, the resulting merged.fasta has exactly the same content as hybrid_assembly.fasta. I can also see that, while summaryOut.txt and aln_summary.tsv seem normal, the anchor_summary.txt file is empty except for a header line. Please let me know if any additional info would be useful in determining why the merging doesn't seem to be working.
Hello ,
I am facing a problem related to 'std::out_of_range'. When I ran quickmerge the first time, it worked fine with command 1 below. But when I swap the reference and query (command 2), it throws a std::out_of_range error. I am using the latest build. Please let me know how I can solve the issue.
command 1
nucmer -l 100 -prefix out New_CspixV2gen.fa 0x_spix.fa
delta-filter -i 95 -r -q out.delta > out.rq.delta
quickmerge -d out.rq.delta -r New_CspixV2gen.fa -q 10x_spix.fa -hco 5.0 -c 1.5 -l 1000000 -ml 6000
command 2
nucmer -l 100 -prefix out 10x_spix.fa New_CspixV2gen.fa
delta-filter -i 95 -r -q out.delta > out.rq.delta
quickmerge -d out.rq.delta -r 10x_spix.fa -q New_CspixV2gen.fa -hco 5.0 -c 1.5 -l 1000000 -ml 6000
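When swapping reference and query by hand, it is easy for the nucmer step and the quickmerge step to disagree about which file is which. The sketch below builds the three command lines from a single reference/query pair, using only the flags shown in the commands above. The function name `build_pipeline` and its parameter names are hypothetical, and whether a mismatch is the cause of the error in this report is not established here.

```python
def build_pipeline(reference, query, prefix="out",
                   hco=5.0, c=1.5, seed_len=1000000, merge_len=6000):
    """Return the three command lines with reference and query in one fixed order."""
    delta = f"{prefix}.delta"
    filtered = f"{prefix}.rq.delta"
    return [
        # nucmer takes the reference first, then the query.
        ["nucmer", "-l", "100", "-prefix", prefix, reference, query],
        # delta-filter writes to stdout; redirect it to `filtered` when running.
        ["delta-filter", "-i", "95", "-r", "-q", delta],
        ["quickmerge", "-d", filtered, "-q", query, "-r", reference,
         "-hco", str(hco), "-c", str(c),
         "-l", str(seed_len), "-ml", str(merge_len)],
    ]


for command in build_pipeline("10x_spix.fa", "New_CspixV2gen.fa"):
    print(" ".join(command))
```

Swapping the two assemblies then means swapping the two arguments once, rather than editing each command separately.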
Would it be possible to merge the "uniq" contigs in the reference assembly into the query assembly? By "uniq" I mean those that have little overlap with any contig in the query assembly.
Hi @mahulchak ,
We used quickmerge to merge PacBio + Dovetail + Bionano (pb+dt+bn) scaffolds and an ONT assembly.
I found that an 83290869 bp scaffold was discarded in the quickmerge results, and this scaffold is NOT in the anchor summary output. I could not figure out the reason. I suspect a misassembly, but cannot find proof. It would be much appreciated if you could take a look at the information I attached and give me some suggestions.
Here I attached an Excel sheet of the NUCmer alignment (converted from .delta format to .paf using Li Heng's paftools.js for readability) of this scaffold to the quickmerge assembly.
For the file header
header | details |
---|---|
query_id | Sequence id in our pb + dt + bn scaffolds |
query_length | pb + dt + bn scaffold length |
query_start | start of alignment on pb + dt + bn scaffold |
query_end | end of alignment on pb + dt + bn scaffold |
relative_strandness | relative strandness |
quickmerge_id | Sequence id in quickmerge result |
quickmerge_length | quickmerge sequence length |
quickmerge_start | start of alignment on quickmerge sequence |
quickmerge_end | end of alignment on quickmerge sequence |
For quickmerge_id, there are two possibilities:
Thank you very much!
Hi,
you have a great software here. I just wanted to bring something up to perhaps make it even better.
I think it might be useful to name the files the program creates more uniformly, that is, to name them all using the prefix given on the command line. I notice that some files have a general name (e.g. self_oneline.fa or the merged.fasta output file), which causes them to be overwritten when I run several jobs at once (even though each uses a different prefix).
thx and keep up the good work.
Hi all,
I get an error when I run merge_wrapper.py. Can anyone suggest how to fix the following error?
I used the following script.
/home/tg484/quickmerge/merge_wrapper.py -p Dbia_merge_wrapper Dbia_min1000_Illumina.fasta Dbia_min1000_nanopore.fasta -hco 5.0 -c 1.5 -l 2791184 -ml 15000
Error: Multiple query file is only supported with the SAM output format
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, Dbia_merge_wrapper.delta
error no: 400
Traceback (most recent call last):
File "/home/tg484/quickmerge/merge_wrapper.py", line 176, in <module>
subprocess.call(mergercall)
File "/home/tg484/anaconda3/lib/python3.6/subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "/home/tg484/anaconda3/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/home/tg484/anaconda3/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'quickmerge': 'quickmerge'
Thank you,
Thiru
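The final FileNotFoundError in the traceback above says the `quickmerge` executable could not be found when the wrapper tried to launch it. A small pre-flight check along these lines can surface that before a long run. This is a sketch, not part of merge_wrapper.py; the helper name `missing_tools` and the default tool list are assumptions based on the commands used in this thread.

```python
import shutil


def missing_tools(tools=("nucmer", "delta-filter", "quickmerge"),
                  which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


not_found = missing_tools()
if not_found:
    print("Not on PATH:", ", ".join(not_found))
```

If a tool is reported missing, adding its directory to PATH before calling the wrapper is the usual remedy.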
quickmerge -d out.rq_1.delta -q /ufrc/pelzstelinski/sneupane/Analysis_of_Demultiplexed_10_wDi_DNA_HMW_Unpure_1st_Elution/wDi_canu_seqtk45_CCS_1000_corr0.015_500_100_GOOD/uni_corrected/assembly_edited.fasta -r /ufrc/pelzstelinski/sneupane/Analysis_of_Demultiplexed_10_wDi_DNA_HMW_Unpure_1st_Elution/wDi_canu_seqtk45_CCS_1000_corr0.015_500_100_GOOD/wDi.contigs.fasta_headeredited_spaceremoved.fa -hco 5.0 -c 1.5 -l 1250000 -ml 10000
0 quickmerge
1 -d
2 out.rq_1.delta
3 -q
4 /ufrc/pelzstelinski/sneupane/Analysis_of_Demultiplexed_10_wDi_DNA_HMW_Unpure_1st_Elution/wDi_canu_seqtk45_CCS_1000_corr0.015_500_100_GOOD/uni_corrected/assembly_edited.fasta
5 -r
6 /ufrc/pelzstelinski/sneupane/Analysis_of_Demultiplexed_10_wDi_DNA_HMW_Unpure_1st_Elution/wDi_canu_seqtk45_CCS_1000_corr0.015_500_100_GOOD/wDi.contigs.fasta_headeredited_spaceremoved.fa
7 -hco
8 5.0
9 -c
10 1.5
11 -l
12 1250000
13 -ml
14 10000
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
Aborted
Hi,
Code improvements to fix errors with merge_wrapper.py:
#if args.length_minimum:
mergercall.append('-ml')
mergercall.append(str(length_minimum))
#if args.prefix:
mergercall.append('-p')
mergercall.append(str(prefix))
nico ;)
Hi,
I'm getting a segmentation fault at the quickmerge stage after using the merge_wrapper. Nucmer and everything before it worked fine and created normal outputs, including the oneline.fa files.
-rw-r--r-- 1 guerrer QGGP 2456765 Jul 22 13:23 anchor_summary_out.txt
-rw-r--r-- 1 guerrer QGGP 1049205219 Jul 22 13:22 hybrid_oneline.fa
-rw-r--r-- 1 guerrer QGGP 0 Jul 22 13:23 merged_out.fasta
-rw-r--r-- 1 guerrer QGGP 91857760 Jul 22 13:22 out.delta
-rw-r--r-- 1 guerrer QGGP 9563554 Jul 22 13:22 out.rq.delta
-rw-r--r-- 1 guerrer QGGP 3982704 Jul 22 13:23 param_summary_out.txt
-rw-r--r-- 1 guerrer QGGP 1343593069 Jul 22 13:22 self_oneline.fa
The quickmerge error must have to do with my input self_oneline.fa, since it's the only new thing.
Now, instead of rerunning the whole wrapper, I'm just running this command:
quickmerge -d out.rq.delta -q hybrid_oneline.fa -r self_oneline.fa -hco 5.0 -c 1.5 -l 0 -ml 5000 -p out
Results:
File | Size | First line (fasta header) | Result |
---|---|---|---|
hybrid_oneline.fa | 1049205219 | >1 | |
self_oneline.fa | 845417800 | >tig00035160 | Success! |
hybrid_oneline.fa | 1049205219 | >1 | |
self_oneline.fa | 1343593069 | >tig00000003_pilon | Segmentation fault (core dumped) |
The only difference is the size and origin of the self file. The successful one is a canu assembly after using purge haplotigs to eliminate haplotigs (alternative diploid contigs ). The unsuccessful one is a canu assembly but without using purge haplotigs (thus it is heterozygous/diploid).
Is my problem derived from the input sizes? Or maybe from heterozygosity?
Hi,
We have 50x Illumina paired-end and 7x ONT reads. Is QuickMerge able to produce a hybrid assembly out of these data?
Thank you in advance.
Michal
Hi,
some things I noticed while trying quickmerge:
make_merger.sh has the wrong compilation instructions:
it should be "g++ -Wall -o quickmerge quickmerge.cpp qmergelib.cpp -I." instead of "g++ -Wall work_in_prog_temp.cpp exp_testlib.cpp -o merger".
MUMmer compilation might fail because the folder aux_bin isn't created.
Running the quickmerge wrapper just prints all the scaffolds and contigs to stdout. The headers are printed twice, then the sequence itself.
Chris
Hi Mahul,
Thanks for writing and maintaining a terrific program. I've used your software successfully on a previous genome by merging a pair of 10x and Nanopore assemblies. I thought I'd give it a shot on a different genome (from a different organism), but this time the output of quickmerge is an empty fasta file. It appears that the program has run without any errors, so I wasn't sure where to start troubleshooting (or how to interpret an empty fasta file).
I ran the following code using the wrapper:
QMERGEPY=/path/to/merge_wrapper.py
$QMERGEPY -pre mrgd nanopore.fasta hic.scaffolds.fasta
The following files were present in the directory where quickmerge was executed (after the job had completed):
-rw-r--r-- 1 dro49 cluster 705K Dec 16 22:51 anchor_summary_mrgd.txt
-rw-r--r-- 1 dro49 cluster 2.8M Dec 16 22:50 param_summary_mrgd.txt
-rw-r--r-- 1 dro49 cluster 15M Dec 16 22:50 aln_summary_mrgd.tsv
-rw-r--r-- 1 dro49 cluster 0 Dec 16 22:50 merged_mrgd.fasta
-rw-r--r-- 1 dro49 cluster 60M Dec 16 22:50 hic.rq.delta
-rw-r--r-- 1 dro49 cluster 1.1K Dec 16 22:50 qmerge.log
-rw-r--r-- 1 dro49 cluster 159M Dec 16 22:49 hic.delta
-rw-r--r-- 1 dro49 cluster 1.9G Dec 16 18:10 self_oneline.fa
-rw-r--r-- 1 dro49 cluster 1.9G Dec 16 18:08 nanopore.fasta
-rw-r--r-- 1 dro49 cluster 1.9G Dec 16 17:44 hic.scaffolds.fasta
Happy to share any further info that is useful for troubleshooting. Appreciate your help and insights,
Devon
Hello! Mahul
I am using the merge_wrapper.py approach.
I noticed that a second round of quickmerge, run on the output of a first quickmerge run, would not complete.
It stops after generating out.delta, without generating the final merged.fasta.
I found that after re-wrapping the fasta from one-line sequences to the standard 80 characters per line, the script completes.
May I ask if I might have missed anything?
Or would it be possible to write merged.fasta with 80 characters per line, or to let quickmerge accept one-line sequences?
Many thanks!
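The re-wrapping workaround described above can be done with a few lines of Python. This is a minimal sketch that holds each sequence in memory; the function name `wrap_fasta` is hypothetical and it is not part of quickmerge.

```python
def wrap_fasta(lines, width=80):
    """Yield FASTA lines with every sequence re-wrapped to `width` characters."""
    sequence = []

    def flush():
        # Join the pieces collected so far and cut them into fixed-width lines.
        joined = "".join(sequence)
        sequence.clear()
        for start in range(0, len(joined), width):
            yield joined[start:start + width]

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield from flush()
            yield line
        elif line:
            sequence.append(line)
    yield from flush()


# Usage sketch:
#   with open("merged.fasta") as src, open("merged_wrapped.fasta", "w") as dst:
#       for out_line in wrap_fasta(src):
#           dst.write(out_line + "\n")
```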
Hi,
I am dealing with a large genome, so running quickmerge took a very long time. Is there a way to run the quickmerge wrapper with multiple threads to accelerate the process? I checked the wrapper and nucmer help manuals, and there seems to be no option for multi-threading.
Shu
Hi,
Could you go into more detail on option "-c"?
-hco: controls the overlap cutoff used in selection of anchor contigs. Default is 5.0.
-c: controls the overlap cutoff for contigs used for extension of the anchor contig. Default is 1.5.
According to the thesis,
-hco=overlapping region/non-overlapping region
What is the difference between "-c" and "-hco"?
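To make the quoted formula concrete, here is the arithmetic for one made-up pair of lengths, checked against the two documented defaults. This only illustrates the ratio as quoted; how quickmerge measures the two regions internally is not shown in this thread, the function names are hypothetical, and the use of `>=` for the comparison is an assumption.

```python
def overlap_ratio(overlap_len, non_overlap_len):
    """Ratio from the quoted formula: overlapping region / non-overlapping region."""
    if non_overlap_len == 0:
        return float("inf")
    return overlap_len / non_overlap_len


def passes(ratio, cutoff):
    # Assumption: a contig qualifies when its ratio is at least the cutoff.
    return ratio >= cutoff


ratio = overlap_ratio(30000, 10000)  # made-up lengths, for illustration only
print(ratio, passes(ratio, 5.0), passes(ratio, 1.5))
```

With these numbers the ratio is 3.0, which would clear the default -c value of 1.5 but not the default -hco value of 5.0.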
Hi there,
I tried to run quickmerge to combine hybrid assembly (using spades) with pacbio only assembly (using canu) and got the following error message: libc++abi.dylib: terminating with uncaught exception of type std::out_of_range: basic_string
The first two steps look fine, without any error messages. The error was produced at the last step, when running quickmerge. Do you have any idea how I can fix this?
I also tried with the python wrapper script and got a different error message.
I attached the full log file and the commands used if you want to have a look.
merge_wrapper_errors.txt
quickmerge_errors.txt
Thanks
Tuan
Is there any parameter to speed up quickmerge progress, for example multiple threads?
Thanks
Hi,
I am trying to run quickmerge with two wgs-assemblies of a 1.4 Gb genome but run into a segmentation fault in the merge step. N50 of the assemblies is about 200 Kb.
my commands:
nucmer -l 100 -prefix out pex_1.ctg.fasta pex_2.ctg.fasta
delta-filter -i 95 -r -q out.delta > out.rq.delta
quickmerge -d out.rq.delta -q pex_2.ctg.fasta -r pex_1.ctg.fasta -hco 5 -c 1.5 -l 200000
The stderr output is unfortunately not very informative:
/opt/sge/default/spool/binfservas12/job_scripts/116904: line 14: 99325 Segmentation fault /home/mmoser/quickmerge/merger/quickmerge -d out.rq.delta -q pex_2.ctg.fasta -r pex_1.ctg.fasta -hco 5 -c 1.5 -l 200000
Stdout contains a list of 37 contigs present in anchor_summary.txt (which has anchors for 1172 contigs).
I hope to get a clue on how to resolve this problem. I ran everything on an SGE cluster. RAM usage didn't seem to be too high when the job terminated.
Thank you,
Michel
Hi,
This is more of an enhancement/feature request than an issue!
Is there any reason that you are shipping mummer version 3 rather than the 4th one?
Also, given that I work with genomes > 2Gb normally, I have this query. Have you tried minimap2-> sam -> delta and then using those for quickmerge?
[root@localhost Quick1]# sh make_merger.sh
g++ -O3 -Wall -o quickmerge quickmerge.cpp qmergelib.cpp -I.
quickmerge.cpp:8:19: fatal error: iostream: No such file or directory
#include <iostream>
^
compilation terminated.
qmergelib.cpp:1:19: fatal error: iostream: No such file or directory
#include <iostream>
^
compilation terminated.
make: *** [quickmerge] Error 1
mkdir: cannot create directory ‘aux_bin’: File exists
check complete
cd /home/tools/Quick1/MUMmer3.23/src/kurtz; make mummer
make[1]: Entering directory `/home/tools/Quick1/MUMmer3.23/src/kurtz'
cd libbasedir; make libbase.a
make[2]: Entering directory `/home/tools/Quick1/MUMmer3.23/src/kurtz/libbasedir'
/usr/local/bin/gcc -O3 -c -o cleanMUMcand.o cleanMUMcand.c
In file included from types.h:13:0,
from cleanMUMcand.c:11:
/usr/include/sys/types.h:146:20: fatal error: stddef.h: No such file or directory
#include <stddef.h>
^
compilation terminated.
make[2]: *** [cleanMUMcand.o] Error 1
make[2]: Leaving directory `/home/tools/Quick1/MUMmer3.23/src/kurtz/libbasedir'
make[1]: *** [mummer] Error 2
make[1]: Leaving directory `/home/tools/Quick1/MUMmer3.23/src/kurtz'
make: *** [kurtz] Error 2
[root@localhost Quick1]# ls
quickmerge-0.3.tar.gz
[root@localhost Quick1]# tar -xjvf quickmerge-0.3.tar.gz
bzip2: (stdin) is not a bzip2 file.
tar: Child returned status 2
tar: Error is not recoverable: exiting now
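The bzip2 message above is what tar prints when the `j` flag is used on an archive that is not bzip2-compressed; a `.tar.gz` name suggests gzip, for which the flag is `z`. When the extension cannot be trusted, the leading bytes of the file settle it. This is a small sketch using the well-known gzip and bzip2 magic numbers; the helper name `tar_flag_for` is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def tar_flag_for(first_bytes: bytes) -> str:
    """Suggest the tar decompression flag from a file's leading bytes."""
    if first_bytes.startswith(GZIP_MAGIC):
        return "z"  # gzip, i.e. tar -xzvf
    if first_bytes.startswith(BZIP2_MAGIC):
        return "j"  # bzip2, i.e. tar -xjvf
    return ""       # no recognised compression


# Usage sketch:
#   with open("quickmerge-0.3.tar.gz", "rb") as handle:
#       print(tar_flag_for(handle.read(3)))
```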
Hi Mahul,
I saw your talk at the PacBio UGM at Stanford, read your paper, and wanted to give quickmerge a try. However, I cannot seem to get the program to work and I was hoping you could help me. Probably unrelated: when I compiled quickmerge I got the following warnings:
$ make
g++ -O3 -Wall -o quickmerge quickmerge.cpp qmergelib.cpp -I.
qmergelib.cpp: In function 'void nOvlStoreCalculator(asmMerge&)':
qmergelib.cpp:367:44: warning: 'noRovl' may be used uninitialized in this function [-Wuninitialized]
qmergelib.cpp:367:44: warning: 'noLovl' may be used uninitialized in this function [-Wuninitialized]
qmergelib.cpp: In function 'void discAnchor(std::string&, asmMerge&, std::string&, double)':
qmergelib.cpp:1761:3: warning: 'cutoff' may be used uninitialized in this function [-Wuninitialized]
Since these are not critical errors, I tried to run quickmerge anyway on a pre-generated nucmer delta file and its associated fasta files. quickmerge creates the expected output files with only headers, emits no error messages, and exits with code 0. My command line was:
$ ~/tools/bin/quickmerge/quickmerge -l 1200000 -hco 5 -c 1.5 -ml 50 -d in.delta -r r.fasta -q q.fasta
My de novo contigs were generated by Canu and my hybrid assembly by DBG2OLC (with some custom contig breaking along the way). I made sure to remove any whitespace in the fasta headers and reformat the fasta sequences to occupy a single line (no line wrapping). Running merge_wrapper.py --clean_only creates no output (the code block following the conditional statement on line 112 is not evaluated).
A snippet of my delta file:
$ grep -A 1 '^>' in.delta | head
>tig00002811 Backbone_1000_1_621250 588440 621250
271694 369472 10 97855 296 296 0
--
>tig00007630 Backbone_1000_1_621250 278801 621250
1 20162 316551 336715 80 80 0
--
>tig00008053 Backbone_1000_1_621250 240074 621250
6546 7428 609882 610823 126 126 0
--
>tig00001422 Backbone_1000_1344501_3092750 1131566 1748250
A snippet of my query file:
$ head -2 q.fasta | cut -c1-30
>Backbone_1_1_343250
TCTTTTAAACAAAGTGGAGAACAAAAACTA...
A snippet of my ref file:
$ head -2 r.fasta | cut -c1-30
>tig00000005
ATCATCATGGAAGTTCAGCTAGAGGAGTTA...
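One check that fits the snippets above is whether every sequence ID named in the delta file also exists in the matching fasta file. The sketch below assumes the usual delta layout, where each alignment header line starts with `>` and lists the reference ID, then the query ID, then the two lengths, as in the excerpt shown. The function names are hypothetical, and a clean result does not rule out other causes.

```python
def delta_ids(delta_lines):
    """Collect (reference IDs, query IDs) from the '>' header lines of a delta file."""
    refs, queries = set(), set()
    for line in delta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) >= 2:
                refs.add(fields[0])
                queries.add(fields[1])
    return refs, queries


def fasta_ids(fasta_lines):
    """Collect sequence IDs (first word after '>') from a fasta file."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


# Usage sketch:
#   refs, queries = delta_ids(open("in.delta"))
#   print(sorted(refs - fasta_ids(open("r.fasta"))))     # reference IDs missing from r.fasta
#   print(sorted(queries - fasta_ids(open("q.fasta"))))  # query IDs missing from q.fasta
```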
Hi,
I am running quickmerge on a eukaryotic genome, which is around 1Gb. I got two assemblies, from Canu (960Mb) and Miniasm (870Mb).
Using the default parameters of quickmerge, the merged assembly is ~1Gb and N50 is 3Mb, and I am quite satisfied with the metrics. However, as I would like to study the chromosome structure of this species, it is really important to have the orientation of the merged contigs as accurate as possible.
Thus, could you provide any suggestions for nucmer parameters that exclude mismerges for such a big genome? I am re-running quickmerge with nucmer --maxmatch -c 500 -l 100; however, it is still difficult to determine how stringent it should be.
Thanks,
Which is first on the command line?
This may sound trivial, but to someone like me who is trying to figure out which of my assemblies I should give first on the command line and which second, it isn't helping that every time the assemblies are discussed they have different names.
The NAR manuscript uses both donor/acceptor (which to me seems the most informative) and reference/query. It seems to mostly use reference/query in discussion but figure 4 only uses donor/acceptor.
In description of how to run the wrapper, the main readme here calls them hybrid and self. I have yet to figure out which of those is the donor (a.k.a reference) and which is acceptor (a.k.a. query).
The manuscript also says it merges a hybrid and a pacbio assembly. If the paper tells me which was the donor, it has eluded me.
The quickmerge wiki gives some advice for deciding which of my assemblies should be the query, and which reference. But, it fails to tell me which order these should appear on the command line.
Is there anywhere with a full run command including all the parameters that need to be set? Also, what is the nature and format of the output?
The -h information is pretty limited. I can see that I need to set -l seed_length_cutoff -ml merging_length_cutoff but don't really know how these are used, so I can't make an educated guess as to what to set.
Hi Mahul,
Just a note -- I had to change this in merge_wrapper.py:
to
for it to work.
It might be good to change this in your code as the latter will figure out the correct location of python on anyone's system.
best,
John
Hi Mahul,
Based on BUSCO analysis, quickmerge is introducing duplications into my genome sequence. The original versions of my assemblies have 17 and 27 duplicated BUSCOs but, after quickmerge, there are 78 or 398 duplicated BUSCOs, depending on merge order. I tried the two-stage approach but the number of duplicated BUSCOs just keeps getting higher with each iteration. Any idea why this is happening or how I can remedy the situation? I have been running merge_wrapper.py with -hco 7 -c 2 -lm 5000 -l 5000000. My genome is from a bee and is ~400Mb (though the assemblies are closer to 300Mb) and known to be somewhat repetitive. My original assemblies are both pretty complete and are only missing ~2% of 4,415 BUSCOs.
Any help would be much appreciated.
Thanks,
Ben
Hi,
Is it possible to include a description of all the output files to expect from running merge_wrapper.py ?
Or did I overlook that?
best,
John
Hello,
I'm trying to merge two assemblies of the same individual using different approaches: one refers to a previously generated illumina draft assembly based on very high coverage available, the other is a canu assembly produced from self corrected reads and polished with quiver and pilon. My pacbio coverage is modest, as after error correction I was just able to use 45% of the data (~30X coverage).
I believe my best draft is the Illumina one because I think it captures a broader portion of the genome. Although a little more fragmented than the canu assembly (~23k scaffolds vs ~18k contigs), the N50 of the Illumina one is much higher (~450k vs ~91kb). Therefore, following your recommendations and my own judgment, I understand that using the Illumina draft as the query (in the quickmerge wrapper, the hybrid assembly positional argument) should be the best solution (the pacbio assembly will help close regions that the short-read assembly didn't capture), although I also tried the other approach (pacbio self assembly as query).
The quast output displays an improvement in both cases (file attached), with best metrics achieved when using illumina draft as query (best N50, less scaffolds, best genome size).
As I understand it, quickmerge mostly outputs sequences from the query genome that were joined by the reference genome, as well as the query sequences that remained unaligned. The reference sequences are not included in the output, and if I want them, I should follow the recommendations in issue #11. However, I observe fasta headers coming from both assemblies in the output. Furthermore, checking their lengths in the merged file and in the original assemblies, I see that the Illumina scaffolds (which I think served as the query) have exactly the same length as in the original draft, and the pacbio-based contigs (the reference) are either longer than or the same length as before.
My questions are:
a) Given the following command, which sequences will serve as queries?
merge_wrapper.py -pre draftAsQuery -l 1000 illumina.fasta pacbio.fasta
In the alignment summary file I see in the 1st columns (REF), sequences coming from pacbio assembly as I expected, but in the merged fasta I see sequences from both, particularly contig extensions in the reference sequence.
b) Is it OK for quickmerge to provide scaffolds (with Ns) instead of contigs?
c) Could you comment on the applicability of the tool, given my case of having a full Illumina assembly?
Thanks in advance,
Pedro Barbosa
quickmergeResults.txt
Hi Mahul,
I am trying to use quickmerge but am receiving the following error:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 18446744073709536653) > this->size() (which is 342465)
Looking at some of the other issues, I've seen this error come up a few other times. However, it looked like the culprit was fasta files with whitespace in the header names, or sequences not on one line. I do not believe this to be the issue in this case, as I first started with the merge_wrapper.py script. I ran the command as follows:
merge_wrapper.py ../scaff10x_rounds2/renamed.sspace_scaff10x.2.fasta ../canu_assembly/asm/AM.contigs.fasta
I can see that it correctly creates the files hybrid_oneline.fa and self_oneline.fa in my current working directory. If I look at the first few headers in each file:
cat hybrid_oneline.fa|grep ">"|head -n 5
>1
>2
>3
>4
>5
cat self_oneline.fa|grep ">"|head -n 5
>tig00000004_len=34946_reads=29_covStat=35.85_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000005_len=26830_reads=11_covStat=22.77_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000007_len=146883_reads=146_covStat=247.16_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000009_len=142320_reads=139_covStat=238.60_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000013_len=39096_reads=25_covStat=60.84_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
Everything looks correct. I have also tried cutting a lot of the unnecessary text in the headers for self_oneline.fa, leaving the headers as ">tigXXXX" in a file called renamed_self.fa. If I try running the quickmerge command, following the order of arguments as on the wiki
quickmerge -d out.rq.delta -q hybrid_oneline.fa -r renamed_self.fa -hco 5 -c 1.5 -l 200000 -ml 5000
I still get the same error. What else, if anything, besides wrongly formatted fasta files could be throwing this error? Thanks for any information/insight and I hope to get this working!
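A note on the number in the error message: a `__pos` value this close to 2^64 is what a small negative number looks like when it is stored in an unsigned 64-bit position. Reinterpreting it as signed shows the likely original offset. This is only arithmetic on the reported value, under the assumption of a 64-bit `size_t`; it does not identify where in quickmerge the negative value arises. The helper name `as_signed_64` is hypothetical.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


print(as_signed_64(18446744073709536653))  # the __pos value reported above
```

For the value above this gives -14963, which suggests a start coordinate that went below zero before being passed to `substr`.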
Hello:
I had run quickmerge by the following commands:
nucmer -l 100 -prefix out contig.fasta pacbio_assemble.fasta
delta-filter -r -q -l 10000 out.delta > outrq.delta
quickmerge -d outrq.delta -q contig.fasta -r pacbio_assemble.fasta -hco 5 -c 1 -l n -ml m -p prefix
But in the third step I got an error, and nothing was produced. Like this:
n -ml m -p prefix
0 quickmerge
1 -d
2 yylrq.delta
3 -q
4 /mnt/data/liyunxia/4-project/mito/YYL_Mtctgs-zhu2-uniq.fasta
5 -r
6 /mnt/data/liyunxia/4-project/mito/Mashmap/fmap/YYL.pass.fa@fmlrcor@mashmap2-idseq
7 -hco
8 5
9 -c
10 1
11 -l
12 n
13 -ml
14 m
15 -p
16 prefix
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoi
Aborted (core dumped)
I checked the core file with gdb -c core.317488, and I got this:
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='debug' install /usr/lib/debug/.build-id/9a/5d301e005e22924e5bf88f3b95641c2490a441
Core was generated by `quickmerge -d out.delta -q /pwd/contig.fasta'.
Program terminated with signal 6, Aborted.
#0 0x00007fe7c0ef2207 in ?? ()
(gdb) where
#0 0x00007fe7c0ef2207 in ?? ()
#1 0x00007fe7c0ef38f8 in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()
So, what is the problem with this run?
Hope for your reply.
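In the argument dump above, `-l` and `-ml` were passed the literal letters `n` and `m`, and the abort comes from `stoi`, which fails on non-numeric text. A pre-check like the following can catch placeholders that were never replaced with numbers. It is a sketch: the helper name is hypothetical, and treating both `-l` and `-ml` as integer options is an assumption drawn from the numeric values used elsewhere in this thread. Python's `int()` is also stricter than `stoi`, so it would flag a value such as `5.0` as well.

```python
def non_integer_options(argv, int_flags=("-l", "-ml")):
    """Return (flag, value) pairs where an integer option got a non-integer value."""
    bad = []
    for flag, value in zip(argv, argv[1:]):
        if flag in int_flags:
            try:
                int(value)
            except ValueError:
                bad.append((flag, value))
    return bad


argv = ["quickmerge", "-d", "outrq.delta", "-q", "contig.fasta",
        "-r", "pacbio_assemble.fasta", "-hco", "5", "-c", "1",
        "-l", "n", "-ml", "m", "-p", "prefix"]
print(non_integer_options(argv))
```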
Greetings of the season.
I would like to suggest a development of conda version of this amazing tool.
Thank you.
Yedomon
Because the range of the int type is -2147483648 to 2147483647, quickmerge will throw an error like "std::out_of_range" when the merged genome size is over 2147483647 bp (2.15Gb).
From BerryGenomics zhk.
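The limit described above is easy to reproduce with plain arithmetic. The sketch below shows what a value just past the 32-bit signed maximum turns into under two's-complement wraparound; it illustrates the range only and does not point at a specific variable in quickmerge. The function names are hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(n: int) -> bool:
    """True if n can be stored in a signed 32-bit integer."""
    return INT32_MIN <= n <= INT32_MAX


def wrapped_int32(n: int) -> int:
    """Value n would take after two's-complement wraparound to 32 bits."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN


print(fits_int32(2147483647), fits_int32(2147483648))
print(wrapped_int32(2147483648))
```

A coordinate one past the maximum wraps to -2147483648, which would then be an invalid position for any string operation.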
Hello,
I had a problem when running quickmerge. The error message is shown below:
"
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 3152) > this->size() (which is 0)
run.sh: line 5: 26095 Aborted quickmerge -d out.rq.delta -q secondary_contigs.fasta -r pb.only.fasta -hco 5.0 -c 1.5 -l n
"
I know that you already have an explanation for this issue: probably a misformatted fasta header line containing whitespace.
However, I checked my two fasta files and there is no whitespace in the header lines. Thus I guess a misformatted fasta file is not the cause in my case.
Here is what my files look like:
less pb.only.fasta
000000F
CACCTCGTCGGGGAAGGAGATAGCTTCCTCACGCCAT
less hy.asm.fasta
scf7180000332062
GAGGAGACACCGTGCTACTAGGTGGTTGTGCCACCGGAGCAGCCACACCCTTTAACAGGT
Looking forward to your suggestion!
Thanks a lot!
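The two formatting problems mentioned in these reports, whitespace in header lines and sequences split over several lines, can be checked mechanically rather than by eye. The following is a sketch of such a check; the function name and the wording of the messages are hypothetical, and passing it does not guarantee that quickmerge will accept the file.

```python
def fasta_problems(lines):
    """Return (line number, message) pairs for common fasta formatting issues."""
    problems = []
    header_seen = False
    sequence_lines = 0
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(">"):
            header_seen = True
            sequence_lines = 0
            # Any space, tab or stray carriage return in the header is flagged.
            if any(ch.isspace() for ch in line):
                problems.append((number, "whitespace in header"))
        elif line:
            sequence_lines += 1
            if not header_seen:
                problems.append((number, "sequence before first header"))
            elif sequence_lines == 2:
                problems.append((number, "sequence spans multiple lines"))
    return problems


# Usage sketch:
#   with open("pb.only.fasta") as handle:
#       for number, message in fasta_problems(handle):
#           print(number, message)
```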
Dear author;
While using quickmerge, I ran into an error that has confused me for a while. Here is the error:
4: FINISHING DATA
Traceback (most recent call last):
File "/public1/home/testuser/genome_Aessmblysoftware/quickmerge-master/merge_wrapper.py", line 174, in <module>
subprocess.call(mergercall)
File "/public1/home/testuser/miniconda2/lib/python2.7/subprocess.py", line 168, in call
return Popen(*popenargs, **kwargs).wait()
File "/public1/home/testuser/miniconda2/lib/python2.7/subprocess.py", line 390, in __init__
errread, errwrite)
File "/public1/home/testuser/miniconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
My command is: merge_wrapper.py p_ctg.fa SUNSET.contigs.fasta >merge.fasta
Can someone help me to solve it?
Thanks
Alex
Hi,
When running quickmerge on a large genome I get the following error:
...
ctg7180000048769 ctg7180000048768 1 Backbone_3247 -1 ctg7180000048769 1
ctg7180000049599 Backbone_10703 1 ctg7180000049599 -1
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 18446744073709551359) > this->size() (which is 118601)
Aborted
The number of sequences output by the quickmerge run is smaller than the number of lines in the anchor_summary.txt file, so the error must occur while merging. Also, no fasta file is created.
When I cut the .rq.delta file down to half its size, the error disappears and I get normal output (of course missing some valuable merged contigs).
I am using commit 3f950d8.
Shall I send you my files?
Thank you,
michel
I am trying to merge two fasta files and I keep getting this error!
Hello!
On some of my genome fastas I get this error
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 764951) > this->size() (which is 755524)
-bash: line 166: 10789 Aborted (core dumped)
My sequences are renamed with numbers like >1, >2 ... >n in both genome fastas. I checked for special characters in sequences by grepping:
' ' - space: none in the file
'\t' - tabs: none in the file
'[^ACTGN]' - anything non nucleotide: i only get the headers.
So my fastas contain '>', ints, A, C, T, N and $.
Considering quickmerge works perfectly with my other datasets, I am pretty sure the issue is my fasta files. I am also careful to use the same genome as query in both nucmer and quickmerge.
Would you have any suggestion for other possible errors in my fasta files or other reasons why this error would appear?
Thanks,
Alex
edit: I am trying with the wrapper. Saw similar issue below.
After struggling with other people's paths showing up in the MUMmer files (/Home/mmoser/... etc.), I realized I needed to do a 'make clean' in the MUMmer directory. Should that be added to the make_merger.sh script?
Hello,
Thanks for the great tool.
I was curious -- since you recommend finisherSC, I was wondering if you have done any evaluations on the results? Do you have a feel for whether it introduces mis-assemblies and at what rate? etc.
I have read the finisherSC paper -- but I am curious about independent feedback to see if I should include it as part of a de novo assembly I am doing...
Best,
John
Hello,
I am trying to create an extremely contiguous (chromosome-size) genome for a Drosophila, using the pipeline proposed in your 2016 paper. It has done very well so far: I have 3 chromosomes almost fully resolved, but the 4th one is in 3 fragments. I am wondering if there would be a way to stitch together those 3 fragments, for instance by rerunning quickmerge on the final assembly (or just the 3 fragments extracted from that assembly) and one of the two assemblies (likely the PB-only assembly; I guess those gaps are present in the hybrid assembly anyway), but with less stringent parameters. Do you have experience and/or advice for such a procedure? Would that decrease the quality of the assembly in parts other than the ends of the fragments that need to be stitched together? How should I choose the parameters in this context (or should I just play around and compare the output)?
Alternatively, I was thinking about using the raw reads again on those specific fragment ends and trying to extend them through a consensus-calling method. Would you have some tools/programs in mind that would do that?
Thank you very much!
Best regards,
Coline
Hi,
While trying to decide in which order I should merge assemblies, I realized there is something happening with quickmerge that I don't understand.
As you recommended on the wiki, I used the "best" assembly as the query ("best" was based just on N50 and BUSCO content). After merging, the number of missing BUSCO genes goes up.
merge_wrapper.py query.fasta ref.fasta -l 90000 -lm 5000
query:
C:94.5%[S:93.1%,D:1.4%],F:3.3%,M:2.2%,n:1658
ref:
C:92.6%[S:91.9%,D:0.7%],F:2.2%,M:5.2%,n:1658
quick_merged:
C:94.0%[S:90.4%,D:3.6%],F:1.4%,M:4.6%,n:1658
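To put the three summaries side by side as gene counts rather than percentages, something like the following sketch could be used. The names `BUSCO_PATTERN` and `parse_busco` are hypothetical, the regular expression only targets the short summary format shown above, and because the percentages are rounded to one decimal the counts are approximate.

```python
import re

# Matches the one-line BUSCO short summary, e.g.
# C:94.5%[S:93.1%,D:1.4%],F:3.3%,M:2.2%,n:1658
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Convert a BUSCO summary line into approximate gene counts per category."""
    match = BUSCO_PATTERN.fullmatch(summary.strip())
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    total = int(match["n"])
    counts = {key: round(float(match[key]) / 100 * total) for key in "CSDFM"}
    counts["n"] = total
    return counts


query = parse_busco("C:94.5%[S:93.1%,D:1.4%],F:3.3%,M:2.2%,n:1658")
merged = parse_busco("C:94.0%[S:90.4%,D:3.6%],F:1.4%,M:4.6%,n:1658")

# Change in each category after merging (positive means more genes in that class)
for key in "CSDFM":
    print(key, merged[key] - query[key])
```

On these numbers the missing class grows from roughly 36 to roughly 76 genes, while the duplicated class also grows, so both loss and duplication are happening in the merged output.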
I looked in the full output table from busco to get a handle on what is happening. There are indeed genes that are complete in the query assembly but are absent in the merged assembly. Basically all transitions between complete, duplicated, fragmented and missing are occurring during the merging process.
From your publication and your explanation in issue #22, it seems that it should be impossible to lose genes from the query if the reference sequences are only used in gaps to stitch together query contigs, and unaligned query contigs make it to the merged.fasta. Am I missing something?
Thanks,
Earl
How do I fix the following error?
python merge_wrapper.py Hdata/extended_10K.fa Hdata/final.genome.scf.fasta
Error: Multiple query file is only supported with the SAM output format
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, out.delta
error no: 400
Traceback (most recent call last):
File "merge_wrapper.py", line 174, in <module>
subprocess.call(mergercall)
File "/home/urbe/anaconda3/lib/python3.6/subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "/home/urbe/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/home/urbe/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'quickmerge': 'quickmerge'
After fixing the quickmerge executable, the following error occurs:
python merge_wrapper.py Hdata/extended_10K.fa Hdata/final.genome.scf.fasta
Error: Multiple query file is only supported with the SAM output format
Usage: nucmer [options] ref:path qry:path+
Use --help for more information
ERROR: Could not parse delta file, out.delta
error no: 400
I ran this command: /psd/biosoft/quickmerge/merge_wrapper.py -pre test_canu -hco 5.0 -c 1.5 -l 310000 -lm 7000 DBG2OLC.fasta canu.sub.fasta
and I got an error about the maximum reference length:
###############################################
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
/psd/biosoft/MUMmer3.23/mummer: suffix tree construction failed: textlen=1107738543 larger than maximal textlen=536870908
ERROR: mummer and/or mgaps returned non-zero
ERROR: Could not parse delta file, test_canu.delta
error no: 400
Traceback (most recent call last):
File "/psd/biosoft/quickmerge/merge_wrapper.py", line 174, in <module>
subprocess.call(mergercall)
File "/usr/local/Python-2.7.9/lib/python2.7/subprocess.py", line 522, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/local/Python-2.7.9/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/local/Python-2.7.9/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
###############################################
How can I solve this error? Should I increase the maximum reference length? If so, how do I do that?
Thank you for your attention.
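Before launching a run, the total size of a FASTA file can be compared with the limit printed in the log above. This is a rough sketch: the function names are hypothetical, the constant is simply the `maximal textlen` value copied from the error message, and the base count is only a lower bound on what mummer calls `textlen`, which may include extra separator characters.

```python
# "maximal textlen" value copied from the mummer error message above
MUMMER_MAX_TEXTLEN = 536870908


def total_bases(lines):
    """Count sequence characters in an iterable of FASTA lines, skipping headers."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def fits_limit(lines, limit=MUMMER_MAX_TEXTLEN):
    """Return (total_bases, True if the total does not exceed the limit)."""
    total = total_bases(lines)
    return total, total <= limit


# Tiny in-memory example; for a real file pass open("canu.sub.fasta") instead.
example = [">a\n", "ACGT\n", "AC\n", ">b\n", "NN\n"]
print(fits_limit(example))
```

In the run above the reported `textlen` of 1107738543 is roughly twice the limit, so the input is well beyond what that mummer build accepts.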