malonge / ragtag

Tools for fast and flexible genome assembly scaffolding and improvement

License: MIT License

Python 95.21% Shell 4.79%

genome-assembly scaffolding gap-filling

ragtag's Issues

RagTag not recognizing minimap2 executable?

Hello,

I am trying to run RagTag on a desktop with Ubuntu 18.04. I would like to use Minimap2 as the aligner, and have installed Minimap2 in my home directory. When I run the following RagTag command:

ragtag.py scaffold ~/Desktop/M_Per_G006/MPER_G006.fa ~/Desktop/canu-wtdbg2/canu-wtdbg2-consensus-2-clean.fasta \
-o ~/ragtag/M-per -u --aligner '~/minimap2/minimap2' --mm2-params '-x map-pb'

I get the following error:

ValueError: The provided aligner executable is not valid: ~/minimap2/minimap2

I am fairly certain minimap2 is correctly installed, because I can call the executable using ~/minimap2/minimap2 -h

I'll include the entire output here as well, for reference:

Thu Jul 23 14:36:57 2020 --- RagTag v1.0.0
Thu Jul 23 14:36:57 2020 --- CMD: /usr/local/bin/ragtag_scaffold.py /home/yonathan/Desktop/M_Per_G006/MPER_G006.fa /home/yonathan/Desktop/canu-wtdbg2/canu-wtdbg2-consensus-2-clean.fasta -o /home/yonathan/ragtag/M-per -u --aligner $HOME/minimap2/minimap2 --mm2-params -x map-pb
Thu Jul 23 14:36:57 2020 --- Mapping the query genome to the reference genome
Traceback (most recent call last):
  File "/usr/local/bin/ragtag_scaffold.py", line 4, in <module>
    __import__('pkg_resources').run_script('RagTag==1.0.0', 'ragtag_scaffold.py')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/RagTag-1.0.0-py3.6.egg/EGG-INFO/scripts/ragtag_scaffold.py", line 528, in <module>
  File "/usr/local/lib/python3.6/dist-packages/RagTag-1.0.0-py3.6.egg/EGG-INFO/scripts/ragtag_scaffold.py", line 377, in main
  File "/usr/local/lib/python3.6/dist-packages/RagTag-1.0.0-py3.6.egg/ragtag_utilities/Aligner.py", line 126, in run_aligner
  File "/usr/local/lib/python3.6/dist-packages/RagTag-1.0.0-py3.6.egg/ragtag_utilities/Aligner.py", line 116, in exec_is_valid
ValueError: The provided aligner executable is not valid: $HOME/minimap2/minimap2

Thanks!
Y.
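
One thing worth checking in the command above: a shell does not expand `~` inside quotes, so `--aligner '~/minimap2/minimap2'` hands the program the literal string rather than a path under the home directory. The following is a minimal Python sketch of how such a path could be expanded and checked before use. `resolve_executable` is a hypothetical helper written for illustration; it is not RagTag's own validation code.

```python
import os
import shutil


def resolve_executable(path):
    """Expand '~' and environment variables in `path`, then return the
    absolute path of the executable, or None if it cannot be found or is
    not executable. Hypothetical helper, not part of RagTag."""
    expanded = os.path.expandvars(os.path.expanduser(path))
    found = shutil.which(expanded)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    # Prints the resolved path if the file exists and is executable,
    # otherwise None.
    print(resolve_executable("~/minimap2/minimap2"))
```

On the command line, passing the path unquoted (or as an absolute path) avoids the problem without any extra code.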

remove the redundancy sequence and understand location_confidence

Hi @malonge
Here I am again. Here is the test data for rice, in two folders: one contains the result after using Purge Dups to remove redundant sequences, and the other contains the result without removing them. Both were run with the default "ragtag scaffold" parameters. It does seem that RagTag didn't filter out many redundant sequences, while Purge Dups made the result look much better.
I would like to take this opportunity to ask about something. In the Purge Dups results above, I noticed that the location_confidence of some small contigs was lower than 0.01. What happened? I asked you about ragtag.confidence.txt last time, and you kindly showed me with an example how orientation confidence is obtained. However, I am still confused about how location_confidence is calculated, so... sorry to trouble you again.

Thanks
HuangChao
test_data.zip

Using a two contig reference as the main ref.fa and unicycler output as the query

Hi Mike,

I am running correction with a Flye assembly that has 2 contigs as the reference and a Unicycler assembly that has 125 contigs as the query. This gives a corrected.fasta with 12 contigs. When I scaffold it I get this:

1_RagTag
11_RagTag
2_RagTag
3_RagTag
39_RagTag
4_RagTag
5_RagTag
6_RagTag
7_RagTag
84_RagTag

Do you think it's better to use a single refseq instead of a segmented one?
My genome is ~ 6,800,000bp

Thank you,
Grace

Skipping alignment through available paf file

Hello,

I'm using RagTag to map an assembly to a chromosome-level assembly and everything worked perfectly with both minimap2 and nucmer. Great tool!

Since it's the first time I'm doing this kind of mapping and since the results are quite different between minimap2 and nucmer, I'm also using lastal for the same task to check the consistency between all mappings. I was able to generate a paf file for lastal and I was wondering if there was a way (and if it was possible) to run RagTag without mapping, given that a paf file is provided.

Thank you in advance for your answer!

Requiring specific mapping segment lengths

Hi @malonge ,

I am wondering if there is a way to require a minimum mapping segment length of 10kb for scaffolds shorter than 1 Mbp, and to require a mapping segment length of at least 50kb for scaffolds longer than 1 Mbp.

Is there also a way I can ensure that half of the alignment segments per scaffold align to the same chromosomes (I know that I can interpret the confidence score to enforce the requirement that more than half of the total scaffold length aligns to the same chromosome).

Also, is there a way to import a mashmap outfile and use that instead of what minimap2 produces? I would like to compare the results of each if possible.

Thanks for your help!
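
The length rule described above could be applied to a PAF file outside of RagTag. Here is a minimal sketch, assuming standard PAF columns (column 2 is the query length, columns 3 and 4 are the query start and end). The function names are hypothetical, and a query of exactly 1 Mbp is treated as "long", which the question leaves open.

```python
def min_segment_length(query_len, cutoff=1_000_000,
                       short_min=10_000, long_min=50_000):
    """Return the minimum aligned length required for a query of this size."""
    return short_min if query_len < cutoff else long_min


def filter_paf_lines(lines):
    """Yield PAF lines whose aligned query span meets the size-dependent
    minimum. Lines with fewer than 12 tab-separated fields are skipped."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # malformed or empty line
        query_len = int(fields[1])
        aligned_len = int(fields[3]) - int(fields[2])
        if aligned_len >= min_segment_length(query_len):
            yield line
```

This only filters alignments. Whether a filtered or externally generated PAF can be fed back into scaffolding is a separate question that this sketch does not answer.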

Unconventional Use of RagTag

Originally posted on RAGOO's github on 11/25/2020 (my bad!)

Hi there,

Recently I ran RagTag on a scaffolded assembly (created with long read sequences and HiC data) from a plant allopolyploid - both suspected progenitors are extant. One progenitor has a published genome, which was created from short reads and scaffolding of a related species. It's quite possible that my scaffolded assembly is actually more contiguous than the reference of this progenitor, but that's beside the point of the question I am interested in. The other suspected progenitor does not have a published genome.

Simply put, since the allopolyploidization event is recent (estimated to be much less than 1 mya), and there is marker evidence that many regions of the genome segregate as a diploid (some don't -- this species is a segmental allotetraploid!), I was interested in using RagTag to estimate the subgenome groupings of the scaffolds. I figured that if the grouping_confidence scores between two scaffolds assigned to the same progenitor reference chromosome were substantially different, the higher scoring one is likely derived from that progenitor. By default, the other scaffold is assigned to the other progenitor.

I'm sure you can think of a number of flaws with this approach -- but the main one I am struggling with at the moment is I don't have a great sense of how to tell when the difference between 2 grouping_confidence scores is substantial enough to assign the scaffolds confidently to a subgenome. I suppose I could do a t-test of the differences of the 8 groupings and see which are significant... but being the creator of RagTag, I was interested in what you thought of this approach?

Ideally I would be doing a Ks comparison between the progenitor and my scaffolds, but my assembly is not yet annotated, so the coding regions haven't been picked out quite yet. I was hoping to label the scaffolds before doing so, but maybe I should just suck it up and name them later! I also thought about using polyCRACKER, but I'm not familiar with docker whatsoever and the thing seemed like it would be a pain in the ass to get running.

Attached is a file containing my confidence scores. Any advice is much appreciated!

Kindly,
Charity
ragtag.confidence1_16B.xlsx

Probable typo for scaffold err file

There might be a typo here, where the log file is ragtag.scaffold.err. All the other files are of the form ragtag.scaffolds. Only an aesthetic issue, but gets in the way of tab completion.

Is there a way to run this as a batch?

Hi there,

When I attempt to write a small loop to run these one after the other, it will output to the same file and throw a number of errors. Is there a way to run the two scripts (correct and scaffold) on many samples at all?

Thanks!
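
Since the logs elsewhere on this page show results going to a `ragtag_output` directory when no `-o` is given, one way to keep samples from colliding is to give every run its own `-o` directory. The sketch below only builds and prints the commands (a dry run). The directory layout and the `build_commands` helper are assumptions for illustration, as is the `<assembly stem>.corrected.fasta` naming of the corrected file.

```python
import shlex
from pathlib import Path


def build_commands(reference, assembly, out_root):
    """Return (correct_cmd, scaffold_cmd) for one sample, each writing to
    its own output directory. Hypothetical helper, not part of RagTag."""
    sample = Path(assembly).stem
    correct_dir = Path(out_root) / sample / "correct"
    scaffold_dir = Path(out_root) / sample / "scaffold"
    # Assumption: 'correct' names its output <assembly stem>.corrected.fasta.
    corrected = correct_dir / (sample + ".corrected.fasta")
    correct_cmd = ["ragtag.py", "correct", "-o", str(correct_dir),
                   str(reference), str(assembly)]
    scaffold_cmd = ["ragtag.py", "scaffold", "-o", str(scaffold_dir),
                    str(reference), str(corrected)]
    return correct_cmd, scaffold_cmd


if __name__ == "__main__":
    # Placeholder file names; replace with real samples.
    for assembly in ["sampleA.fa", "sampleB.fa"]:
        for cmd in build_commands("reference.fa", assembly, "ragtag_runs"):
            print(" ".join(shlex.quote(part) for part in cmd))
```

The printed lines can be inspected first and then run one after another, for example through `subprocess.run(cmd, check=True)`.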

Chimeric contigs still present?

Following on from another issue (Issue #29 ), I thought I would add this here.

I'm finding that ragtag is not able to resolve chimeric contigs even after doing the ragtag correct protocol. You will see in the attached image different versions of a genome's chromosome AFTER scaffolding.

The pertinent rows are the 3rd, 4th and 5th rows. The 3rd row is the RaGOO only process, the 4th RagTag only and 5th is RagTag then RaGOO process.

As you can see RaGOO was able to resolve the chromosome somewhat while RagTag could not. Admittedly, this particular chromosome is weirdly tricky for unknown reasons but I found that in other chromosomes that the combination of both provided the best consensus scaffolding. Although the downside is that the use of RaGOO resulted in loss of some of what I'm assuming to be telomeric regions which RagTag often seems to get.

Commands used:
RaGOO: ragoo.py isolate_masked.fasta reference.fasta.gz -t 24 -b
RagTag: ragtag correct reference.fasta isolate_masked.fasta -t 24
ragtag scaffold reference.fasta isolated_masked.corrected.fasta -t 24
Combined: same commands as above but ran RagTag first then RaGOO on the output of RagTag

criteria for breaking contigs

Hello,

Thank you for developing this software - very easy to install and run.

I am using it to place contigs of a ~230 Mb plant assembly (28x HiFi data, assembled with hifiasm, I have mostly p-contigs) using a reference. To make sure I don't have misassemblies I run the correct module with the same reads I had for the assembly.
The command was
ragtag.py correct -t 45 --inter -j ctg_below_100kb_list -o HalH2_ragtag_corr2 -R readsQ20.fa -T corr reference.fa assembly.fa
and the new fasta has 45 more contigs - OK with me.

I went looking at the regions where the cuts are made and I don't see big issues: a good coverage and uniform distribution of reads across the site. There is only one case (first slide) where I agree that the region is questionable (though at a dotplot the region is very simple and the structure is confirmed by collinearity with the reference), but all the other cases seem like any other region to me.
What are the criteria that are applied to insert those cuts?
Thanks,
Dario

RagTag_cuts_HalH2.pdf

ragtag correct stop with minimap2 error message

Hello,

many thanks for developing this tool (it's incredibly easy to install).

I encountered an issue when trying to correct an Illumina assembly called 'C1_k105_scaffolds.fasta' using forward Illumina reads (from a paired-end library; file 'C1_1_trim.fq.gz') and a reference assembly of a closely related species (file '1-Genome_assembly.fa'), with ragtag exiting with an error message.

The ragtag log is:

Tue Aug 11 12:18:36 2020 --- RagTag v1.0.0
Tue Aug 11 12:18:36 2020 --- CMD: /xxx/anaconda3/bin/ragtag_correct.py 1-Genome_assembly.fa C1_k105_scaffolds.fasta -t 2 -T sr -R /xxx/sp/01_cleaned/C1_1_trim.fq.gz -u -o C1_correct --gff 2-Genome_GFF3.gff3.txt
Tue Aug 11 12:18:36 2020 --- Mapping the query genome to the reference genome
Tue Aug 11 12:18:36 2020 --- Running: minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf.log
Tue Aug 11 12:18:45 2020 --- Finished running : minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf.log
Tue Aug 11 12:18:45 2020 --- Reading whole genome alignments
Tue Aug 11 12:18:46 2020 --- Filtering and merging alignments
Tue Aug 11 12:18:47 2020 --- Validating putative query breakpoints via read alignment.
Tue Aug 11 12:18:47 2020 --- Aligning reads to query sequences.
Tue Aug 11 12:18:47 2020 --- Running: minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam.log
Traceback (most recent call last):
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 645, in <module>
    main()
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 591, in main
    al.run_aligner()
  File "/xxx/anaconda3/lib/python3.6/site-packages/ragtag_utilities/Aligner.py", line 128, in run_aligner
    run_oe(self.compile_command(), self.out_file, self.out_log)
  File "/xxx/anaconda3/lib/python3.6/site-packages/ragtag_utilities/utilities.py", line 73, in run_oe
    raise RuntimeError('Failed : %s > %s 2> %s' % (" ".join(cmd), out, err))
RuntimeError: Failed : minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam.log

and the sam log (file 'c_reads_against_query.sam.log') is:

[M::mm_idx_gen::3.152*1.30] collected minimizers
[M::mm_idx_gen::3.831*1.42] sorted minimizers
[M::main::3.838*1.42] loaded/built the index for 27604 target sequence(s)
[M::mm_mapopt_update::3.838*1.42] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 27604
[M::mm_idx_stat::4.034*1.40] distinct minimizers: 16518461 (95.55% are singletons); average occurrences: 1.091; average spacing: 6.034

Would you have any idea where the error comes from?
My apologies if this issue is trivial, I do not have much experience with ragtag or minimap2.

many thanks
Romain

Add support for unimap

Add unimap as an assembly-to-assembly aligner option (with --aligner). #47

Errors can fail silently

Hi,
I've noticed a few times that, at least on cluster-submitted jobs, errors during the scaffolding process don't seem to cause RagTag to fail. Specifically, I am using RagTag as a step in a snakemake workflow, and everything appears to have gone smoothly and the workflow continues, when in reality there was an error.

For example, a stripped down log is below

Fri Oct 23 20:31:14 2020 --- Writing: 
Fri Oct 23 20:31:14 2020 --- Retaining pre-existing file: 
Fri Oct 23 20:31:14 2020 --- Running: ragtag_agp2fasta.py 
[E::fai_retrieve] Failed to retrieve block: unexpected end of file
Traceback (most recent call last):
...
 File "pysam/libcfaidx.pyx", line 319, in pysam.libcfaidx.FastaFile.fetch
ValueError: failure when retrieving sequence on 'tig00018983'
Traceback (most recent call last):
...
RuntimeError: Failed : ragtag_agp2fasta.py

I'm not sure why the RuntimeError isn't signalling correctly that there has been a failure, maybe something to do with how run_o redirects stdout? However, since the file is progressively generated, there is an output file (just very incomplete) which snakemake believes is correct and moves on.

I believe this specific issue is due to re-using the .agp file with a modified input, which wouldn't occur if I used -w, so that doesn't seem troublesome in general.

Thanks,
Alex
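
The partial-output problem described above is commonly avoided by writing to a temporary file and moving it into place only when the command succeeds. The sketch below illustrates that general pattern. `run_to_file` is a hypothetical helper and makes no claim about how RagTag's `run_o` actually behaves.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(cmd, out_path):
    """Run `cmd` with stdout redirected to a temporary file next to
    `out_path`. The file is moved to `out_path` only if the command exits
    with status 0; otherwise the partial output is removed and a
    RuntimeError is raised. Hypothetical helper, not RagTag code."""
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".partial")
    try:
        with os.fdopen(fd, "w") as out_fh:
            result = subprocess.run(cmd, stdout=out_fh)
        if result.returncode != 0:
            raise RuntimeError("Failed (exit %d): %s"
                               % (result.returncode, " ".join(cmd)))
        os.replace(tmp_path, out_path)
    finally:
        # Clean up the temporary file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


if __name__ == "__main__":
    run_to_file([sys.executable, "-c", "print('ok')"], "example_output.txt")
```

With this pattern a workflow manager never sees an incomplete file under the final name, and the raised exception gives the calling script a nonzero exit status.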

Error when running ragtag scaffold command

Hi! I have attached the initial ref.fa and contigs fasta files I am using. Please let me know if there's any other files you need!

When running ragtag correct, I received no error, however when I then ran ragtag scaffold I received the following error:

[denovo_files.zip](https://github.com/malonge/RagTag/files/5294254/denovo_files.zip)

[jvelez@acf-login5 gwas_analyses]$ ragtag.py scaffold -u ref2.fa ragtag_output/denovo_contigs.corrected.fasta
Mon Sep 28 14:14:36 2020 --- RagTag v1.0.1
Mon Sep 28 14:14:36 2020 --- CMD: /nics/b/home/jvelez/miniconda2/bin/ragtag_scaffold.py -u ref2.fa ragtag_output/denovo_contigs.corrected.fasta
Mon Sep 28 14:14:36 2020 --- Mapping the query genome to the reference genome
Mon Sep 28 14:14:36 2020 --- Retaining pre-existing file: /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/query_against_ref.paf
Mon Sep 28 14:14:36 2020 --- Reading whole genome alignments
Mon Sep 28 14:14:36 2020 --- Filtering and merging alignments
Mon Sep 28 14:14:36 2020 --- Ordering and orienting query sequences
Mon Sep 28 14:14:36 2020 --- Writing scaffolds
Mon Sep 28 14:14:36 2020 --- Writing: /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/ragtag.scaffolds.agp
Mon Sep 28 14:14:36 2020 --- Retaining pre-existing file: /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/ragtag.scaffolds.agp
Mon Sep 28 14:14:36 2020 --- Running: ragtag_agp2fasta.py /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/ragtag.scaffolds.agp /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/denovo_contigs.corrected.fasta > /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/ragtag.scaffolds.fasta
Traceback (most recent call last):
  File "/nics/b/home/jvelez/miniconda2/bin/ragtag_agp2fasta.py", line 74, in <module>
    main()
  File "/nics/b/home/jvelez/miniconda2/bin/ragtag_agp2fasta.py", line 67, in main
    sys.stdout.write(fai.fetch(agp_line.comp))
TypeError: write() argument must be str, not bytes
Traceback (most recent call last):
  File "/nics/b/home/jvelez/miniconda2/bin/ragtag_scaffold.py", line 528, in <module>
    main()
  File "/nics/b/home/jvelez/miniconda2/bin/ragtag_scaffold.py", line 514, in main
    run_o(cmd, output_path + "ragtag.scaffolds.fasta")
  File "/nics/b/home/jvelez/miniconda2/lib/python3.6/site-packages/ragtag_utilities/utilities.py", line 91, in run_o
    raise RuntimeError('Failed : %s > %s' % (" ".join(cmd), out))
RuntimeError: Failed : ragtag_agp2fasta.py /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/ragtag.scaffolds.agp /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/denovo_contigs.corrected.fasta > /lustre/haven/user/jvelez/gwas_analyses/ragtag_output/ragtag.scaffolds.fasta

multithreaded or single thread?

Hi,

I have a quick question - does RagTag support multiple threads to speed up the processing? or is it single-threaded?
If it does have support for multi-threaded ops - what is the flag to enable it? same as Ragoo?

Thanks!

Scaffolding proceeds even with empty paf file

Hi,
This is 99% user error, so the observed behaviour may be "working as intended".

When there is an error in the alignment phase, this doesn't seem to get picked up anywhere during scaffolding and all the output exists (but is identical to the input). In my case, it was because of a permission error on the fasta file, resulting in a size 0 paf.

Skimming the code in ragtag_scaffold.py, would it make sense to add a check around L396 (ctg_alns = read_genome_alignments(...)) that there are >0 alignments? Without any alignments here, the output is trivially going to match the input, and so isn't really "scaffolding" anything.

thanks,
Alex
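
The kind of check proposed above might look like the following sketch, which counts records in a PAF file and refuses to continue when there are none. Both function names are hypothetical, and this is not the code in ragtag_scaffold.py.

```python
def count_paf_alignments(paf_path):
    """Count lines that have at least the 12 mandatory PAF fields."""
    count = 0
    with open(paf_path) as fh:
        for line in fh:
            if len(line.rstrip("\n").split("\t")) >= 12:
                count += 1
    return count


def require_alignments(paf_path):
    """Return the number of alignments, raising ValueError if there are none."""
    n = count_paf_alignments(paf_path)
    if n == 0:
        raise ValueError("No alignments found in %s" % paf_path)
    return n
```

A size-0 PAF, as in the permission-error case described here, would then stop the run with a clear message instead of producing output identical to the input.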

KeyError: "sequence 'ctg002980_np1212' not present"

Hi,

When I used ragtag.py to do scaffolding, the pipeline died while running ragtag_agp2fasta.py and reported that a sequence could not be found.
Thu Jul 30 02:21:33 2020 --- Running: ragtag_agp2fasta.py /genome/g41/ragtag_output/ragtag.scaffolds.agp /genome/g41/genome.nextpolish.fa > genome/g41/ragtag_output/ragtag.scaffolds.fasta
Traceback (most recent call last):
  File "/public/home/miniconda3/envs/ngs/bin/ragtag_agp2fasta.py", line 74, in <module>
    main()
  File "/public/home/miniconda3/envs/ngs/bin/ragtag_agp2fasta.py", line 67, in main
    sys.stdout.write(fai.fetch(agp_line.comp))
  File "pysam/libcfaidx.pyx", line 303, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'ctg002980_np1212' not present"
Traceback (most recent call last):
  File "/public/home/miniconda3/envs/ngs/bin/ragtag_scaffold.py", line 528, in <module>
    main()
  File "/public/home/miniconda3/envs/ngs/bin/ragtag_scaffold.py", line 516, in main
    run_o(cmd, output_path + "ragtag.scaffolds.fasta")
  File "/public/home/miniconda3/envs/ngs/lib/python3.6/site-packages/ragtag_utilities/utilities.py", line 91, in run_o
    raise RuntimeError('Failed : %s > %s' % (" ".join(cmd), out))
RuntimeError: Failed : ragtag_agp2fasta.py genome/gems41/ragtag_output/ragtag.scaffolds.agp /genome/g41/genome.nextpolish.fa > /genome/g41/ragtag_output/ragtag.scaffolds.fasta

Any suggestions?
Thanks

First-timer / Help with putting fastq or sam files into RagTag?

Hello,

I am working on a reference-guided genome assembly project - my team was given Oxford Nanopore sequence reads that I merged into a single .fastq file and put through Minimap2 to get a single .sam file. My goal is to generate a consensus sequence in .fasta format that I can use for immediate sequence inspection and later annotation. However, I'm not sure where to go from here since I am totally new to coding and need guidance in getting started with RagTag.

I understand that I have aligned my patient sample's sequence reads to my reference genome, but it looks like I need this to be in .fasta format... how do I enter the Minimap2 output (.sam) into the RagTag program?

I also saw that Minimap2 is a sub-package of RagTag; maybe I'm confused because I am supposed to align the reads against the reference genome via the scaffolding function of RagTag instead of using Minimap2 separately? In that case, can I use my initial merged .fastq file in place of the query .fasta file?

Thank you for the help!

Troubleshooting inaccurate scaffolds

If your scaffolds don't look right, I suggest using Nucmer instead of Minimap2. To do this, use the following parameters:

ragtag.py scaffold --aligner /path/to/nucmer --nucmer-params='--maxmatch -l 100 -c 500' ref.fa query.fa

I suggest reading the MUMmer docs to learn about the --maxmatch, -l and -c parameters. Depending on the assemblies, one may consider increasing the seed and cluster size values to save time and to improve specificity. Removing --maxmatch might also save time without sacrificing accuracy.

I plan to provide a more detailed wiki section on this topic in the future.

Merging of contig overlaps based on reference?

Greetings,

I'm looking to use RagTag to scaffold multi-contigs produced by de novo assemblies. However I found that contigs that had small but valid overlaps based on both the sequences themselves as well as per the reference genome used, were not merged but instead were scaffolded with a 100bp gap in between. Is there a way to merge them if the reference determines they should be so?

I've tested with -r (infer gap sizes), -g 1 (minimum inferred gap size = 1), -d 1 (alignment merge distance = 1) but to no avail.

Logfile is attached, with the query (HPUPM4.megahit.171220.fa) & reference (NC_045512.2_Wuhan-Hu-1_genome.fasta).
Thank you very much in advance.

log.txt
query_and_ref.zip

Can i use mummer4 instead of the mummer3?

Hi sir
As the title says, RagTag uses MUMmer3 by default. Can I use MUMmer4 instead?
Thanks a lot.

understand the output file "ragtag.confidence.txt"

Hello, malonge
Sorry to bother you again. RagTag is a great tool that combines speed and precision. My question is about the output file "ragtag.confidence.txt": why are the grouping_confidence, location_confidence and orientation_confidence of some contigs so inconsistent with each other, or even far apart? For example, grouping_confidence may be 1 while location_confidence or orientation_confidence is 0.1 or lower. Because I don't understand how they are calculated, I can't interpret the difference between the query genome and the reference genome, and it is difficult to choose suitable values for parameters such as -i, -a, and -s in my project.
RagTag is really good software, especially for someone like me who is junior and not experienced in writing scripts. Thank you and your team again for your hard work on this software.

Sincerely,
huangchao

Is it possible to pass multiple references for scaffolding? I have two reference strain for my contigs

ValueError in get_median_read_coverage

Hi Michael,
I ran into this error message when running ragtag correct. I supplied CCS reads with the -R option.
Below is the log file.
Date = Fri Aug 14 15:33:22 EDT 2020
Hostname = c6a-s20
Working Directory = /blue/whitaker/fanzhen/hifi_test/RTag

Number of Nodes Allocated      = 1
Number of Tasks Allocated      = 1
Number of Cores/Task Allocated = 16
Fri Aug 14 15:33:26 2020 --- RagTag v1.0.0
Fri Aug 14 15:33:26 2020 --- CMD: /apps/ragtag/1.0.0/bin/ragtag_correct.py ../ragoo/F_ana_Camarosa_6-28-17_hardmasked.fasta ../ragoo/royal.asm.p_ctg.fa -R SRR11606867.fastq -T corr
Fri Aug 14 15:33:26 2020 --- WARNING: Without '-u' invoked, some component/object AGP pairs might share the same ID. Some external programs/databases don't like this. To ensure valid AGP format, use '-u'.
Fri Aug 14 15:33:26 2020 --- Mapping the query genome to the reference genome
Fri Aug 14 15:33:26 2020 --- Running: minimap2 -x asm5 -t 1 /blue/whitaker/fanzhen/hifi_test/ragoo/F_ana_Camarosa_6-28-17_hardmasked.fasta /blue/whitaker/fanzhen/hifi_test/ragoo/royal.asm.p_ctg.fa > /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_query_against_ref.paf 2> /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_query_against_ref.paf.log
Fri Aug 14 16:05:21 2020 --- Finished running : minimap2 -x asm5 -t 1 /blue/whitaker/fanzhen/hifi_test/ragoo/F_ana_Camarosa_6-28-17_hardmasked.fasta /blue/whitaker/fanzhen/hifi_test/ragoo/royal.asm.p_ctg.fa > /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_query_against_ref.paf 2> /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_query_against_ref.paf.log
Fri Aug 14 16:05:21 2020 --- Reading whole genome alignments
Fri Aug 14 16:05:25 2020 --- Filtering and merging alignments
Fri Aug 14 16:05:45 2020 --- Validating putative query breakpoints via read alignment.
Fri Aug 14 16:05:45 2020 --- Aligning reads to query sequences.
Fri Aug 14 16:05:45 2020 --- Running: minimap2 -ax asm5 -t 1 /blue/whitaker/fanzhen/hifi_test/ragoo/royal.asm.p_ctg.fa /blue/whitaker/fanzhen/hifi_test/RTag/SRR11606867.fastq > /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_reads_against_query.sam 2> /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_reads_against_query.sam.log
Fri Aug 14 16:06:52 2020 --- Finished running : minimap2 -ax asm5 -t 1 /blue/whitaker/fanzhen/hifi_test/ragoo/royal.asm.p_ctg.fa /blue/whitaker/fanzhen/hifi_test/RTag/SRR11606867.fastq > /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_reads_against_query.sam 2> /blue/whitaker/fanzhen/hifi_test/RTag/ragtag_output/c_reads_against_query.sam.log
Fri Aug 14 16:06:52 2020 --- Compressing, sorting, and indexing read alignments
Fri Aug 14 16:06:52 2020 --- Indexing read alignments
Fri Aug 14 16:06:53 2020 --- Validating putative query breakpoints
Fri Aug 14 16:06:53 2020 --- Calculating global read coverage
Traceback (most recent call last):
  File "/apps/ragtag/1.0.0/bin/ragtag_correct.py", line 645, in <module>
    main()
  File "/apps/ragtag/1.0.0/bin/ragtag_correct.py", line 610, in main
    ctg_breaks = validate_breaks(ctg_breaks, output_path, num_threads, overwrite_files, val_min_break_end_dist, max_cov, min_cov, window_size=val_window_size, clean_dist=min_break_dist, debug=debug_mode)
  File "/apps/ragtag/1.0.0/bin/ragtag_correct.py", line 168, in validate_breaks
    glob_med = get_median_read_coverage(output_path, num_threads, overwrite_files)
  File "/apps/ragtag/1.0.0/bin/ragtag_correct.py", line 124, in get_median_read_coverage
    raise ValueError()
ValueError

Please let me know how to solve the issue.
Thanks,
Zhen

Unable to get corrected fasta and AGP file after using Pacbio reads

Hello,

I generated assembly with Illumina reads (both paired end and mate pairs) and Pacbio long reads.
I was able to correct and validate query assembly using Illumina reads (using -T sr option in RagTag).

I wanted to use Pacbio reads for correction as well.
Hence used corrected assembly based on Illumina reads as input and used -T corr option.
This generated SAM, BAM file and BAM stats file and log files without any errors.

In BAM stats file, it has listed 248 reads mapped as well.

Still, I get the following error and it fails to generate the AGP file and the corrected fasta file:
Traceback (most recent call last):
  File "RagTag-1.0.1/bin/ragtag_correct.py", line 4, in <module>
    __import__('pkg_resources').run_script('RagTag==1.0.1', 'ragtag_correct.py')
  File "Python-3.6.5/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "Python-3.6.5/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "RagTag-1.0.1/lib/python/RagTag-1.0.1-py3.6.egg/EGG-INFO/scripts/ragtag_correct.py", line 645, in <module>
  File "RagTag-1.0.1/lib/python/RagTag-1.0.1-py3.6.egg/EGG-INFO/scripts/ragtag_correct.py", line 608, in main
  File "RagTag-1.0.1/lib/python/RagTag-1.0.1-py3.6.egg/EGG-INFO/scripts/ragtag_correct.py", line 168, in validate_breaks
  File "RagTag-1.0.1/lib/python/RagTag-1.0.1-py3.6.egg/EGG-INFO/scripts/ragtag_correct.py", line 124, in get_median_read_coverage
ValueError: Unable to calculate read coverage. Check SAM/BAM files and stats file.

Any help in this regard will be really helpful.

Regards
Ketaki

Unable to update gff

Hi,

I tried to update a gff using RagTag v1.0.1 but was unfortunately unsuccessful. Do you have any tips, please? I have scaffolded successfully, and both the agp and gff look okay.

Best,
Jason

Thu Sep 3 14:53:03 2020 --- RagTag v1.0.1
Thu Sep 3 14:53:03 2020 --- CMD: /home/ijt/.local/bin/ragtag_update_gff.py test.gff ragtag_output/ragtag.scaffolds.agp
Traceback (most recent call last):
  File "/home/ijt/.local/bin/ragtag_update_gff.py", line 162, in <module>
    main()
  File "/home/ijt/.local/bin/ragtag_update_gff.py", line 156, in main
    sup_update(gff_file, agp_file)
  File "/home/ijt/.local/bin/ragtag_update_gff.py", line 114, in sup_update
    raise ValueError("Inconsistent input files.")
ValueError: Inconsistent input files.

inconsistent scaffolds from paf file

Hello,
I am using RagTag to anchor contigs (230 Mb plant, HiFi data) based on an assembly of a closely-related species and I see discrepancies between the results of the WGA and the composition of the scaffolds.
The command was
ragtag.py scaffold -t 45 -j cp_mt_list -o HalH2_ragtag2 reference.fa assembly.fa
and here are the alignments before and after the scaffolding (in the former I removed from the paf file the lines for the very small contigs).
HalH2_RagTag_scaffolds.pdf
Most of the contigs (y axis) map to a single reference chromosome (x), but after ragtag scaffold (second image) the new sequences in y align to different chromosomes (x) and also have discordant orientation (see ragtag scaffolds 1,6,7 for example).

I wonder how to explain this shuffling of sequences when the assignments to a reference seem so "obvious" from the raw alignment file.
I ended up making the pseudos myself - did you notice similar behavior in other cases? I can share the alignment file if needed.
Thanks,
Dario

From RaGOO to RagTag

Hi Michael,
I tried RaGOO. It was interesting to use and more or less gave me what I expected. However, the new pseudomolecule built from my concatenated contigs (I used the option to not break previous scaffolds) is larger than I expected based on the contig lengths and the inserted gaps. Has anyone else reported this issue?
python ragoo.py sample.fasta ref.fasta -g 10000
For that reason, I moved to RagTag, to see if I can get pseudomolecules with the size I expected. However, I am getting an error
Traceback (most recent call last):
  File "/software/programs/RagTag/ragtag.py", line 30, in <module>
    from ragtag_utilities.utilities import get_ragtag_version
  File "/software/programs/RagTag/ragtag_utilities/utilities.py", line 37, in <module>
    complements = str.maketrans("ACGTNURYSWKMBVDHacgtnuryswkmbvdh", "TGCANAYRSWMKVBHDtgcanayrswmkvbhd")
AttributeError: type object 'str' has no attribute 'maketrans'
Thanks,
Peris
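
A note on the traceback above: `str.maketrans` exists only on Python 3's `str` type (Python 2 offered `string.maketrans` instead), so this AttributeError is what a Python 2 interpreter would raise on that line. A minimal sketch for checking the interpreter, assuming that is the cause here; `has_str_maketrans` and `reverse_complement` are hypothetical helper names, not RagTag functions, and only the two translation strings are taken from the traceback:

```python
import sys


def has_str_maketrans():
    """Return True when the running interpreter provides str.maketrans (Python 3)."""
    return hasattr(str, "maketrans")


def reverse_complement(seq):
    """Reverse-complement a nucleotide string using the table shown in the traceback."""
    table = str.maketrans("ACGTNURYSWKMBVDHacgtnuryswkmbvdh",
                          "TGCANAYRSWMKVBHDtgcanayrswmkvbhd")
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    # Report which major Python version runs this script.
    print("Python major version:", sys.version_info[0])
    print("str.maketrans available:", has_str_maketrans())
```

If the check prints False, running `ragtag.py` with a Python 3 interpreter would be the first thing to try.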

Why is the result much bigger than the reference genome?

Hello,

I used RagTag to scaffold my assembly. My reference genome is 43M, but the result is much bigger than the reference genome (the assembled genome was 367M).

Here was my command:
ragtag.py scaffold -r -u /public/agis/shaohaojing_group/sunda/rice-shanglianguang/reference_by_hzaurice/ZS97_RS3_chr1.fa
/public/agis/shaohaojing_group/sunda/rice-shanglianguang/ragtag_output/ZC206.contig.corrected.fasta
--aligner /public/agis/panweihua_group/panweihua/tools/minimap2/minimap2
--mm2-params '-x map-ont'
-o /public/agis/shaohaojing_group/sunda/rice-shanglianguang/ragtag_output

ragtag_break_query.py error

I am getting the below error when running ragtag_correct.py

Sat Jun 13 18:03:36 2020 --- Running: ragtag_break_query.py /home/jon/Working_Files/RagTag/ragtag_output/ragtag.correction.agp /home/jon/Working_Files/Patanus-Allee/assembly/out_contig.fa > /home/jon/Working_Files/RagTag/ragtag_output/out_contig.corrected.fasta
Traceback (most recent call last):
  File "/home/jon/anaconda3/envs/biotools/bin/ragtag_break_query.py", line 4, in <module>
    __import__('pkg_resources').run_script('RagTag==1.0.0', 'ragtag_break_query.py')
  File "/home/jon/anaconda3/envs/biotools/lib/python3.6/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/jon/anaconda3/envs/biotools/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1455, in run_script
    .format(**locals()),
pkg_resources.ResolutionError: Script 'scripts/ragtag_break_query.py' not found in metadata at '/home/jon/Working_Files/RagTag/RagTag.egg-info'
Traceback (most recent call last):
  File "ragtag_correct.py", line 645, in <module>
    main()
  File "ragtag_correct.py", line 641, in main
    run_o(cmd, output_path + qf_pref + ".corrected.fasta")
  File "/home/jon/Working_Files/RagTag/ragtag_utilities/utilities.py", line 91, in run_o
    raise RuntimeError('Failed : %s > %s' % (" ".join(cmd), out))
RuntimeError: Failed : ragtag_break_query.py /home/jon/Working_Files/RagTag/ragtag_output/ragtag.correction.agp /home/jon/Working_Files/Patanus-Allee/assembly/out_contig.fa > /home/jon/Working_Files/RagTag/ragtag_output/out_contig.corrected.fasta

Can this be used to construct an exome assembly?

Hello,

I have a unique situation where we have reads from WES and are interested in only a subset of genes. I was wondering if RagTag would be appropriate for assembling contigs for this small set of genes.

Best,

Parameter suggestions for fixing misplaced scaffold

Hi,

I've used RagTag to scaffold my assembly against a decent reference and the results in general look good. However there is one pseudochromosome that appears to have been scaffolded incorrectly as it has a telomeric region placed about 500,000 bp into the scaffold, followed by a gap. Would you have any suggestions for parameters that could be tweaked to fix this?

I'm not 100% certain that it's actually a misassembly, but the results look dodgy so I thought I'd ask for help!

chromosome alignment stats?

Hello,
Is there a way to see statistics about how the scaffolds of my species' draft assembly map to the reference assembly? For example, does the output report the number of scaffolds and the total length in bp mapped to each of the reference chromosomes (and the unplaced scaffolds in chromosome 0)?
I ask because I'm trying to determine which reference assembly I should use to give me the best pseudochromosome-level assembly from my species' draft scaffold assembly. I am debating between about 4 potential reference genomes and would like some metrics to compare.
A chromosome assembly of my species (which is tetraploid and probably quite repetitive) would be expected to be 14 chromosomes and around 1,500Mbp in length.
I have one reference genome of a more closely related species (in the same Order) but it is 16 chromosomes and 400Mbp in length.
I have two other reference genomes of less closely related species that have either 13 or 15 chromosomes but are also smaller in length.
I have another reference genome even less closely related with 14 chromosomes but much larger in length.
So I'm wondering how I can see the statistics mentioned above in order to make the best choice for a reference genome?
In a similar vein, I am also tweaking parameters to try to improve the pseudochromosome assembly. I saw that the parameters for minimap2 can be changed based on sequence divergence, so I've been using "-x asm20" to be on the safe side. I've also decreased the scaffold length -f to 500bp. Other than that and the choice of reference genome, do you have an inkling of what other parameters would produce the most realistic pseudochromosome-level assembly for my purposes?
Thank you for any insight you can offer!
--Jenny
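
One way to get per-chromosome numbers like those asked about above is to tally the whole-genome alignment file directly. The following is a minimal sketch, not RagTag's own method: it assumes a standard minimap2 PAF file (12 mandatory tab-separated columns: query name, query length, query start, query end, strand, target name, target length, target start, target end, matches, alignment block length, mapping quality) and assigns each query to the reference sequence holding most of its aligned bases. `summarize_paf` is a hypothetical name, and overlapping alignments are counted more than once.

```python
from collections import defaultdict


def summarize_paf(lines, min_mapq=0):
    """Tally, per reference sequence, how many queries align best to it and
    how many query bases those alignments cover.

    Each query is assigned to the reference with the largest summed aligned
    query span; overlapping alignments are not merged (a simplification).
    """
    per_query = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        qname, tname = fields[0], fields[5]
        qstart, qend, mapq = int(fields[2]), int(fields[3]), int(fields[11])
        if mapq < min_mapq:
            continue
        per_query[qname][tname] += qend - qstart

    summary = defaultdict(lambda: {"queries": 0, "aligned_bp": 0})
    for targets in per_query.values():
        best = max(targets, key=targets.get)
        summary[best]["queries"] += 1
        summary[best]["aligned_bp"] += targets[best]
    return dict(summary)
```

Running this once per candidate reference, for example with `summarize_paf(open(path))` on each PAF file, would give comparable counts across the four references.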

A discussion of correction module.

hello

I wanted to understand what the correction module does to the raw data, so that I could decide whether the correct step is needed for my projects.
Although the introduction states that "In all cases, sequence is never added or subtracted. Query sequences are only broken at points of putative misassembly.", my tests suggest that the operation diminishes the structural variation between query and reference. Specifically, where a large structural variant exists between the two genomes, correct reduces the difference and can even make the query match the reference. This introduces a great deal of bias into reference-guided assembly.
My data were generated by PacBio HiFi sequencing, and the contigs were assembled with Canu. The conclusions above come from tests on chromosome 6 of indica and japonica rice.
Looking forward to your reply.

Regards
huangchao

Parameter suggestions?

Hi. I used Ragtag for the first time today and it seemed to work nicely. I have a small 200MB (ish) genome that should be structurally very close to a high quality reference assembly I'm correcting relative to.

There are two evident inversions that long read data suggests are artifacts. Ragtag corrected the peritelomeric one with just the vanilla command-line, but the more centromeric-looking inversion is still there after running the commands below. I wonder if anyone could suggest a tweak to the command line that might get both corrected? Currently I'm just using:

ragtag.py correct 2017.fa 2020.fa -o correct_genome_only/ -t 12
ragtag.py scaffold 2017.fa ./correct_genome_only/2020.corrected.fasta  -o corrected_scaffolded_genome -t 12

Attaching a synmap of before/after, although I'd note that scaffold order is slightly rearranged between the two.

Ragtag correct error

Hi,
Thank you for your software. There is an error when I run
ragtag.py correct -u -R mr.41.15.15.0.02.1.fa -T corr -t 15 ref.fasta B2086.fasta .
In the log file:
Sat Aug 15 19:54:17 2020 --- Compressing, sorting, and indexing read alignments
[E::sam_parse1] query name too long
[E::sam_parse1] query name too long
[E::sam_parse1] query name too long
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=None, stderr=[main_samview] truncated file.\nsamtools view: error closing "/home/01_program/02_reference_guided_scaffolding/ragtag_output/c_reads_against_query.sam": -5\n'

So, is there a length limit on the query names of the reads? Thanks.
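
For context on the "query name too long" message: the SAM specification limits QNAME to 254 characters, so longer read names are rejected when the SAM file is parsed. A minimal sketch for finding such reads, assuming a plain four-line-per-record FASTQ (or, with a different record layout, adapting it for FASTA) and assuming the aligner writes the header up to the first whitespace as the read name; `long_read_names` is a hypothetical helper, not part of RagTag or samtools:

```python
SAM_QNAME_MAX = 254  # maximum QNAME length allowed by the SAM specification


def long_read_names(fastq_lines, limit=SAM_QNAME_MAX):
    """Yield (line_index, name_length) for FASTQ records whose name exceeds `limit`.

    Assumes strict 4-line records, so header lines sit at indices 0, 4, 8, ...
    The name is taken as the header text after '@' up to the first whitespace.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # only header lines carry the read name
        parts = line[1:].split()
        name = parts[0] if parts else ""
        if len(name) > limit:
            yield i, len(name)
```

If this reports any reads, shortening their names before alignment (and keeping a lookup table of old to new names) is one possible workaround.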

Required output files are not created for ragtag.py correct

Hi Michael,

I got "c_query_against_ref.paf" file as an output for the following command:
ragtag.py correct --debug -t 16 --aligner -u -o <out_dir> <HiCanu_assembly_dir> <reference_assembly_dir>
Can you please suggest what changes need to be made to this command to get {query_prefix}.corrected.fasta as output?

Also, the output was not saved in the out_dir. Instead, it was saved in the RagTag installation folder. Can you please suggest how to overcome this?

Next, I want to try the same command with raw read data for validation. I will add -R, --read-aligner and -T options to the above command and since my data is HiFi, I plan to provide the raw reads that came out of sequencer as it is without correction. Hope that's right. Or do you suggest different options for HiFi data?

Also, want to note that ref_asm is haploid and HiCanu_asm is diploid. Hope that won't cause a big difference in this analysis. I will redo it with HiCanu_asm as haploid once I have it.

Thank you,
Minal

Higher confidence score with Ragtag than Ragoo

Hi Michael,
Does RagTag use a different formula from RaGOO to calculate confidence scores?
In general, I observed much higher scores with RagTag using the same data.
Thanks,
Zhen

Something wrong in the writing agp file

Hi,
I ran into an error at the stage where the AGP file is written.
Here is the log:

Tue Aug 25 09:51:59 2020 --- RagTag v1.0.1
Tue Aug 25 09:51:59 2020 --- CMD: ragtag_scaffold.py chr.fa SY1032.Scf.fasta -o res --aligner /mummer4/bin/nucmer --nucmer-params -l 20 --threads=16 -u
Tue Aug 25 09:51:59 2020 --- Mapping the query genome to the reference genome
Tue Aug 25 09:51:59 2020 --- Retaining pre-existing file: query_against_ref.delta
Tue Aug 25 09:51:59 2020 --- Running: ragtag_delta2paf.py res/query_against_ref.delta > res/query_against_ref.paf
Tue Aug 25 10:13:54 2020 --- Finished running : ragtag_delta2paf.py res/query_against_ref.delta > res/query_against_ref.paf
Tue Aug 25 10:13:54 2020 --- Reading whole genome alignments
Tue Aug 25 10:16:12 2020 --- Filtering and merging alignments
Tue Aug 25 10:18:19 2020 --- Ordering and orienting query sequences
Tue Aug 25 10:18:19 2020 --- Writing scaffolds
Tue Aug 25 10:18:19 2020 --- Writing: res/ragtag.scaffolds.agp
Traceback (most recent call last):
  File "/anaconda3/bin/ragtag_scaffold.py", line 528, in <module>
    main()
  File "/anaconda3/bin/ragtag_scaffold.py", line 506, in main
    write_orderings(output_path + "ragtag.scaffolds.agp", output_path + "ragtag.confidence.txt", query_file, mapped_ref_seqs, fltrd_ctg_alns, pad_sizes, gap_types, make_chr0, overwrite_files, not remove_suffix)
  File "/anaconda3/bin/ragtag_scaffold.py", line 192, in write_orderings
    agp.add_seq_line(*out_agp_line)
  File "/anaconda3/lib/python3.7/site-packages/ragtag_utilities/AGPFile.py", line 170, in add_seq_line
    self._raise_line_err(line_number, "object identifier out of order")
  File "/anaconda3/lib/python3.7/site-packages/ragtag_utilities/AGPFile.py", line 71, in _raise_line_err
    raise ValueError(message)
ValueError: line 133: object identifier out of order

Do you have any idea how to solve this?
Thanks a lot.

Gaps sequences?

Hello Mike:
I am comparing the statistics from the use of ragtag on a draft assembly to several possible reference assemblies. (Actually the draft assembly is a bit different for each reference, because the "correct" script is used before each reference "scaffold" script.) What I am wondering is what the "gap_bp" and "gap_sequences" actually mean? I used the default of 100 for gap lengths, rather than inferring them. Are the gap sequences and length also part of the unplaced sequences and length, or some third category of sequences? Below are my results from running ragtag with six different references. (It is looking like the first one is "best" based on overall placed length, but it also has the largest gap length...)

placed_sequences	placed_bp	unplaced_sequences	unplaced_bp	gap_bp	gap_sequences
4,764	601,032,099	728,635	511,410,804	474,800	4,748
4,160	584,072,430	729,360	528,370,473	413,200	4,132
651	447,227,915	732,721	665,214,988	62,800	628
2,047	489,225,429	731,480	623,217,474	196,300	1,963
2,091	535,970,451	731,502	576,472,452	207,800	2,078
917	467,971,426	732,416	644,471,477	90,300	903

Thanks for your help and insight!
Jenny
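
The numbers in the table above can be checked with a little arithmetic. In every row, gap_bp equals gap_sequences multiplied by the default gap length of 100, which suggests that gap_sequences counts the gaps inserted between adjacent placed sequences and gap_bp is their combined length, separate from both the placed and the unplaced sequence. This is a reading of the figures rather than a statement from the RagTag documentation. Under the further assumption that a scaffold built from n placed sequences contains n - 1 gaps, placed_sequences minus gap_sequences gives the number of scaffolds:

```python
# Rows copied from the table above: (placed_sequences, gap_bp, gap_sequences).
ROWS = [
    (4764, 474800, 4748),
    (4160, 413200, 4132),
    (651, 62800, 628),
    (2047, 196300, 1963),
    (2091, 207800, 2078),
    (917, 90300, 903),
]
GAP_SIZE = 100  # default gap length stated in the post


def gap_bp_matches(gap_bp, gap_sequences, gap_size=GAP_SIZE):
    """True when the total gap length is exactly gap count times gap size."""
    return gap_bp == gap_sequences * gap_size


def implied_scaffolds(placed_sequences, gap_sequences):
    """Assumption: n placed sequences joined into one scaffold need n - 1 gaps."""
    return placed_sequences - gap_sequences


if __name__ == "__main__":
    for placed, gap_bp, gaps in ROWS:
        print(placed, gap_bp_matches(gap_bp, gaps), implied_scaffolds(placed, gaps))
```

For the six references this gives 16, 28, 23, 84, 13 and 14 implied scaffolds, so the reference with the most placed bases also needs the most gaps simply because it places the most sequences.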

A conflict occurred when installing ragtag.

Hello,
When I used the following command:
conda install -c bioconda ragtag
a problem occurred like this:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions
The following specifications were found to be incompatible with your system:

feature:/linux-64::__cuda==10.1=0
feature:|@/linux-64::__cuda==10.1=0

Your installed version is: 10.1
How can I solve the conflict?
Thank you very much.

Omni-C and Hi-C

Thanks a lot for your nice tool. I used it for scaffolding HiFi data using a Dovetail chromosome level assembly and it worked very well.

Could you please provide a step by step description or more details for Hi-C scaffolding?

It is not clear how the .agp files must be prepared for the Hi-C scaffolding.

I mean, in: ragtag.py merge -b hic.bam query.fasta out_*/*.agp
how must the files in out_*/*.agp be prepared?

Can we scaffold using Omni-C data instead of Hi-C?

Regards
Ardy

The number of scaffolds generated by RagTag is not ideal

Hello,
Thanks for your application, RagTag. I used RagTag to anchor my assembled genome. The reference genome has 16 chromosomes, but my assembly could only be anchored to 15 of them. How can I debug this to get the correct number of chromosomes? Looking forward to your reply.
Rilla

How to get output with unplaced contigs not concatenated?

Hello,
I'd like to have the output with unplaced contigs returned as individual (original) sequences, not concatenated.
I have tried both with and without the -C option, but either way the unplaced contigs are returned concatenated into one big sequence.
Is that expected?
Thanks for making this great resource available to all of us.
Stacy

Format description of the *.agp file

Hi Malonge,
Could you please share the format and/or header of the *.agp file with me?

Many thanks
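
As a general pointer, AGP is a tab-separated format defined by NCBI with nine columns per line. A minimal sketch of how one line could be read, assuming RagTag's `.agp` files follow the NCBI AGP v2.x layout (not confirmed in this thread); `parse_agp_line` is a hypothetical helper, not the `AGPFile` class that ships with RagTag:

```python
GAP_TYPES = ("N", "U")  # component types that denote gaps in AGP v2.x


def parse_agp_line(line):
    """Parse one non-comment AGP v2.x line into a dict of named fields.

    Columns 1-5 are shared; columns 6-9 mean different things for gap
    lines (component type N or U) and for sequence component lines.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    record = {
        "object": fields[0],            # scaffold/chromosome being built
        "object_beg": int(fields[1]),   # 1-based start on the object
        "object_end": int(fields[2]),   # inclusive end on the object
        "part_number": int(fields[3]),  # running index within the object
        "component_type": fields[4],    # e.g. W (WGS contig), N or U (gap)
    }
    if fields[4] in GAP_TYPES:
        record.update(gap_length=int(fields[5]), gap_type=fields[6],
                      linkage=fields[7], linkage_evidence=fields[8])
    else:
        record.update(component_id=fields[5], component_beg=int(fields[6]),
                      component_end=int(fields[7]), orientation=fields[8])
    return record
```

Lines beginning with `#` are comments in AGP and would need to be skipped before calling this function.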

How does the RagTag fill the gaps & distinguish the misassemblies & SVs?

Hey there!
I have a couple of questions about RagTag. For example, we have two chromosome-length genome assemblies of Apis mellifera (Fig.): Amel_HAv3.1 (ref) and Amel_INRA_1 (query). I want to fill the gap in Chr 2 (blue circle) and resolve the misassemblies in Chr 6 (red circle). I ran the following command: ragtag.py correct ref.fasta query.fasta
Questions:

1. Will RagTag fill the gap in the query using the corresponding fragment from the ref, using the unplaced contigs of the query, or using a consensus?
2. How does RagTag distinguish misassemblies from real structural variants? I want to resolve the misassemblies and keep the SVs.
Thanks!
Ural Yunusbaev

Excluded query contigs are put in chr0

Thanks a lot for this easy-to-use and straightforward software. I have a reference genome which consists of chr1-10. My assembly has 100 contigs, plus a chloroplast contig. With the '-j' parameter I can tell the software not to try to scaffold my chloroplast. That works; without the parameter it gets integrated somewhere on a chromosome. However, when I use the '-j' skip.txt file, my chloroplast contig is put into chr0, because I want the unscaffolded contigs to be put into chr0 (using '-C'). Is it possible to run the software in such a way that I get:

1. anchored chromosomes in the results fasta (already happening)
2. a chr0 scaffold with all unplaced contigs in the results fasta (already happening)
3. my contigs which are listed in skip.txt, untouched and present as separate contigs in the results fasta?

Assemblytics

I loved the integration of Assemblytics SV caller with RaGOO. Are there plans to integrate Assemblytics into RagTag? Or maybe I missed that flag/output.

Error when correcting with reads

Hi Michael,

Thanks for the improved version of RaGOO!

RagTag is running great except when I include corrected reads fastq. Unfortunately, the log makes it seem like the ragtag.py correct completed successfully even though it does not produce the ragtag_output/contigs.corrected.fasta file. I've tried it w/ and w/o -u, in case that helps.

ragtag.py correct -t 16 -u --aligner minimap2/minimap2 -R corr_reads.fastq -T corr reference.fasta contigs.fasta

Mon Jul 20 22:43:51 2020 --- RagTag v1.0.0
Mon Jul 20 22:43:51 2020 --- CMD: /work/virtualenvs/venv_devopspy/bin/ragtag_correct.py -t 16 -u --aligner minimap2/minimap2 -R corr_reads.fastq -T corr reference.fasta contigs.fasta
Mon Jul 20 22:43:51 2020 --- Mapping the query genome to the reference genome
Mon Jul 20 22:43:51 2020 --- Running: minimap2/minimap2 -x asm5 -t 16 reference.fasta contigs.fasta > ragtag_output/c_query_against_ref.paf 2> ragtag_output/c_query_against_ref.paf.log
Mon Jul 20 22:48:39 2020 --- Finished running : minimap2 -x asm5 -t 16 reference.fasta contigs.fasta > ragtag_output/c_query_against_ref.paf 2> ragtag_output/c_query_against_ref.paf.log
Mon Jul 20 22:48:39 2020 --- Reading whole genome alignments
Mon Jul 20 22:48:47 2020 --- Filtering and merging alignments
Mon Jul 20 22:48:52 2020 --- Validating putative query breakpoints via read alignment.
Mon Jul 20 22:48:52 2020 --- Aligning reads to query sequences.
Mon Jul 20 22:48:52 2020 --- Running: minimap2/minimap2 -ax asm5 -t 16 contigs.fasta corr_reads.fastq > ragtag_output/c_reads_against_query.sam 2> ragtag_output/c_reads_against_query.sam.log
Mon Jul 20 22:52:28 2020 --- Finished running : minimap2/minimap2 -ax asm5 -t 16 contigs.fasta corr_reads.fastq > ragtag_output/c_reads_against_query.sam 2> ragtag_output/c_reads_against_query.sam.log
Mon Jul 20 22:52:28 2020 --- Compressing, sorting, and indexing read alignments

ragtag_output:
c_query_against_ref.paf
c_query_against_ref.paf.log
c_reads_against_query.bam
c_reads_against_query.sam
c_reads_against_query.sam.log

RagTag preservation of SNPs and small indels?

I have some symbiotic fungal genomes that are notoriously hard to assemble due to repeat regions, resulting in quite fragmented assemblies with poor N50/L50 scores.

We have a reference genome for the species and have used the de novo assemblies along with RagTag to generate substantially improved assemblies, so thank you for this!

My question comes when we want to identify biosynthetic gene clusters (BGCs) within our assemblies (which come from different geographical locations). We want to see how conserved or variable common BGCs might be, so the preservation of SNPs and small indels would be important to us. This may be an ignorant question, but how will using RagTag (and thus reference-guided assembly) affect our ability to do this? If a contig is too dissimilar, I assume it is discarded?

Best regards
Lamma
