
Comments (10)

malonge commented on July 17, 2024

Hi there,

Thanks for bringing this up. I am not sure what went wrong since the log doesn't indicate any error. I suggest you run minimap2 outside of ragtag and maybe that will give some indication of what is going on. Just run the following command:

minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam.log

Let me know if it fails again or if it runs to completion.

Thanks

from ragtag.

rderelle commented on July 17, 2024

Hi,

thanks!
You were right: I tried another version of minimap2 installed on our cluster and it works (v2.13, vs v2.15 in my first attempt).
nb: not saying here that v2.15 wouldn't work with ragtag.

I have then tried to run 'ragtag correct' without the .gff file and it worked again.

Then I tried to run it with the 2 read files I have (forward and reverse reads, as opposed to only the forward reads in my previous attempts) using the -F option pointing to a txt file containing the names of my 2 fastq files, but it stopped halfway:

Tue Aug 11 16:00:28 2020 --- RagTag v1.0.0
Tue Aug 11 16:00:28 2020 --- CMD: /xxx/anaconda3/bin/ragtag_correct.py 1-Genome_assembly.fa C1_k105_scaffolds.fasta -t 2 -T sr -F list_files.txt -u -o C1_correct_no_gff_and_list
Tue Aug 11 16:00:28 2020 --- Mapping the query genome to the reference genome
Tue Aug 11 16:00:28 2020 --- Running: minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf.log
Tue Aug 11 16:00:42 2020 --- Finished running : minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf.log
Tue Aug 11 16:00:42 2020 --- Reading whole genome alignments
Tue Aug 11 16:00:43 2020 --- Filtering and merging alignments
Tue Aug 11 16:00:43 2020 --- Validating putative query breakpoints via read alignment.
Tue Aug 11 16:00:43 2020 --- Aligning reads to query sequences.
Tue Aug 11 16:00:43 2020 --- Running: minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz /xxx/sp/01_cleaned/C1_2_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam.log
Tue Aug 11 16:00:48 2020 --- Finished running : minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz /xxx/sp/01_cleaned/C1_2_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam.log
Tue Aug 11 16:00:48 2020 --- Compressing, sorting, and indexing read alignments
Tue Aug 11 16:00:48 2020 --- Indexing read alignments
Tue Aug 11 16:00:48 2020 --- Validating putative query breakpoints
Tue Aug 11 16:00:48 2020 --- Calculating global read coverage
Traceback (most recent call last):
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 645, in <module>
    main()
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 610, in main
    ctg_breaks = validate_breaks(ctg_breaks, output_path, num_threads, overwrite_files, val_min_break_end_dist, max_cov, min_cov, window_size=val_window_size, clean_dist=min_break_dist, debug=debug_mode)
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 168, in validate_breaks
    glob_med = get_median_read_coverage(output_path, num_threads, overwrite_files)
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 124, in get_median_read_coverage
    raise ValueError()
ValueError

Indeed, in this case the output file 'c_reads_against_query.s.bam.stats' does not have any line starting with 'COV', hence the error message, I believe.

Perhaps I am not using the -F option correctly?

best,
Romain

rderelle commented on July 17, 2024

Looking at the output file 'c_reads_against_query.sam.log' (almost empty), I can see that no reads were mapped.

malonge commented on July 17, 2024

Hi there,

Thanks for these details. Unfortunately, this doesn't appear to be a problem with RagTag, but rather with minimap2. As with the first example, the best way to debug is to run the aligner and see why it is not producing alignments. So you could rerun the following:

minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz /xxx/sp/01_cleaned/C1_2_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam.log

As far as RagTag is concerned, that is a valid minimap2 command.

Let me put it another way: RagTag reports all of the alignment commands used (like the one above). If they fail for some reason, the best way to debug is to run the same command outside of ragtag and reproduce the error. At the end of the day, RagTag will not work if minimap2 isn't working. For example, if minimap2 runs out of memory, then one must focus on resolving that issue with minimap2.

That said, I think RagTag can do a much better job of reporting errors. The value error raised here needs more information. And perhaps it can check if alignment files are empty in order to provide a more useful error message.

Anyways let me know if you can reproduce the error by running minimap2 outside of ragtag.

EDIT
I think RagTag does a good job of reporting when aligner jobs have just failed. But if they fail silently (like producing empty alignment files), RagTag isn't good at reporting those errors.
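
The kind of empty-alignment check mentioned above could look roughly like the following. This is only a sketch and not RagTag's code: the function names and the error message are made up for illustration. It counts SAM records whose "unmapped" flag (0x4) is not set and raises a descriptive error when there are none.

```python
def count_mapped_sam_records(sam_path):
    """Count SAM records whose 'segment unmapped' flag (0x4) is not set."""
    mapped = 0
    with open(sam_path) as handle:
        for line in handle:
            # Skip blank lines and header lines (headers start with '@').
            if not line.strip() or line.startswith("@"):
                continue
            flag = int(line.split("\t")[1])  # FLAG is the second SAM column
            if not flag & 0x4:
                mapped += 1
    return mapped


def require_mapped_reads(sam_path):
    """Hypothetical helper: fail with a useful message if nothing was mapped."""
    mapped = count_mapped_sam_records(sam_path)
    if mapped == 0:
        raise ValueError(
            f"No mapped reads found in {sam_path}. "
            f"Check the aligner log ({sam_path}.log) for errors."
        )
    return mapped
```

Calling `require_mapped_reads` right after the aligner finishes would surface the problem before the coverage step, instead of a bare ValueError later on.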

rderelle commented on July 17, 2024

Thanks for the reply.

However the minimap2 command line works perfectly -> reads from both files are mapped to the genome.

I tried again to run ragtag after modifying the txt file containing the 2 file names (giving relative paths instead of absolute paths) but again it didn't work.
I'll try to go deeper into this issue but it seems 'somehow' to be a ragtag issue rather than a minimap2 issue.

rderelle commented on July 17, 2024

From what I understand, the problem seems to come from the way ragtag calls minimap2 with 2 read files, probably in the class Minimap2SAMAligner (Aligner.py file) but I have not detected any mistake.

To recap my observations:

_ When running ragtag with the -F option pointing to a text file containing the paths of the forward and reverse reads, the logged minimap2 command is correct (it works when I run it myself), but strangely it doesn't work within ragtag and no reads are mapped.

_ from the file 'c_reads_against_query.sam.log':

[M::mm_idx_gen::3.056*1.36] collected minimizers
[M::mm_idx_gen::3.741*1.47] sorted minimizers
[M::main::3.745*1.47] loaded/built the index for 27604 target sequence(s)
[M::mm_mapopt_update::3.745*1.47] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 27604
[M::mm_idx_stat::3.938*1.45] distinct minimizers: 16518461 (95.55% are singletons); average occurrences: 1.091; average spacing: 6.034
ERROR: failed to open file '/xxx/sp/03_RagTag/C1_1_trim.fq.gz /xxx/sp/03_RagTag/C2_1_trim.fq.gz'
[M::main] Version: 2.13-r850
[M::main] CMD: minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/03_RagTag/C1_1_trim.fq.gz /xxx/sp/03_RagTag/C2_1_trim.fq.gz
[M::main] Real time: 3.959 sec; CPU: 5.723 sec; Peak RSS: 0.774 GB

It looks as if ragtag is giving minimap2 a single filename consisting of the concatenation of the 2 filenames. Not sure I'm interpreting this correctly (??).

malonge commented on July 17, 2024

Hi there,

Great - this log does indicate a RagTag bug. I think your interpretation is correct. I will look into it and fix this bug in the next patch.

This also reinforces the need for better error reporting. That may be a little more nuanced but I will look into it.

As for the first issue, I think I will still assume that there is no bug there.

In the meantime, if you are eager for results, you can run minimap2 outside of ragtag (like you have done) and just name the alignments with the expected ragtag name and put them in the output directory. RagTag won't try to overwrite preexisting alignment files, so it should work fine. Let me know if you need more details.
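
For reference, the workaround could be scripted along these lines. This is a rough sketch under assumptions: the helper names are hypothetical, `minimap2` is assumed to be on the PATH, and the output file name `c_reads_against_query.sam` is taken from the log earlier in this thread.

```python
import subprocess
from pathlib import Path


def plan_manual_read_alignment(query_file, read_files, output_dir, threads=2):
    """Return the minimap2 command and the SAM path that ragtag appears to expect."""
    # File name as it appears in the ragtag log above.
    sam_path = Path(output_dir) / "c_reads_against_query.sam"
    # Each read file is its own argument.
    cmd = ["minimap2", "-ax", "sr", "-t", str(threads), query_file, *read_files]
    return cmd, sam_path


def run_manual_read_alignment(query_file, read_files, output_dir, threads=2):
    """Run minimap2 and write its SAM output where ragtag will look for it."""
    cmd, sam_path = plan_manual_read_alignment(query_file, read_files, output_dir, threads)
    sam_path.parent.mkdir(parents=True, exist_ok=True)
    with open(sam_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return sam_path
```

After this, rerunning ragtag with the same output directory should pick up the existing alignment file rather than rerunning the aligner.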

malonge commented on July 17, 2024

I think the problem is here:

RagTag/ragtag_correct.py

Lines 584 to 585 in 5df41f1

al = Minimap2SAMAligner(query_file, " ".join(read_files), read_aligner_path, "-ax sr -t " + str(num_threads),
output_path + "c_reads_against_query", in_overwrite=overwrite_files)

I try to join the two file names with a space when really they should be separate elements in a list.
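
To illustrate the difference, here is a small standalone sketch. It is not the actual patch: the helper names are hypothetical, and it assumes the aligner is launched from an argument list rather than through a shell. A tiny Python child process stands in for minimap2 and reports how many arguments it received.

```python
import subprocess
import sys


def build_read_alignment_cmd(aligner_path, query_file, read_files, num_threads):
    """Build an argument list in which every read file is its own element."""
    return [aligner_path, "-ax", "sr", "-t", str(num_threads), query_file, *read_files]


def count_args_seen_by_child(args):
    """Run a tiny Python child process and return how many arguments it received."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


reads = ["C1_1_trim.fq.gz", "C1_2_trim.fq.gz"]

# Joined with a space: the child sees ONE argument, i.e. one bogus file name.
joined_count = count_args_seen_by_child([" ".join(reads)])

# Passed as separate list elements: the child sees TWO arguments.
separate_count = count_args_seen_by_child(reads)

print(joined_count, separate_count)
```

The joined form matches the "failed to open file" error in the log above, where both paths appear inside a single pair of quotes.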

rderelle commented on July 17, 2024

I agree, the problem seems to come from this class object.

Thanks for the tip, it works when running minimap2 before ragtag.

malonge commented on July 17, 2024

Fixed in v1.0.1

