lh3 / unimap


85.0 4.0 4.0 499 KB

An EXPERIMENTAL fork of minimap2 optimized for assembly-to-reference alignment

License: MIT License

Makefile 1.27% C 93.86% Roff 4.86%

bioinformatics genomics sequence-alignment


unimap's Issues

question about mapping parameters

Hi,

I am trying to align highly divergent hifiasm contigs to hg38 (diverged ~36 MYA; SNV divergence ~10%).

I first tried the parameters (--eqx -ax asm20 --secondary=no -z 10000,50 -r 50000 --end-bonus=100 -O 5,56 -E 4,1 -B 5) and got highly contiguous mapped segments (please see the red blocks in P1.pdf; please ignore the blue blocks, which are from another assembly). [P1.pdf]

As you can see, many segments/sequences are missing on chrX, chr16, and chr19 (p arm).

Then I tried the parameters (--eqx -ax asm20 --secondary=no) and got more fragmented mapped segments (please see the red blocks in P2.pdf; the three layers are for segments >50 kb, >10 kb, and <10 kb).

As you can see, I could align the 'missing sequences' to hg38, but I cannot get larger contiguous aligned segments. [P2.pdf]

So I have a couple of questions about my mapping strategy:

Why are so many sequences missing with the parameters (--eqx -ax asm20 --secondary=no -z 10000,50 -r 50000 --end-bonus=100 -O 5,56 -E 4,1 -B 5) compared to the parameters (--eqx -ax asm20 --secondary=no)?
Which parameters would you recommend to get more contiguous alignments with no missing aligned segments?
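One way to put numbers on the "missing" regions is to compute, per chromosome, how many reference bases each parameter set covers. The following is only a minimal sketch: it assumes the alignments are written in PAF (i.e. run without -a) and relies on the standard PAF column layout (target name, length, start, and end in columns 6 to 9). The function name and the input file names are hypothetical.

```python
import sys
from collections import defaultdict

def covered_bases(paf_lines):
    """Return {target_name: (covered_bp, target_length)} from PAF records.

    Overlapping alignments on the same target are merged so that each
    reference base is counted once.
    """
    intervals = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # PAF columns (1-based): 6 = target name, 7 = target length,
        # 8 = target start (0-based), 9 = target end (exclusive)
        tname, tlen = fields[5], int(fields[6])
        tstart, tend = int(fields[7]), int(fields[8])
        lengths[tname] = tlen
        intervals[tname].append((tstart, tend))

    result = {}
    for tname, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start > cur_end:          # disjoint: close the current block
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                        # overlapping or adjacent: extend it
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        result[tname] = (covered, lengths[tname])
    return result

if __name__ == "__main__":
    # Hypothetical usage: python coverage.py strict.paf default.paf
    for path in sys.argv[1:]:
        with open(path) as handle:
            for tname, (cov, tlen) in sorted(covered_bases(handle).items()):
                print(f"{path}\t{tname}\t{cov}\t{tlen}\t{cov / tlen:.3f}")
```

Comparing the covered fraction of chrX, chr16, and chr19 between the two runs would show whether the stricter settings drop those regions entirely or only shorten the alignments there.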

Thank you in advance.

--Yafei

Options to favor detection of several small deletions instead of single large ones

unimap does a great job aligning complex regions (e.g. satellites) even using reads. We are using it to detect some variants at telomeric regions, using PacBio HiFi sequencing data.

We have evidence from one of these regions that there should be two separate deletions, corresponding to two separate groups of monomers from a satellite, each of a different size. However, unimap finds one large deletion instead, close to several variants that suggest mismapping. I assume this behavior comes from the default settings, which may favour one large deletion over several small ones. The program was run with unimap -a -x asm5 -x hifi --cs. I attach a snapshot of this region below.
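To illustrate why a scoring scheme can prefer one merged deletion, here is a small worked sketch. It assumes unimap keeps the two-piece affine gap model documented for minimap2, where a gap of length k costs min(O1 + k*E1, O2 + k*E2). The match, mismatch, and gap values below are illustrative only and are not claimed to be what the asm5 or hifi presets use; the function names are hypothetical, and the flanking sequence is ignored.

```python
def gap_cost(k, gap_open=(4, 24), gap_ext=(2, 1)):
    """Two-piece affine cost of a gap of length k: min(O1 + k*E1, O2 + k*E2)."""
    (o1, o2), (e1, e2) = gap_open, gap_ext
    return min(o1 + k * e1, o2 + k * e2)

def score_one_deletion(d1, d2, m, x, match=1, mismatch=4):
    """Score when the two deletions are merged into one of length d1 + d2.

    The m bases that truly lie between the deletions are then aligned
    elsewhere, with x of them appearing as mismatches.
    """
    return match * (m - x) - mismatch * x - gap_cost(d1 + d2)

def score_two_deletions(d1, d2, m, match=1):
    """Score when both deletions are kept and the m bases match exactly."""
    return match * m - gap_cost(d1) - gap_cost(d2)

if __name__ == "__main__":
    d1, d2, m = 1000, 1500, 500   # illustrative deletion and spacer lengths
    for x in (3, 5):
        one = score_one_deletion(d1, d2, m, x)
        two = score_two_deletions(d1, d2, m)
        winner = "one deletion" if one > two else "two deletions"
        print(f"x={x}: one={one}, two={two} -> {winner}")
```

With these illustrative values, long gaps fall in the second piece of the cost function, so splitting one deletion into two costs one extra O2 (24 here). The two-deletion alignment only wins once the mismatches it avoids are worth more than that: at x=3 the merged deletion scores higher (-2039 vs -2048), at x=5 the two deletions do (-2049 vs -2048). In a satellite, where monomers are similar and x stays small, the merged deletion can easily come out ahead.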

I have been trying to tune the settings to penalize nucleotide mismatches and favour more than one deletion where appropriate. However, I have run into some problems:

-B 500 -O 1 -E 2: increased mismatch penalty and lowered gap costs. This results in a "Segmentation fault" error.
-B 500: does not seem to solve the problem; the alignment remains the same.
-O 1 -E 2: this makes some improvement, and for some reads the mismatches disappear in favour of deletions (snapshot below), but not for the majority of reads. I cannot make progress from here, however, since using "-B > 6" causes the segfault, lower values of "-B" do not change anything, and -O and -E are near their minimum values (they cannot be zero).
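To compare such parameter sets read by read rather than by eye, the cs tag produced by --cs can be tallied. The sketch below assumes unimap emits the same short-form cs syntax that minimap2 documents (":N" for N identical bases, "*xy" for a substitution, "+seq" for an insertion, "-seq" for a deletion, "~" for an intron); the function name and the 50 bp threshold are hypothetical.

```python
import re

# One alternative per cs operation (short form, plus "=SEQ" from the long form)
CS_OP = re.compile(
    r":[0-9]+|=[A-Za-z]+|\*[a-z][a-z]|\+[a-z]+|-[a-z]+|~[a-z]{2}[0-9]+[a-z]{2}"
)

def summarize_cs(cs, min_del=50):
    """Count mismatches and collect indel lengths from a cs string.

    Accepts either the bare value or the full SAM field ("cs:Z:...").
    min_del is an arbitrary cutoff for calling a deletion "large".
    """
    if cs.startswith("cs:Z:"):
        cs = cs[5:]
    mismatches = 0
    deletions, insertions = [], []
    pos = 0
    for op in CS_OP.finditer(cs):
        if op.start() != pos:
            raise ValueError(f"unrecognized cs content at offset {pos}")
        pos = op.end()
        token = op.group()
        if token[0] == "*":
            mismatches += 1
        elif token[0] == "-":
            deletions.append(len(token) - 1)
        elif token[0] == "+":
            insertions.append(len(token) - 1)
    if pos != len(cs):
        raise ValueError(f"unrecognized cs content at offset {pos}")
    return {
        "mismatches": mismatches,
        "deletions": deletions,
        "insertions": insertions,
        "large_deletions": sum(1 for d in deletions if d >= min_del),
    }
```

Running this on the same reads under each setting would show, for example, how many reads move from one large deletion with many mismatches to two large deletions with few.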

I wonder whether there are other options worth exploring here. Any hints on how to deal with this situation? Thank you.

Multi mapping

Dear Li,

Is there any way to align genome 1 to 3 or more?
I would like to check for whole-genome duplication using unimap, but it seems to work only with 1-to-1 alignments.
Thanks.
Won

Question about presetting

Hi and thanks for the tool.
As instructed, I want to use it to map a de novo genome assembly from PacBio long reads to a reference genome. I'm not sure which preset to use in this case. Is there one specific to PacBio, as there seems to be for Nanopore? I guess I should use asm5, since I expect the assembly to be similar to the reference, but what about asm10 and asm20? When are these recommended?
Your advice would be much appreciated. Thank you for your time and your support.

LICENSE.txt not in the source tree

Hi @lh3,

Given that this is a minimap2 (+minigraph) fork, I'd assume you intended this to be MIT-licensed as well, but I couldn't find a license here.
I'm currently packaging dipcall/unimap/bedtk for Bioconda and need a license file so that we are able to distribute unimap :).

Cheers,
Marcel

mm_idx_str-like interface

Hello Heng,

Unimap seems like a very useful tool!
Is it possible to provide an mm_idx_str-like interface for index construction in unimap (one that takes "const char **seq" instead of a path to the reference file)?

Thank you!

Question about mapping

Hi,

This looks like a very useful tool! Does it extend the ~1-1 chaining algorithm down to base-pair-level alignment? I.e., does this improve 1-1 mappings in very complex regions?

Thanks in advance!
Mitchell
