unimap's Issues

question about mapping parameters

Hi,

I am trying to align highly divergent hifiasm contigs to hg38 (diverged ~36 MYA; SNV divergence ~10%).

I first tried the parameters (--eqx -ax asm20 --secondary=no -z 10000,50 -r 50000 --end-bonus=100 -O 5,56 -E 4,1 -B 5) and got highly contiguous mapped segments (see the red blocks in P1.pdf; please ignore the blue blocks, which come from another assembly). [P1.pdf]

As you can see, many segments/sequences are missing on chrX, chr16, and chr19 (p arm).

Then I tried the parameters (--eqx -ax asm20 --secondary=no) and got more fragmented mapped segments (see the red blocks in P2.pdf; the three layers show segments >50 kbp, >10 kbp, and <10 kbp).

As you can see, I could align the 'missing sequences' to hg38, but I cannot get larger contiguous aligned segments. [P2.pdf]

I have a couple of questions about my mapping strategy:

  1. Why are so many sequences missing with the parameters (--eqx -ax asm20 --secondary=no -z 10000,50 -r 50000 --end-bonus=100 -O 5,56 -E 4,1 -B 5) compared to the parameters (--eqx -ax asm20 --secondary=no)?

  2. Could you recommend which parameters I should use to get more contiguous aligned segments with no missing sequences?

Thank you in advance.

--Yafei
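One way to quantify the "missing" sequence between two parameter sets is to compute, per reference chromosome, the fraction of bases covered by at least one alignment. The sketch below is a minimal illustration, not part of unimap: it assumes the alignments are available in PAF format (the commands in this issue use -a, which produces SAM, so a PAF run or a conversion step is assumed), and the function name `target_coverage` is hypothetical.

```python
from collections import defaultdict


def target_coverage(paf_lines):
    """Return {target_name: fraction of target bases covered by >= 1 alignment}.

    Assumes standard PAF columns: 6 = target name, 7 = target length,
    8 = target start, 9 = target end (0-based, end-exclusive).
    """
    intervals = defaultdict(list)
    lengths = {}
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        tname = fields[5]
        lengths[tname] = int(fields[6])
        intervals[tname].append((int(fields[7]), int(fields[8])))

    coverage = {}
    for tname, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap before this interval: close the current block.
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
        coverage[tname] = covered / lengths[tname]
    return coverage
```

Running this on the output of each parameter set and comparing the values for chrX, chr16, and chr19 would put a number on how much sequence each run leaves unaligned. Targets with no alignments at all do not appear in the result.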

Options to favor detection of several small deletions instead of single large ones

unimap does a great job aligning complex regions (e.g. satellites) even using reads. We are using it to detect some variants at telomeric regions, using PacBio HiFi sequencing data.

We have evidence from one of these regions that there should be 2 separate deletions, corresponding to two separate groups of monomers from a satellite, each of a different size. However, unimap finds one large deletion instead, adjacent to several variants that suggest mismapping. I assume this behavior comes from the default settings, which may favour one large deletion over several small ones. The program was run with unimap -a -x asm5 -x hifi --cs. I attach a snapshot of this region below.

[telomere_5_unimap2]

I have been trying to tune the settings to penalize nucleotide mismatches and favour more than one deletion where appropriate. However, I ran into some problems:

  • -B 500 -O 1 -E 2: increased mismatch penalty and lowered cost for gaps. This results in a "Segmentation fault" error.
  • -B 500: Does not seem to solve the problem; the alignment remains the same.
  • -O 1 -E 2: This makes some improvement, and for some reads the mismatches disappear in favour of deletions (snapshot below), but not for the majority of reads. I cannot make progress from here, however, since using "-B > 6" causes the segfault, lower values of "-B" change nothing, and -O and -E are near their minimum values (they cannot be zero).

[telomere_5_unimap3]

I wonder whether there are other options worth exploring here. Any hints on how to deal with this situation? Thank you.
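The trade-off described in this issue can be worked through numerically. The sketch below assumes unimap keeps the two-piece affine gap cost documented for minimap2, where a gap of length k costs min(O1 + k*E1, O2 + k*E2); that inheritance is an assumption, the function names are hypothetical, and the scoring values in the usage note are illustrative rather than unimap's defaults. It compares the score lost by "one large deletion plus some mismatches" against "two separate deletions with all other bases matching".

```python
def gap_cost(k, o1, e1, o2, e2):
    """Two-piece affine cost of a gap of length k (assumed minimap2-style)."""
    return min(o1 + k * e1, o2 + k * e2)


def compare_layouts(len_a, len_b, n_mismatch, a, b, o1, e1, o2, e2):
    """Return (one_large, two_small): score lost by each layout, lower is better.

    one_large: a single deletion of len_a + len_b bases plus n_mismatch
               mismatched bases; each mismatch forfeits the match score a
               and pays the mismatch penalty b.
    two_small: two deletions of len_a and len_b with every other base matching.
    """
    one_large = gap_cost(len_a + len_b, o1, e1, o2, e2) + n_mismatch * (a + b)
    two_small = gap_cost(len_a, o1, e1, o2, e2) + gap_cost(len_b, o1, e1, o2, e2)
    return one_large, two_small
```

With illustrative values a=2, b=4, O=4,24 and E=2,1, deletions of 100 and 200 bases cost 348 as two gaps, while a single 300-base gap costs 324. The single gap therefore wins as long as it introduces no more than three mismatches (324 + 3*6 = 342), and loses at five (354). Under this model, raising -B or lowering the gap-open penalties shifts that break-even point toward the two-deletion layout.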

Multi mapping

Dear Li,

Is there any way to align genomes 1-to-3 or more?
I would like to check for whole-genome duplication using unimap, but it seems it only works 1-to-1.
Thanks.
Won

Question about presetting

Hi and thanks for the tool.
As instructed, I want to use it to map a de novo genome assembly from PacBio long reads to a reference genome. I'm not sure which preset to use in this case. Is there one specific to PacBio, as there seems to be for nanopore? I guess I should use asm5 since I expect the assembly to be similar to the reference, but what about asm10 and asm20? When are these recommended?
Your advice would be much appreciated, thank you for your time and your support
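For reference, the minimap2 usage text describes asm5/asm10/asm20 as assembly-to-reference presets for roughly 0.1%/1%/5% sequence divergence; whether unimap's presets follow the same guideline is an assumption here. A rough way to check which regime an assembly falls into is to measure the mismatch rate of a trial alignment. The sketch below is a minimal illustration with a hypothetical function name: it assumes the alignment was produced with --eqx, so the CIGAR uses =/X operators, and it counts only substitutions, ignoring indels.

```python
import re

# One CIGAR element: a length followed by a SAM operator.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mismatch_rate(cigar):
    """Fraction of aligned base pairs that are mismatches in an =/X CIGAR."""
    n_match = 0
    n_mismatch = 0
    for length, op in _CIGAR_OP.findall(cigar):
        if op == "=":
            n_match += int(length)
        elif op == "X":
            n_mismatch += int(length)
        elif op == "M":
            # M does not distinguish matches from mismatches.
            raise ValueError("CIGAR uses M; =/X operators (--eqx) are required")
    total = n_match + n_mismatch
    return n_mismatch / total if total else 0.0
```

Averaging this over the primary alignments of a trial run gives a substitution-only divergence estimate to compare against the preset descriptions.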

LICENSE.txt not in the source tree

Hi @lh3,

given that this is a minimap2 (+minigraph) fork, I'd assume you intended this to be MIT-licensed as well, but I couldn't find a license here.
I'm currently in the process of packaging dipcall/unimap/bedtk for Bioconda and in need of a license information file so we are able to distribute unimap :).

Cheers,
Marcel

mm_idx_str-like interface

Hello Heng,

Unimap seems like a very useful tool!
Is it possible to provide a mm_idx_str - like interface for index construction in unimap? (that takes "const char **seq" instead of path to the reference file)

Thank you!

Question about mapping

Hi,

This looks like a very useful tool! Does it extend the ~1-1 chaining algorithm down to base pair level alignment? i.e. does this improve 1-1 mappings in very complex regions?

Thanks in advance!
Mitchell
