Code Monkey home page Code Monkey logo

Comments (3)

danielecook avatar danielecook commented on May 20, 2024
m64014_181209_091052/3146438/0_11098    16
m64014_181209_091052/3146438/0_11098    2064

Actually it looks like the second alignment is a supplementary alignment (explain flags).

Due to the way we perform alignments, it is possible for subreads to align multiple times to the same CCS read (a primary alignment + supplementary alignments). DeepConsensus does have an internal filter: we take the first alignment and discard the rest. Here is the code responsible for that filtering:

"""Merge aligned read proto and unaligned read proto."""
(_, (aligned_reads, unaligned_reads)) = subread_data
if not aligned_reads or not unaligned_reads:
return
aligned = aligned_reads[0]
unaligned = unaligned_reads[0]
# Do not error when there exist, for a given read, multiple alignments to
# correct molecule. Some of the alignments may be supplementary alignments.
# Keep a count and use the first alignment.
if len(aligned_reads) > 1:
logging.info('Unexpected: %d aligned reads for %s', len(aligned_reads),
aligned.fragment_name)
self.multiple_alignments_counter.inc()

Please reopen if you have further questions.

from deepconsensus.

sjin09 avatar sjin09 commented on May 20, 2024

I find instances where the supplementary alignment occurs before the primary alignment and the deepconsensus internal filter subsequently would used the supplementary alignment instead of the primary alignment for the consensus calling.

The attached image shows that the supplementary alignment for subread m64089_200122_110840/198615/5556_21349 before the primary alignment.

Screenshot 2021-09-03 at 11 43 17


Simple python script to only select primary alignments with matching ZMW before deepConsensus.

import sys
import pysam
import natsort
import argparse

class BAM:
    def __init__(self, line):
        # target
        self.tname = line.reference_name
        self.tzmw = self.tname.split("/")[1]
        # query
        self.qname = line.query_name
        self.qzmw = self.qname.split("/")[1]


def parse_args(args):
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        "-i",
        "--input",
        type=str,
        required=True,
        help="path to input BAM file"
    )
    parser.add_argument(
        "-o",
        "--output",
        type=str,
        required=True,
        help="path to output BAM file"
    )
    args = args[1:]
    return parser.parse_args(args)


def bamfilter(infile, outfile):
    insam = pysam.AlignmentFile(infile, "rb", check_sq=False)
    outsam = pysam.AlignmentFile(outfile, "wb", template=insam)
    for line in insam:
        read = BAM(line)
        flag = line.to_string().strip().split()[1]
        if (flag == "0" or flag == "16") and read.tzmw == read.qzmw:
            outsam.write(line)
    outsam.close()
    insam.close()
    pysam.index(outfile)


def main():
    options = parse_args(sys.argv)
    bamfilter(options.input, options.output)
    sys.exit(0)


if __name__ == "__main__":
    main()

from deepconsensus.

MariaNattestad avatar MariaNattestad commented on May 20, 2024

We were just looking into this, and it looks like supplying --best-n 1 to the pbmm2 align command makes it output only the primary alignment. The most important thing though is that we keep the alignment to the correct ZMW, so we are still investigating the downstream effects of changes to the alignment and filtering process.

from deepconsensus.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.