Comments (3)
m64014_181209_091052/3146438/0_11098 16
m64014_181209_091052/3146438/0_11098 2064
Actually it looks like the second alignment is a supplementary alignment (explain flags).
Due to the way we perform alignments, it is possible for subreads to align multiple times to the same CCS read (a primary alignment + supplementary alignments). DeepConsensus does have an internal filter: we take the first alignment and discard the rest. Here is the code responsible for that filtering:
deepconsensus/deepconsensus/preprocess/merge_datasets_transforms.py
Lines 156 to 170 in a497d55
Please reopen if you have further questions.
from deepconsensus.
I find instances where the supplementary alignment occurs before the primary alignment and the deepconsensus internal filter subsequently would used the supplementary alignment instead of the primary alignment for the consensus calling.
The attached image shows that the supplementary alignment for subread m64089_200122_110840/198615/5556_21349 before the primary alignment.
Simple python script to only select primary alignments with matching ZMW before deepConsensus.
import sys
import pysam
import natsort
import argparse
class BAM:
def __init__(self, line):
# target
self.tname = line.reference_name
self.tzmw = self.tname.split("/")[1]
# query
self.qname = line.query_name
self.qzmw = self.qname.split("/")[1]
def parse_args(args):
parser = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument(
"-i",
"--input",
type=str,
required=True,
help="path to input BAM file"
)
parser.add_argument(
"-o",
"--output",
type=str,
required=True,
help="path to output BAM file"
)
args = args[1:]
return parser.parse_args(args)
def bamfilter(infile, outfile):
insam = pysam.AlignmentFile(infile, "rb", check_sq=False)
outsam = pysam.AlignmentFile(outfile, "wb", template=insam)
for line in insam:
read = BAM(line)
flag = line.to_string().strip().split()[1]
if (flag == "0" or flag == "16") and read.tzmw == read.qzmw:
outsam.write(line)
outsam.close()
insam.close()
pysam.index(outfile)
def main():
options = parse_args(sys.argv)
bamfilter(options.input, options.output)
sys.exit(0)
if __name__ == "__main__":
main()
from deepconsensus.
We were just looking into this, and it looks like supplying --best-n 1
to the pbmm2 align
command makes it output only the primary alignment. The most important thing though is that we keep the alignment to the correct ZMW, so we are still investigating the downstream effects of changes to the alignment and filtering process.
from deepconsensus.
Related Issues (20)
- Error detecting params.json using docker in debian (10) HPC HOT 2
- Installation from source file problem HOT 2
- lower quality and less reads in deepconsensus 1.0 output compared to ccs HOT 2
- python 3.9 HOT 2
- [Repeat] Running deepconsensus results in "free(): invalid pointer" error HOT 17
- QV for each ccs reads HOT 2
- Public raw train dataset availability HOT 2
- Cannot open/create ccs.bam file? HOT 6
- vRAM limit HOT 1
- the label without alignment HOT 5
- bam or fastq issue HOT 2
- Separate subreads for mixed samples? HOT 3
- GPU installation failure with pip HOT 3
- GPU installation using quick start guide fails HOT 5
- OSError: error -3 while reading file HOT 8
- normal pass / fail rate? HOT 2
- KeyError HOT 5
- About making ground truth. HOT 2
- Optimizing runtime on HPC HOT 2
- issue with tensorflow HOT 12
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from deepconsensus.