Comments (6)
That makes sense. Thank you very much, @samhunter.
I think documenting the behavior and maybe an exception or warning for length overruns would suffice.
from htstream.
Are you running with any parameters?
Maybe try:
-s 5 -l 20
This asks SuperDeduper to start at base 5 and consider a fragment 20 bp long.
If that doesn't give you the behavior you want, could you:
- provide the log file (e.g. -L dedup.log)
- provide an example file with the first ~100 reads
Both with default start/length and 5/20.
Call: $ hts_SuperDeduper -U DKR100_S42_L001_R1_001.fastq.gz -L hts_sd_5-20_test.stats.log -s 5 -l 20
DKR100_S42_L001_R1_001.fastq.gz
hts_sd_5-20_test.stats.log
hts_sd_test.stats.log
Hi @channsoden, I have been able to replicate the behavior you reported, and I think I found the bug in SuperDeduper. Let me do a little bit of testing and get back to you.
Thanks, @samhunter. Curious what is going on here.
Hi again @channsoden. It turns out this is actually not a "bug"; it's a "feature".
When you work with PE data, we require two pieces, one from each read, to form a unique key for the fragment: R1[start:length] + R2[start:length]
Our thought was that we should require the same level of evidence for a SE read duplicate, so we use:
R1[start:length*2]
If you use -s 1 -l 12 it should work. You will actually be filtering for duplicates using 24bp of the read as a key.
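The key construction described above can be sketched in Python. This is only an illustration of the rule as stated in this comment, not HTStream's actual code: the function name `dedup_key` is hypothetical, `start` is assumed to be 1-based (matching "start at base 5"), and the sketch does not model how SuperDeduper treats reads shorter than the requested key.

```python
def dedup_key(r1, r2=None, start=1, length=12):
    """Build a duplicate-detection key as described in the comment above.

    Hypothetical helper for illustration only.
    start  : 1-based first base of the key (assumed to mirror -s)
    length : number of bases taken per piece (assumed to mirror -l)
    """
    begin = start - 1  # convert the assumed 1-based start to a 0-based index
    if r2 is not None:
        # Paired-end: one piece from each read, 2 * length bases in total.
        return r1[begin:begin + length] + r2[begin:begin + length]
    # Single-end: the same amount of evidence, taken from R1 alone.
    # Python slicing silently truncates on short reads; how the real tool
    # handles such overruns is not modeled here.
    return r1[begin:begin + 2 * length]


read1 = "ACGT" * 15  # 60 bp toy read
read2 = "TTGCA" * 12  # 60 bp toy mate

se_key = dedup_key(read1, start=1, length=12)
pe_key = dedup_key(read1, read2, start=5, length=20)
print(len(se_key), len(pe_key))
```

Under these assumptions, -s 5 -l 20 on single-end data would key on bases 5 through 44 of R1, so reads shorter than 44 bp cannot supply a full key.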
We will discuss whether this behavior is what we want to keep going forward (with better documentation), or change it somehow.
Related Issues (20)
- hts_SeqScreener enhancements for bigger references
- hts_Primer doesn't report fragments and basepairs_in
- Feature downgrade actually, remove -a option from SuperD
- -m (minLength) option removed from hts_QWindowTrim, but does not exist in hts_CutTrim
- Flag use
- Is "percentage-hits" calculated properly for SeqScreener?
- hts_Primers doesn't seem to read multi-fasta files correctly
- hts_Primers - error message
- Version incorrect and CMAKE_PREFIX_PATH not working
- How to cite HTStream? :-)
- hts_LengthFilter is missing from the documentation!
- citation?
- Add to CutTrim, trim to length from 5' or 3'
- pointer error in hts_Stats
- Compiling from source fails on Ubuntu 22.04.1
- Remaining adapter sequence
- Order of input files to hts_SeqScreener changes hits reported when R1/R2 lengths differ
- "no such file or directory" error
- munmap_chunk(): invalid pointer error