Comments (9)
from gkl.
Hi @droazen, @kvg - Is this a blocker? Or is running with the JDK deflater a workaround?
Please feel free to tag me in future GKL issues so I'm sure to see them.
@mepowers Running the JDK deflater does permit us to work around the issue for now. However, the JDK deflater is ~6x slower, so it is unfortunately a rather expensive workaround.
@kvg - thanks for the update. @droazen mentioned you have a bunch of test cases. Have you confirmed it's triggered when a read spans many compressed blocks?
@kvg I am trying to replicate this issue. Is SplitSubreadsByZmw a custom tool? Is there a particular branch or fork available, along with a test BAM file?
@kvg If you have had a chance to look into this, could you share the branch details? Thanks!
Hi @SnehalA,
Frustratingly (or thankfully, depending on your point of view), I can't reproduce this error anymore. I'm not sure how it would have been resolved, but it seems to be gone for now.
FYI, to test this intermittent behavior in the first place, I put together an updated test last night. It just runs my read sharding code (now called ShardLongReads) on a test E. coli sample in a loop, checking after each iteration whether any of the output BAMs are corrupted. The input BAM file and testing script are at: https://www.dropbox.com/sh/9dittkiojm20l9n/AAAda8k7ghOcJzppZ9tWUVt1a?dl=0 .
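For readers who want to set up a similar check, here is a minimal sketch of that kind of loop. It is not the script from the link above. The names `gzip_stream_is_intact`, `find_corrupt_outputs`, and `run_once` are hypothetical, and the check relies only on the fact that BAM files are BGZF-compressed, which is readable as a series of gzip members. It verifies that the compressed stream decompresses cleanly and does not validate the BAM records themselves.

```python
import gzip
import zlib
from pathlib import Path


def gzip_stream_is_intact(path, chunk_size=1 << 20):
    """Return True if every gzip member in the file decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end; a truncated or corrupt member raises here.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


def find_corrupt_outputs(run_once, out_dir, iterations):
    """Call run_once(i), then check every *.bam in out_dir after each iteration.

    run_once is a caller-supplied function (an assumption of this sketch)
    that invokes the sharding tool and writes its BAMs into out_dir.
    Returns a list of (iteration, file name) pairs that failed the check.
    """
    failures = []
    for i in range(iterations):
        run_once(i)
        for bam in sorted(Path(out_dir).glob("*.bam")):
            if not gzip_stream_is_intact(bam):
                failures.append((i, bam.name))
    return failures
```

In practice `run_once` would wrap a `subprocess.run` call to the tool under test. Because the failure is intermittent, an empty result after many iterations suggests, but does not prove, that the problem is gone.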
The "out.old/" directory (made with GATK SplitSubreadsByZmw version 4.1.0.0-52-g7cc8020-SNAPSHOT) has the results from an older run, in which the ".000002.bam" file is corrupt. I ran a slightly newer version of the sharding tool in a loop 200 times (GATK ShardLongReads version 4.1.0.0-53-g240f7a9-SNAPSHOT). The newer version doesn't trigger the error.
I have no idea what changed. My tool is not substantially different between versions: ( old version ; new version )
So, I think I'll update my pipelines to start using the Intel codecs again, and if that more extensive processing fails for some reason, I'll let you know. Until then, I think we can close this issue.
Hi @droazen, I do not have maintainer permission. Can you please close this issue?