Comments (31)
@lbergelson does this seem related to #99 ?
@kvg was having issues with long reads and corrupt bams a few months back.
from gkl.
I don't think they're the same issue. That seems to be related to reads that extend through multiple compression blocks and is more general to long reads. This seems to be a magically bad confluence of things that results in a segfault on an otherwise innocuous file. It seems to be either hardware specific or specific to osx mojave, but I'm trying to narrow that down still.
from gkl.
Got it. Thanks. I'll continue to follow this thread for updates.
from gkl.
It seems to reproduce on other OSX Mojave machines. (OSX 10.14.6)
The following commands should reproduce the issue on one of the afflicted machines...
git clone https://github.com/broadinstitute/picard.git
cd picard
./gradlew legacyTest --tests "*CollectGcBiasMetricsTest*"
from gkl.
@mepowers Have you been able to reproduce this on your end?
from gkl.
@lbergelson we have not. Have you been able to reproduce on other machines?
from gkl.
Yes, I've been able to reproduce it on several machines running OSX Mojave.
from gkl.
@lbergelson I escalated this issue - we should be able to get you a response tomorrow.
from gkl.
@lbergelson do you know if this issue is reproducible on Linux? We did most of our validation on Linux. The Mac validation was done on a Mac Mini with an older version of OSX and that's still what we have in-house.
from gkl.
I haven't been able to reproduce it on linux. Only on OSX and specifically mojave, but it seems to happen on every Mojave machine that runs that command.
from gkl.
Some additional information.
- It seems that this error is not completely deterministic. I've now seen it succeed without a segfault several times on the same machine.
- One of our developers, who is using OSX 10.14 (an older patch of Mojave), does NOT see this bug.
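Since the failure is intermittent, a single passing run doesn't tell us much. Here is a minimal sketch of a helper that repeats a command and counts failures. The name `count_failures` is made up for illustration, and it assumes the command exits non-zero when the test JVM crashes.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}
```

For example, from the picard checkout: `count_failures 10 ./gradlew legacyTest --tests "*CollectGcBiasMetricsTest*"`.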
from gkl.
Thanks @lbergelson . We have a couple new engineers ramping on GKL. We have a couple Linux bugs in the queue and will circle back on this one once those are addressed.
from gkl.
That's great to hear!
from gkl.
@mepowers Any update on the new engineers? We're having more people hit this particular bug in the wild.
from gkl.
@mepowers ditto, a few of the teams for our clients, all of which use Mojave, are now hitting this on a regular basis. For Picard, setting USE_JDK_DEFLATER=true skips using the Intel deflater, as a temporary workaround.
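For anyone who wants the workaround in script form, here is a rough sketch. The wrapper name `picard_jdk_deflater` and the `PICARD_JAR` variable are assumptions for illustration; the only part taken from this thread is the `USE_JDK_DEFLATER=true` argument.

```shell
# Hypothetical wrapper: appends the workaround flag to any Picard invocation.
# PICARD_JAR is an assumed variable pointing at your picard.jar.
picard_jdk_deflater() {
    java -jar "${PICARD_JAR:-picard.jar}" "$@" USE_JDK_DEFLATER=true
}
```

It would be called like `picard_jdk_deflater SortSam I=in.bam O=out.bam SORT_ORDER=coordinate`, where the file names are placeholders.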
from gkl.
@lbergelson @nh13 we have not reproduced this issue yet. Skipping Intel deflater for this OS is still the best workaround.
from gkl.
@lbergelson Hi Louis, I am trying to reproduce this bug using ./gradlew $JAVA_PROXY test legacyTest --tests "CollectGcBiasMetricsTest". It mainly complains about a missing Rscript file.
Please help me diagnose this further: can you verify this behavior, or am I missing something here?
Also based on the stderr, there is no call to GKL functions. Snapshot of picard/build/reports/tests/legacyTest/classes/picard.analysis.CollectGcBiasMetricsTest.html
from gkl.
Hi @SnehalA!
I'm excited to see that someone is working on reproducing this. The error you're seeing is because you don't have the program Rscript installed on your machine. You can avoid it by installing R and Rscript, or alternatively, I made a branch that disables the call to R, lb_repro_no_r; see here: https://github.com/broadinstitute/picard/tree/lb_repro_no_r.
And just to be sure, you're running osx 10.14.6+? It reproduces consistently on that version of osx but not on older versions.
I don't know what the $JAVA_PROXY line in your gradle command does either. I've never seen that before, but I assume you put it there for a reason.
from gkl.
@lbergelson I installed the R-3.6.2 pkg and all 4/4 tests succeeded for CollectGcBiasMetricsTest.
I am using 10.14.6 and JDK8. I also tried SortSamTest and saw no issues either.
However, for MarkDuplicatesTagRepresentativeReadIndexTest I saw a SIGSEGV, and also that totalMemory utilization was ~2g. So I tried disabling the gradle daemon and bumping up the memory parameters, and then the test PASSED (~9 min). Please check if the following settings work for you.
./gradlew -Dorg.gradle.daemon=false -Dorg.gardle.jvmargs="-Xmx6g" legacyTest --tests "MarkDuplicatesTagRepresentativeReadIndexTest" --debug
Here's Activity Monitor Snapshot -
Yes, I am setting http proxy settings with local variable $JAVA_PROXY.
from gkl.
Interesting. If I run that exact command I get an error with a segfault.
11:25:51.117 [DEBUG] [TestEventLogger] Gradle suite > Gradle test > picard.sam.markduplicates.MarkDuplicatesTagRepresentativeReadIndexTest > testBulkFragmentsWithDuplicates[1](.bam) STARTED
11:25:51.204 [QUIET] [system.out] #
11:25:51.204 [QUIET] [system.out] # A fatal error has been detected by the Java Runtime Environment:
11:25:51.204 [QUIET] [system.out] #
11:25:51.204 [QUIET] [system.out] # SIGSEGV (0xb) at pc=0x000000010af5bea7, pid=43865, tid=0x000000000000a803
11:25:51.204 [QUIET] [system.out] #
11:25:51.205 [QUIET] [system.out] # JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)
11:25:51.205 [QUIET] [system.out] # Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode bsd-amd64 compressed oops)
11:25:51.205 [QUIET] [system.out] # Problematic frame:
11:25:51.205 [QUIET] [system.out] # C [libgkl_compression6026145343503995221.dylib+0x6ea7] deflate_medium+0x867
11:25:51.205 [QUIET] [system.out] #
11:25:51.205 [QUIET] [system.out] # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
11:25:51.205 [QUIET] [system.out] #
11:25:51.206 [QUIET] [system.out] # An error report file with more information is saved as:
11:25:51.206 [QUIET] [system.out] # /Users/louisb/Workspace/picard/hs_err_pid43865.log
11:25:51.215 [QUIET] [system.out] #
11:25:51.215 [QUIET] [system.out] # If you would like to submit a bug report, please visit:
11:25:51.215 [QUIET] [system.out] # http://bugreport.java.com/bugreport/crash.jsp
11:25:51.215 [QUIET] [system.out] # The crash happened outside the Java Virtual Machine in native code.
11:25:51.215 [QUIET] [system.out] # See problematic frame for where to report the bug.
11:25:51.215 [QUIET] [system.out] #
Here's the complete log:
log.txt
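For quick triage of crashes like the one above, here is a small sketch that pulls the blamed frame out of an hs_err file. The function name `problematic_frame` is made up, and it assumes the usual hs_err layout where the frame sits on the line right after the "# Problematic frame:" marker.

```shell
# Hypothetical helper: print the frame the JVM blamed for the crash.
# Usage: problematic_frame hs_err_pid<PID>.log
problematic_frame() {
    # Take the line following the marker and strip its leading "# ".
    awk '/^# Problematic frame:/ { getline; sub(/^# */, ""); print; exit }' "$1"
}
```

Something like `problematic_frame hs_err_pid43865.log | grep libgkl` would then show whether the crash was inside the GKL native library.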
from gkl.
@lbergelson I see there is a typo in my command: it should be -Dorg.gradle.jvmargs="-Xmx6g"! This setting didn't get picked up. The default gradle settings may not be sufficient; can you try again?
javaHome=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home,daemonRegistryDir=/Users/louisb/.gradle/daemon,pid=44043,idleTimeout=120000,priority=NORMAL,daemonOpts=-XX:MaxMetaspaceSize=256m,-XX:+HeapDumpOnOutOfMemoryError,**-Xms256m,-Xmx512m**,-Dfile.encoding=UTF-8,-Duser.country=US,-Duser.language=en,-Duser.variant]
from gkl.
Ah, good catch. I didn't see the typo.
Running with -Xmx seems to sometimes avoid the SIGSEGV but it does so non-deterministically. Back to back runs produced seemingly random results with different memory values. 6G fail, 8G pass, 10g fail, 8g fail.
I think it might be a red herring, or maybe expanding the memory available gives us a better chance of dereferencing a valid address.
Either way though, it's not clear to me how changing the gradle jvm memory allocation should impact the gradle test executor memory, which is a separate process that is configured separately.
from gkl.
@lbergelson Oh, I see different pids for the worker and the daemon. I want to change COMPRESSION_LEVEL=5 to 1 or 2. I am not able to find where these get set in order to run the test suite with these values. Please check if this approach works.
from gkl.
You should be able to change the compression level default with the system property samjdk.compression_level=1. It has to be passed through to the test jvm instance though. The way I know to do that is to add a line into the build.gradle file.
There's a section in the file:
tasks.withType(Test) {
outputs.upToDateWhen { false } // tests will always rerun
description = "Runs the unit tests"
...
Add the line:
systemProperty "samjdk.compression_level", "2"
into that section and it should set that system property for all the tests
from gkl.
Given our recent 2020 roadmap discussions, this is a high priority fix for us.
from gkl.
We are working on new build and are hoping this resolves the bug.
from gkl.
That's great news.
from gkl.
There's another possibly identical or related issue here: broadinstitute/picard#1329
from gkl.
@SnehalA Someone pinged me asking if this reproduces with 0.8.6 in picard, I can't figure out where that comment was though... I've tested on my machine with that version and I still see the issue when using 0.8.6.
from gkl.
@SnehalA believes that this is fixed in 0.8.7
from gkl.
A memory allocation issue has been fixed in the Intel Deflater code, and there are new versions of ISA-L and zlib with many fixes applied.
A beta test on Mac by @lbergelson indicates that the update at least helps.
As the patches land on master, we will close this issue. If the issue is seen again, please reopen.
from gkl.