Comments (31)

rpomaris commented on August 21, 2024

@lbergelson does this seem related to #99 ?

@kvg was having issues with long reads and corrupt bams a few months back.

from gkl.

lbergelson commented on August 21, 2024

I don't think they're the same issue. That seems to be related to reads that extend through multiple compression blocks and is more general to long reads. This seems to be a magically bad confluence of things that results in a segfault on an otherwise innocuous file. It seems to be either hardware specific or specific to osx mojave, but I'm trying to narrow that down still.

rpomaris commented on August 21, 2024

Got it. Thanks. I'll continue to follow this thread for updates.

lbergelson commented on August 21, 2024

It seems to reproduce on other OSX Mojave machines. (OSX 10.14.6)

The following commands should reproduce the issue on one of the afflicted machines...

git clone https://github.com/broadinstitute/picard.git
cd picard
./gradlew legacyTest --tests "*CollectGcBiasMetricsTest*"

lbergelson commented on August 21, 2024

@mepowers Have you been able to reproduce this on your end?

rpomaris commented on August 21, 2024

@lbergelson we have not. Have you been able to reproduce on other machines?

lbergelson commented on August 21, 2024

Yes, I've been able to reproduce it on several machines running OSX Mojave.

rpomaris commented on August 21, 2024

@lbergelson I escalated this issue - we should be able to get you a response tomorrow.

rpomaris commented on August 21, 2024

@lbergelson do you know if this issue is reproducible on Linux? We did most of our validation on Linux. The Mac validation was done on a Mac Mini with an older version of OSX and that's still what we have in-house.

lbergelson commented on August 21, 2024

I haven't been able to reproduce it on linux. Only on OSX and specifically mojave, but it seems to happen on every Mojave machine that runs that command.

lbergelson commented on August 21, 2024

Some additional information.

  1. It seems that this error is not completely deterministic. I've now seen it succeed without a segfault several times on an affected machine.
  2. One of our developers, who is using OSX 10.14 (an older patch of Mojave), does NOT see this bug.

rpomaris commented on August 21, 2024

Thanks @lbergelson . We have a couple new engineers ramping on GKL. We have a couple Linux bugs in the queue and will circle back on this one once those are addressed.

lbergelson commented on August 21, 2024

That's great to hear!

lbergelson commented on August 21, 2024

@mepowers Any update on the new engineers? We're having more people hit this particular bug in the wild.

nh13 commented on August 21, 2024

@mepowers ditto, a few of the teams for our clients, that all use Mojave, are now hitting this on a regular basis. For Picard, setting USE_JDK_DEFLATER=true skips using the Intel deflater, as a temporary workaround.
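
As a minimal sketch of that workaround, the snippet below builds a Picard command line with USE_JDK_DEFLATER=true appended. The jar path, the SortSam tool, and the file names are placeholders chosen for illustration, and the helper name picard_cmd_jdk_deflater is hypothetical. The function only prints the command so it can be inspected before it is run.

```shell
# Placeholder path to the Picard jar; adjust for your installation.
PICARD_JAR="picard.jar"

# Print a Picard invocation that bypasses the Intel (GKL) deflater by
# appending USE_JDK_DEFLATER=true to whatever tool arguments are given.
picard_cmd_jdk_deflater() {
    tool="$1"
    shift
    echo java -jar "$PICARD_JAR" "$tool" "$@" USE_JDK_DEFLATER=true
}

# Example with placeholder inputs (prints the command, does not run it).
picard_cmd_jdk_deflater SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate
```

Printing rather than executing keeps the sketch safe to try on a machine without Picard installed.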

rpomaris commented on August 21, 2024

@lbergelson @nh13 we have not reproduced this issue yet. Skipping Intel deflater for this OS is still the best workaround.

SnehalA commented on August 21, 2024

@lbergelson Hi Louis, I am trying to reproduce this bug using ./gradlew $JAVA_PROXY test legacyTest --tests "CollectGcBiasMetricsTest". It mainly complains about a missing Rscript file.

Please help me diagnose this further by verifying this behavior, or let me know if I am missing something here.

image

Also, based on the stderr, there is no call to GKL functions. Here is a snapshot of picard/build/reports/tests/legacyTest/classes/picard.analysis.CollectGcBiasMetricsTest.html:

image

lbergelson commented on August 21, 2024

Hi @SnehalA!

I'm excited to see that someone is working on reproducing this. The error you're seeing is because you don't have the program Rscript installed on your machine. You can avoid it by installing R and Rscript, or alternatively, I made a branch, lb_repro_no_r, that disables the call to R: https://github.com/broadinstitute/picard/tree/lb_repro_no_r.

And just to be sure, are you running OSX 10.14.6+? It reproduces consistently on that version of OSX but not on older versions.
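
To check whether a machine falls in that range, something like the following could be used. It assumes the affected range is Mojave 10.14.6 and later 10.14.x patches, which is only what this thread reports. The function name is hypothetical; sw_vers is the standard macOS version tool and is skipped when it is not available.

```shell
# Succeed if the given macOS version string is Mojave at patch level 6 or
# later (10.14.6, 10.14.7, ...). A bare "10.14" or any other release fails.
is_affected_mojave() {
    case "$1" in
        10.14.*)
            patch="${1#10.14.}"     # strip the "10.14." prefix
            patch="${patch%%.*}"    # keep only the first remaining component
            [ "$patch" -ge 6 ] 2>/dev/null
            ;;
        *)
            return 1
            ;;
    esac
}

# On macOS, report whether this machine is in the range described above.
if command -v sw_vers >/dev/null 2>&1; then
    version="$(sw_vers -productVersion)"
    if is_affected_mojave "$version"; then
        echo "macOS $version: in the range reported as affected"
    else
        echo "macOS $version: outside the range reported as affected"
    fi
fi
```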

I also don't know what the $JAVA_PROXY argument in your gradle command does. I've never seen that before, but I assume you put it there for a reason.

SnehalA commented on August 21, 2024

@lbergelson I installed the R-3.6.2 pkg and all 4/4 tests succeeded for CollectGcBiasMetricsTest.
I am using 10.14.6 and JDK8. I also tried SortSamTest and saw no issues there either.

image

However, for MarkDuplicatesTagRepresentativeReadIndexTest I saw a SIGSEGV, and also that totalMemory utilization was ~2g. So I tried disabling the gradle daemon and bumping up the memory parameters, and then the test PASSED (~9 min). Please check whether the following settings work for you.

./gradlew -Dorg.gradle.daemon=false -Dorg.gardle.jvmargs="-Xmx6g" legacyTest --tests "MarkDuplicatesTagRepresentativeReadIndexTest" --debug

Here's an Activity Monitor snapshot:
image

Yes, I am passing HTTP proxy settings through the local variable $JAVA_PROXY.

lbergelson commented on August 21, 2024

Interesting. If I run that exact command I get an error with a segfault.

11:25:51.117 [DEBUG] [TestEventLogger] Gradle suite > Gradle test > picard.sam.markduplicates.MarkDuplicatesTagRepresentativeReadIndexTest > testBulkFragmentsWithDuplicates[1](.bam) STARTED
11:25:51.204 [QUIET] [system.out] #
11:25:51.204 [QUIET] [system.out] # A fatal error has been detected by the Java Runtime Environment:
11:25:51.204 [QUIET] [system.out] #
11:25:51.204 [QUIET] [system.out] #  SIGSEGV (0xb) at pc=0x000000010af5bea7, pid=43865, tid=0x000000000000a803
11:25:51.204 [QUIET] [system.out] #
11:25:51.205 [QUIET] [system.out] # JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)
11:25:51.205 [QUIET] [system.out] # Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode bsd-amd64 compressed oops)
11:25:51.205 [QUIET] [system.out] # Problematic frame:
11:25:51.205 [QUIET] [system.out] # C  [libgkl_compression6026145343503995221.dylib+0x6ea7]  deflate_medium+0x867
11:25:51.205 [QUIET] [system.out] #
11:25:51.205 [QUIET] [system.out] # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
11:25:51.205 [QUIET] [system.out] #
11:25:51.206 [QUIET] [system.out] # An error report file with more information is saved as:
11:25:51.206 [QUIET] [system.out] # /Users/louisb/Workspace/picard/hs_err_pid43865.log
11:25:51.215 [QUIET] [system.out] #
11:25:51.215 [QUIET] [system.out] # If you would like to submit a bug report, please visit:
11:25:51.215 [QUIET] [system.out] #   http://bugreport.java.com/bugreport/crash.jsp
11:25:51.215 [QUIET] [system.out] # The crash happened outside the Java Virtual Machine in native code.
11:25:51.215 [QUIET] [system.out] # See problematic frame for where to report the bug.
11:25:51.215 [QUIET] [system.out] #

Here's the complete log:
log.txt
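
When triaging a crash like the one above, the hs_err file named in the output is usually the first thing to inspect. Here is a small sketch for pulling out the faulting native frame, assuming the file uses the same "# Problematic frame:" header shown in the console output (the function name is hypothetical):

```shell
# Print the native frame that the JVM reported as crashing.
# Assumes the hs_err file contains a "# Problematic frame:" line followed
# by the frame itself, as in the console output quoted in this thread.
problematic_frame() {
    awk '/^# Problematic frame:/ {
             if ((getline line) > 0) {
                 sub(/^#[ \t]*/, "", line)   # drop the leading hash and spaces
                 print line
             }
             exit
         }' "$1"
}

# Report the frame for every hs_err log in the current directory, if any.
for f in hs_err_pid*.log; do
    [ -e "$f" ] || continue
    printf '%s: %s\n' "$f" "$(problematic_frame "$f")"
done
```

To also capture a core dump, the JVM message above suggests running ulimit -c unlimited in the same shell before starting Java.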

SnehalA commented on August 21, 2024

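@lbergelson I see there is a typo in my command: it should be -Dorg.gradle.jvmargs="-Xmx6g". Because of the typo, this setting didn't get picked up. The default gradle settings may not be sufficient, so can you try again? Your log shows the daemon started with the default heap settings: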
javaHome=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home,daemonRegistryDir=/Users/louisb/.gradle/daemon,pid=44043,idleTimeout=120000,priority=NORMAL,daemonOpts=-XX:MaxMetaspaceSize=256m,-XX:+HeapDumpOnOutOfMemoryError,**-Xms256m,-Xmx512m**,-Dfile.encoding=UTF-8,-Duser.country=US,-Duser.language=en,-Duser.variant]
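
One way to confirm which heap limit a Gradle daemon actually received is to pull the -Xmx value out of the daemonOpts string in the debug log. A rough sketch, assuming the log line has the comma-separated shape shown above (the function name is hypothetical):

```shell
# Read Gradle debug output on stdin and print the first -Xmx option found.
daemon_xmx() {
    # -e is needed because the pattern itself starts with a dash.
    grep -o -e '-Xmx[0-9][0-9]*[kKmMgG]' | head -n 1
}

# Example using a shortened version of the daemon options from this thread.
echo 'daemonOpts=-XX:MaxMetaspaceSize=256m,-Xms256m,-Xmx512m,-Dfile.encoding=UTF-8' | daemon_xmx
# prints: -Xmx512m
```

If the value printed is still the default rather than the one requested, the property was most likely misspelled or never reached the daemon.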

lbergelson commented on August 21, 2024

Ah, good catch. I didn't see the typo.

Running with -Xmx seems to sometimes avoid the SIGSEGV but it does so non-deterministically. Back to back runs produced seemingly random results with different memory values. 6G fail, 8G pass, 10g fail, 8g fail.
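
Because the crash is intermittent, a single pass or fail says little. A simple way to estimate the failure rate is to repeat the same command several times and count the failures. The sketch below does that for any command. The helper name and the run counts are arbitrary choices, not something used in this thread.

```shell
# Run a command N times and report how many of the runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
    runs="$1"
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Tiny demonstration with commands that always succeed or always fail.
count_failures 3 true     # prints: 0/3 runs failed
count_failures 3 false    # prints: 3/3 runs failed
```

For example, count_failures 10 ./gradlew legacyTest --tests "MarkDuplicatesTagRepresentativeReadIndexTest" would rerun the test ten times. The build file quoted later in this thread marks test tasks as never up to date, so each invocation should actually re-execute the tests.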

I think it might be a red herring, or maybe expanding the memory available gives us a better chance of dereferencing a valid address.

Either way though, it's not clear to me how changing the gradle jvm memory allocation should impact the gradle test executor memory, which is a separate process that is configured separately.

SnehalA commented on August 21, 2024

@lbergelson Oh, I see different pids for the worker and the daemon. I would like to change COMPRESSION_LEVEL=5 to 1 or 2, but I am not able to find where these values get set so that I can run the test suite with them. Please check whether this approach works.

lbergelson commented on August 21, 2024

You should be able to change the compression level default with the system property samjdk.compression_level=1. It has to be passed through to the test jvm instance though. The way I know to do that is to add a line into the build.gradle file.

There's a section in the file:

tasks.withType(Test) {
    outputs.upToDateWhen { false } // tests will always rerun
    description = "Runs the unit tests"
  ...

Add the line:

    systemProperty "samjdk.compression_level", "2"

into that section, and it should set that system property for all the tests.

lbergelson commented on August 21, 2024

Given our recent 2020 roadmap discussions, this is a high priority fix for us.

SnehalA commented on August 21, 2024

We are working on a new build and are hoping it resolves the bug.

lbergelson commented on August 21, 2024

That's great news.

lbergelson commented on August 21, 2024

There's another possibly identical or related issue here: broadinstitute/picard#1329

lbergelson commented on August 21, 2024

@SnehalA Someone pinged me asking if this reproduces with 0.8.6 in picard, though I can't figure out where that comment was. I've tested on my machine with that version and I still see the issue when using 0.8.6.

droazen commented on August 21, 2024

@SnehalA believes that this is fixed in 0.8.7

Kmannth commented on August 21, 2024

A memory allocation issue has been fixed in the Intel Deflater code, and there are new versions of ISA-L and zlib with many fixes applied.

A beta test on Mac by @lbergelson suggests the update at least helps.

As the patches land on master, we will close this issue. If the issue is seen again, please reopen.
