Comments (12)

depristo commented on May 17, 2024

@arostamianfar This seems like an issue for you.

from deepvariant.

arostamianfar commented on May 17, 2024

The actual error is "The TF examples in /mnt/data/input/gs/wgs-test-shan/test_samples/UDN689484temp/examples/examples_output.tfrecord-00000-of-00064.gz has image/format 'None' (expected 'raw') which means you might need to rerun make_examples to genenerate the examples again."

@pichuan @depristo this is odd, since the pipeline ran as a single workflow. The model and docker binary paths also seem correct. One possible cause is that most of the shards are empty (the output has 64 shards, but they total only 1.3 KB). Do you know if empty shards could cause such an error?
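The 1.3 KB total is consistent with nearly all shards being empty. A quick sanity check, assuming each empty shard is roughly the size of a gzip stream with no payload (TensorFlow's GZIP TFRecord writer may frame an empty file slightly differently):

```python
import gzip

# A gzip stream with zero bytes of payload still holds the gzip header,
# an empty deflate block and the CRC/length trailer.
EMPTY_GZ_BYTES = len(gzip.compress(b""))

# Shard count taken from the file name ...-00000-of-00064.gz.
NUM_SHARDS = 64

total_bytes = NUM_SHARDS * EMPTY_GZ_BYTES
print(f"{NUM_SHARDS} empty shards: about {total_bytes} bytes "
      f"({total_bytes / 1000:.1f} KB)")
```

At roughly 20 bytes per empty gzip stream, 64 empty shards already come to about 1.3 KB, which leaves little room for any real examples.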

P.S. The 'gsutil not found' error is actually harmless. I think we should provide a 'parser' that scans the logs for these errors and reports a meaningful error message.
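A log 'parser' of the kind suggested here could be as simple as a table of known patterns mapped to friendlier messages. This is a hypothetical sketch: the patterns, messages and function name are illustrative and are not part of DeepVariant.

```python
import re
from typing import List

# Hypothetical rules: a regex to look for in a log line, and the message to
# report (None means the line is known to be harmless and is ignored).
RULES = [
    (re.compile(r"gsutil.*not found"), None),
    (re.compile(r"has image/format 'None'"),
     "An input examples shard appears to be empty. "
     "Try rerunning with fewer shards."),
]


def explain_errors(log_lines: List[str]) -> List[str]:
    """Return each distinct user-facing message triggered by the log."""
    messages: List[str] = []
    for line in log_lines:
        for pattern, message in RULES:
            if pattern.search(line):
                if message is not None and message not in messages:
                    messages.append(message)
                break  # first matching rule wins
    return messages
```

Harmless lines such as the gsutil warning are matched but produce no message, so the user only sees the error that matters.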

arostamianfar commented on May 17, 2024

Yep, it's caused by empty shards. I was able to reproduce this by using 64 shards with the quickstart test data. @depristo, should I file a separate issue for this, since it's not really a Docker issue?

@chenshan03: thanks for the report. As a workaround until this bug is fixed, you may reduce the number of shards to avoid having empty ones.
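Why many shards end up empty on a small input can be seen with a toy model. This sketch assumes the work is split into regions that are dealt out to shards round-robin; that is an assumption for illustration, not a description of how make_examples actually partitions its input, and the function names are hypothetical.

```python
from typing import List


def shard_regions(num_regions: int, num_shards: int) -> List[List[int]]:
    """Assign region indices to shards round-robin (an assumed scheme)."""
    shards: List[List[int]] = [[] for _ in range(num_shards)]
    for region in range(num_regions):
        shards[region % num_shards].append(region)
    return shards


def count_empty(num_regions: int, num_shards: int) -> int:
    """Number of shards that receive no regions at all."""
    return sum(1 for shard in shard_regions(num_regions, num_shards) if not shard)
```

With 10 regions, `count_empty(10, 64)` leaves 54 shards empty, while `count_empty(10, 8)` leaves none. A shard that does receive regions can still produce zero examples if those regions contain no candidate variants, so using fewer shards lowers the risk rather than ruling it out.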

depristo commented on May 17, 2024

@pichuan @scott7z I believe the empty shards bug has been fixed, is that correct?

pichuan commented on May 17, 2024

Hi Mark and Asha,
here's what I believe the current status is:
(1) If only some of the shards are empty (the shard files exist but contain 0 records), the code moves on to the next shard to read image/format. This is what Mark meant by the previously fixed empty-shards bug.
(2) However, if all the shard files exist but every one of them contains 0 records, the current code can fail with the error message above.
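The two cases can be illustrated with a minimal sketch, assuming the format is read from the first record of the first non-empty shard. Shards are modeled as in-memory lists of dicts standing in for tf.Example records; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, str]  # hypothetical stand-in for a parsed tf.Example


def format_from_shards(shards: Sequence[List[Record]]) -> Optional[str]:
    """Return image/format from the first non-empty shard.

    Case (1): empty shards are skipped and the next shard is tried.
    Case (2): if every shard is empty, nothing is ever read and the
    result is None, which surfaces as "image/format 'None'".
    """
    for shard in shards:
        if not shard:
            continue  # empty shard: move on to the next one
        return shard[0].get("image/format")
    return None
```

Note that a single empty shard passed on its own is just case (2) with one element.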

In this case, if the actual error message observed is:
The TF examples in /mnt/data/input/gs/wgs-test-shan/test_samples/UDN689484temp/examples/examples_output.tfrecord-00000-of-00064.gz has image/format 'None' (expected 'raw')

It seems like this call_variants run is being done on that one file specifically, and if that file has 0 records, it will unfortunately fail with that error. :-(

So, I think this is a real bug that we should fix, because we do expect users to run 64 separate call_variants jobs, some of which might get a single input file that is completely empty. Is that correct?

arostamianfar commented on May 17, 2024

yes, I think this is a real bug that still exists.
Due to the distributed nature of the cloud process, some machines may get shards that are all empty. Also, we actually only supply one of the shards to each process, so (1) doesn't really apply (there is no 'next shard').
You can reproduce this by adding "--shards 64" to the quickstart test data configuration in https://cloud.google.com/genomics/deepvariant.
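The per-worker split can be sketched as follows. The naming scheme is taken from the file name in the original error message; the helper functions are hypothetical and only show that a worker handed a single shard has nothing to fall back on if that shard is empty.

```python
from typing import List


def shard_paths(prefix: str, suffix: str, num_shards: int) -> List[str]:
    """Expand a sharded output into one file name per shard.

    Mirrors the naming in the error message, e.g.
    examples_output.tfrecord-00000-of-00064.gz.
    """
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}{suffix}"
            for i in range(num_shards)]


def inputs_for_worker(paths: List[str], worker: int) -> List[str]:
    """Each worker gets exactly one shard, so there is no 'next shard'."""
    return [paths[worker]]
```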

depristo commented on May 17, 2024

My view is that if all shards are empty, we should just write an empty CVO (call_variants output) file. If that's not what happens right now, let's file a bug in Buganizer and fix it.
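The proposed behavior can be sketched as follows: open the output before looking at any input, check the format per record, and let an empty input simply produce an empty file. This is a hypothetical sketch, not DeepVariant's code; gzip text output and the `variant` field stand in for the real call_variants output records and model predictions.

```python
import gzip
from typing import Dict, Iterable

Record = Dict[str, str]  # hypothetical stand-in for a parsed tf.Example


def run_call_variants(records: Iterable[Record], output_path: str) -> int:
    """Write one output line per input record and return how many were written.

    The output file is created up front, so an input with no records
    yields an empty (but present) output file instead of an error.
    """
    written = 0
    with gzip.open(output_path, "wt") as out:
        for record in records:
            fmt = record.get("image/format")
            # The format is checked per record, so it is never consulted
            # when there are no records at all.
            if fmt != "raw":
                raise ValueError(f"image/format {fmt!r} (expected 'raw')")
            # Placeholder for running the model on the example.
            out.write(record.get("variant", "") + "\n")
            written += 1
    return written
```

Because the check lives inside the loop, "no records" and "bad format" stay distinct: the first yields an empty output, the second still raises.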

pichuan commented on May 17, 2024

I filed a bug in buganizer.

cmclean commented on May 17, 2024

This has been fixed by the DeepVariant 0.5.1 release that just came out a few minutes ago. Thank you for raising attention to this issue.

pgrosu commented on May 17, 2024

Hi Cory (@cmclean),

Thank you for the new release, but the new timings for the 0.5.1 release seem to be longer than for the previous version:

Commit v0.5.1

Timings: Whole Genome Case Study - [0.5 (pink) vs. 0.5.1 (green)]

[Chart: whole-genome case study timings]

Timings: Exome Case Study - [0.5 (pink) vs. 0.5.1 (green)]

[Chart: exome case study timings]

What is the cause of the additional delay in version 0.5.1 as compared to the previous one?

Thanks,
Paul

depristo commented on May 17, 2024

Hi Paul,

Two quick suggestions. First, I'd recommend posting this question as a separate issue: it's a very interesting and general observation, and a separate thread will keep this discussion focused.

Second, it's unclear to us whether this is just normal variation in cloud timing (not all machines you create are identical). For example, the case study command:

gcloud beta compute instances create "${USER}-deepvariant-casestudy"  --scopes "compute-rw,storage-full,cloud-platform" --image-family "ubuntu-1604-lts" --image-project "ubuntu-os-cloud" --machine-type "custom-64-131072" --boot-disk-size "300" --boot-disk-type "pd-ssd" --zone "us-west1-b"

doesn't specify the CPU platform (only a custom machine shape of 64 vCPUs and 128 GB of memory), so we're likely getting Skylake processors sometimes and Broadwell processors other times. That alone could account for the variation in timing we're seeing here.

pichuan commented on May 17, 2024

Hi all,
it was recently reported again that the call_variants crash on empty shards wasn't fully resolved last time. I just released v0.6.1, which should really resolve this issue now:
https://github.com/google/deepvariant/releases/tag/v0.6.1

The issue was that I didn't properly return in the if branch where an empty shard was detected:
12f9e67
(And the unit test I had for it was flawed. We'll fix the unit test in a later release.)
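The kind of bug described here, detecting the empty case but not returning from that branch, can be shown with a hypothetical sketch; it is not the code changed in 12f9e67. A test that calls the function on empty input, as in the usage note, is what separates the two versions.

```python
import logging
from typing import Dict, List

Record = Dict[str, str]  # hypothetical stand-in for a parsed tf.Example


def check_format(records: List[Record]) -> None:
    fmt = records[0].get("image/format") if records else None
    if fmt != "raw":
        raise ValueError(f"image/format {fmt!r} (expected 'raw')")


def predict_buggy(records: List[Record]) -> List[str]:
    if not records:
        logging.warning("Empty shard detected.")
        # Missing `return`: execution falls through to check_format(),
        # which raises because there is no record to read.
    check_format(records)
    return [r.get("variant", "") for r in records]


def predict_fixed(records: List[Record]) -> List[str]:
    if not records:
        logging.warning("Empty shard detected; nothing to predict.")
        return []  # stop here so the format check is never reached
    check_format(records)
    return [r.get("variant", "") for r in records]
```

`predict_fixed([])` returns an empty list, while `predict_buggy([])` logs the warning and then raises `ValueError`.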

This time I've tested it manually on an empty shard, and confirmed that call_variants works when there are zero records.

Please feel free to report if you see any issues again. Thank you!
