Comments (5)
Hi @EugeneEA, if you want to convert every missing GT to homozygous reference, try this:
mt = mt.annotate_entries(GT = hl.coalesce(mt.GT, hl.call(0, 0)))
Thanks a lot!
This does not work for me, unfortunately. I started with a table in coordinate format and then converted it to a matrix table. If I do mt.show(), some values are, as expected, shown as missing. But if I then do mt = mt.annotate_entries(GT = hl.coalesce(mt.GT, hl.call(0, 0))), nothing is filled: the NA values remain NA. If I do mt.entry.take(5) or mt.GT.take(5), there are no None values in the list. It's as if mt.GT only contains the non-missing data. Similarly, if I do mt.annotate_entries(test=1), the new field is set only at the coordinates where GT is non-missing, and test is NA everywhere else! Any help would be appreciated.
Hey @ag14774!
Short answer: add an mt.unfilter_entries() before you annotate_entries.
This is a bit confusing, but there are two distinct ideas in Hail:
- filteredness
- missingness
A row can be filtered, a column can be filtered and an entry can be filtered. When rows or columns are filtered, all their entries are also filtered and the row or column key is excluded entirely. When entries are filtered, it's just the individual entry that is removed.
In contrast, any particular value can be "missing". A field, like GT, can be missing, but so can a particular element of an array, or a particular field of an hl.struct.
We think of "filtered" data as not even existing. It's not included in the denominator of, say, hl.agg.mean.
We think of "missing" data as something that exists but is hidden from us. When we do statistics, we need to decide how to treat that data: sometimes we mean-impute, sometimes we convert to hom-ref, and sometimes we use a sophisticated model that handles missingness directly.
Thanks @danking that worked!