
shrek's Introduction

shrek

The SNP Heritability and Risk Estimation Kit

shrek's People

Contributors

choishingwan, rmporsch

shrek's Issues

Ambiguous SNPs

For risk prediction, when we encounter an ambiguous SNP (e.g. ref = A, alt = T), we simply flip it so that it matches the alleles in the p-value file. We might need to figure out a better way to handle this in the future.
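
To make the problem concrete, here is a minimal Python sketch of how a SNP's alleles could be compared against the p-value file. It is not shrek's actual implementation: the function names (`is_ambiguous`, `compare_alleles`) and the returned labels are hypothetical, chosen only for illustration. It shows why A/T and C/G SNPs are ambiguous: a strand flip and an allele swap give the same pair of letters, so the two cases cannot be told apart from the alleles alone.

```python
# Hypothetical helper names; an illustration, not shrek's code.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(ref, alt):
    """True when the two alleles are complements of each other (A/T or C/G)."""
    return COMPLEMENT.get(ref.upper()) == alt.upper()


def compare_alleles(ref, alt, p_ref, p_alt):
    """Classify how the genotype alleles relate to the p-value file alleles."""
    ref, alt, p_ref, p_alt = (a.upper() for a in (ref, alt, p_ref, p_alt))
    if is_ambiguous(ref, alt):
        # Strand flip and allele swap are indistinguishable here.
        return "ambiguous"
    if (ref, alt) == (p_ref, p_alt):
        return "match"
    if (ref, alt) == (p_alt, p_ref):
        return "swapped"
    c_ref, c_alt = COMPLEMENT.get(ref), COMPLEMENT.get(alt)
    if (c_ref, c_alt) == (p_ref, p_alt):
        return "strand_flip"
    if (c_ref, c_alt) == (p_alt, p_ref):
        return "strand_flip_swapped"
    return "mismatch"
```

A caller could then decide per label whether to flip, swap, or drop the SNP; which policy is best for the ambiguous case is exactly the open question of this issue.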

Missing Genotype (Risk Prediction)

For risk prediction, we haven't implemented a way to deal with missing data. Currently we directly set genotype*effect to 0. However, we know that this still affects the decomposition, as we are trying to solve the whole system of equations.
This is something we need to address in the future.
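
As a rough illustration only, the Python sketch below contrasts the behaviour described above (a missing genotype contributes 0) with mean imputation, a common alternative that is not something shrek is stated to do. The function name `score_contributions`, the `impute_mean` flag, and the use of `None` for a missing genotype are all assumptions made for this example.

```python
def score_contributions(genotypes, effect, impute_mean=False):
    """Per-sample genotype*effect for one SNP.

    genotypes: allele counts (0, 1, 2), with None marking a missing call.
    impute_mean: hypothetical option; if True, a missing genotype is replaced
                 by the mean of the observed genotypes instead of contributing 0.
    """
    observed = [g for g in genotypes if g is not None]
    fill = sum(observed) / len(observed) if (impute_mean and observed) else None
    contributions = []
    for g in genotypes:
        if g is None:
            # Behaviour described in the issue: the missing term is set to 0.
            contributions.append(0.0 if fill is None else fill * effect)
        else:
            contributions.append(g * effect)
    return contributions
```

For example, `score_contributions([0, None, 2], 0.5)` zeroes the second sample's term, while passing `impute_mean=True` fills it with the observed mean genotype times the effect. Neither option accounts for the effect on the decomposition mentioned above.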

SNP flipping (LD Matrix)

Currently we don't take any information from the user about what the reference strand should be. We will need this for better calculation of LD (we might need to flip the reference and target).

Block size estimation

For LD block size, we currently only allow a single maximum and minimum block size that applies to ALL chromosomes. We might want to allow users to set them per chromosome in the future.
Also, we are using the densest region to estimate the block size. Might a dynamic block size be a good idea?

Case control sample size

Similar to the variance problem, when performing case control analysis, we don't allow a different sample size for each individual SNP, i.e. all SNPs are assumed to have the same number of cases and the same number of controls.

Variance Estimation

Our current variance estimation only uses the maximum sample size. So if there are 2 SNPs in the data, one with 1000 samples and the other with 10 samples, the variance will be calculated using 1000 samples.

Standardizing genotype (risk prediction)

When trying to perform risk prediction, we need to compute the standardized genotype. However, if there is no variation in the sample (e.g. all samples have the same genotype), we cannot standardize it, as the SD = 0.

Currently we just set the genotype to 0. A better way might be to pre-filter those genotypes, just like what we've done with the LD matrix. Which one is better?
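
The two options can be sketched side by side in Python. This is an illustration under assumptions, not shrek's code: the function names are hypothetical, and the population SD (`statistics.pstdev`) is used here only because the issue does not say which SD definition shrek uses.

```python
from statistics import mean, pstdev


def standardize(genotypes):
    """Standardize one SNP's genotypes; return all zeros when SD = 0."""
    m = mean(genotypes)
    sd = pstdev(genotypes)  # assumption: population SD
    if sd == 0:
        # Option 1 (behaviour described in the issue): set the genotype to 0.
        return [0.0] * len(genotypes)
    return [(g - m) / sd for g in genotypes]


def drop_monomorphic(snps):
    """Option 2: pre-filter SNPs with no variation before standardizing.

    snps: dict mapping a SNP name to its list of genotypes.
    """
    return {name: g for name, g in snps.items() if pstdev(g) > 0}
```

With option 1 the SNP stays in the system but contributes nothing; with option 2 it is removed up front, which keeps the set of SNPs consistent with a filter applied before the LD matrix is built.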
