Code Monkey home page Code Monkey logo

pgsql_genomics's Introduction

=== Efficient Storage of Genotypic Data in Relational Databases for Rapid Retrieval ===

I. COMPATIBILITY
This software is compatible with PostgreSQL Releases 9.4, 9.5, and 9.6 on GNU/Linux x86_64. It has been tested on CentOS 6 and CentOS 7. Older versions of PostgreSQL will not work because of usage of 'WITH ORDINALITY', and newer versions will not work because of incompatibility with the current state of the TINYINT library. We hope to either remove the dependency or create a fix in the future. Other architectures and GNU/Linux variants may work but are untested.

II. USAGE
On a machine with a complete PostgreSQL 9.4, 9.5, or 9.6 installation, run the script 'setup.sh' in the top-level directory. You should have both the PostgreSQL server and the PostgreSQL client tools installed. Note that this script must be run as root because it copies files into privileged locations and uses sudo to execute psql as the database superuser.

III. BENCHMARKING NOTES
This distribution generates fake data that is roughly designed to represent the test data in the paper 'Efficient Storage of Genotypic Data in Relational Databases for Rapid Retrieval'. The paper did *NOT* use fake data for its benchmarking. It used a real cohort of 3104 individuals and their unimputed and imputed genotypes, but privacy protections for the cohort and data volume require us not to release this data. Notably, the fake data generation uses a biased random number generation procedure to roughly represent the homogeneity that results from low alternate allele frequencies in variants. The fake sample-major data and variant-major data is *NOT* the same to facilitate fast loading for test purposes. The real benchmark data in the paper for sample-major and variant-major representations *WAS* the same.

IV. OTHER NOTES
The SQL code and C code here is very similar to but not identical to the code used in the paper. This is because the test platform was part of a much broader system incorporating many additional tables and types of data. Additionally, that system included a wide variety of extremely complex, optimized queries making use of CTEs, window functions, dynamic SQL, and other features of PostgreSQL to achieve extremely high-performance. If you are interested in obtaining some of this additional code for use with a built-out version of this code, please contact the authors. We strove to sanitize and minimize this release so that it was as simple as possible.

V. RECIPES
As requests for support on specific configurations arrive and as time permits, we will create precise recipes for proper configuration in the folder 'recipes'. These recipes will show the proper procedure assuming a minimal install of the listed operating system to make use of the software.

pgsql_genomics's People

Contributors

rlichtenwalter avatar

Stargazers

Hades avatar Phil Appleby avatar Tim Parker avatar Vasileios Athanasios Anagnostopoulos avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.