forge's Introduction

For details of the analysis approach see documentation in the web version at

http://www.1000genomes.org/forge-analysis

Running forge requires the following.

1. The script itself, currently called forge.pl, written in Perl. It has
the following Perl dependencies.

use 5.012;
use warnings;
use DBI;
use Sort::Naturally;
use Cwd;
use Storable;
use Getopt::Long;

2. The sqlite3 database file that stores the bitstrings, called forge.db.
This is a 98 GB file for forge2.0.

3. Four stored hashes containing the parameters for the background selection:
two files for the Omni genotyping array and two for all the other GWAS
SNP arrays.

  76M   omni.snp_bins.10
  231B  omni.snp_params.10
  63M   snp_bins.10
  231B  snp_params.10

Both the database and the hashes are downloadable from


ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/browser/forge_11/
forge_db_20141112.tar.gz for forge1.1

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/browser/forge/
forge_20131009.tar.gz for forge1.0

4. A forge.ini file in the same directory as the script. Edit this to provide the
directory in which the database and hashes are stored.

5. An R 3.0 installation with the "devtools" and "rCharts" packages installed. See
https://github.com/ramnathv/rCharts. You will need to install the latest version, e.g.

library(devtools)
install_github("ramnathv/rCharts", ref = "dev")

The input data is one of several options.

a. A list of rsids for SNPs
b. A pseudobed format of chr\tbeg\tend\trsid
c. bed format of locations
d. VCF
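
As an illustration of the first two input options, the following Python sketch pulls rsids out of either a plain rsid list or pseudobed lines. It is not taken from forge.pl: the function name `extract_rsids`, the skipping of blank and `#` lines, and the `rs` prefix check are all assumptions made for the example.

```python
def extract_rsids(lines):
    """Collect rsids from a plain rsid list or from pseudobed lines.

    Pseudobed lines are tab-separated: chr, beg, end, rsid.
    Hypothetical helper for illustration only.
    """
    rsids = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            candidate = fields[0]      # option a: one rsid per line
        elif len(fields) >= 4:
            candidate = fields[3]      # option b: rsid in the fourth column
        else:
            continue                   # location only, no rsid on the line
        if candidate.startswith("rs"):
            rsids.append(candidate)
    return rsids


example = ["rs123", "chr1\t100\t101\trs456", "", "chr2\t5\t6"]
print(extract_rsids(example))
```

Lines that hold only a location (plain bed) are skipped here, because turning them into rsids needs a lookup in the sqlite3 database.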

The analysis needs a minimum of about 20 SNPs (this is not a strict limit,
but in practice it works best with at least that many).

To be analysed, SNPs have to be in phase 1 of the 1000 Genomes Project. The
script warns about SNPs that are not found.

It also warns about background sets that do not have the right number of SNPs
chosen, but this is really for information only.
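
The two input checks described above (SNPs that are not found, and the recommended minimum of 20) can be sketched in Python as follows. This is a sketch under stated assumptions, not the logic of forge.pl: `screen_snps`, `known_rsids` and the warning wording are hypothetical, and the real script looks SNPs up in the sqlite3 database rather than in an in-memory set.

```python
MIN_SNPS = 20  # recommended minimum from the text above, not a hard limit


def screen_snps(rsids, known_rsids, minimum=MIN_SNPS):
    """Return (found, warnings) for a list of rsids.

    known_rsids stands in for the set of phase 1 1000 Genomes SNPs;
    this is a hypothetical illustration only.
    """
    found = []
    warnings = []
    for rsid in rsids:
        if rsid in known_rsids:
            found.append(rsid)
        else:
            warnings.append(f"{rsid} not found, skipping")
    # Warn rather than fail, since the minimum is only advisory.
    if len(found) < minimum:
        warnings.append(
            f"only {len(found)} SNPs found; at least {minimum} recommended"
        )
    return found, warnings


found, warnings = screen_snps(["rs1", "rs2", "rs3"], {"rs1", "rs3"})
for message in warnings:
    print("Warning:", message)
```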

The script takes the following command line options:

-f : the file to run on
-data : whether to analyse ENCODE (encode) or EpigenomeRoadmap (erc) data
-label : a name for the files that are generated and for the plot titles
       where there is a title.
-format : the input data format. If this is location data, e.g.
         bed or tabix or some vcf lines, the rsid is obtained from
         the sqlite3 database.
-bkgd : whether the background selection should be from Omni array SNPs
         (omni) or a general set of GWAS typing arrays (gwas).

Some of these have defaults, as described in the perldoc. Minimally the command
line is

forge.pl -f rsidfile -label Some_label

which will by default run on Epigenome Roadmap data with the gwas
background.
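
When forge.pl is driven from another program, the option list above can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything. The function `build_forge_command` and its parameter names are hypothetical; the flags and the allowed values for -data and -bkgd are those listed above. The -format option is left out because its accepted values are not spelled out here.

```python
import shlex

VALID_DATA = {"encode", "erc"}   # values listed for -data
VALID_BKGD = {"omni", "gwas"}    # values listed for -bkgd


def build_forge_command(snp_file, label, data=None, bkgd=None, script="forge.pl"):
    """Build an argument list for forge.pl (hypothetical helper)."""
    cmd = [script, "-f", snp_file, "-label", label]
    if data is not None:
        if data not in VALID_DATA:
            raise ValueError(f"-data must be one of {sorted(VALID_DATA)}")
        cmd += ["-data", data]
    if bkgd is not None:
        if bkgd not in VALID_BKGD:
            raise ValueError(f"-bkgd must be one of {sorted(VALID_BKGD)}")
        cmd += ["-bkgd", bkgd]
    return cmd


# The minimal invocation shown above, relying on the script's defaults.
print(shlex.join(build_forge_command("rsidfile", "Some_label")))
```

The resulting list can be passed to `subprocess.run` unchanged. Leaving `data` and `bkgd` unset keeps the script's own defaults in effect.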

OUTPUT
======
The following outputs are generated:

1. A static PDF chart, suitable for download.
2. Two alternative d3 interactive charts. The *dchart.html files are the best
in terms of axis labelling but have some minor quirks.
3. A DataTables table.
4. A tsv file of the results.

Contributors

iandunham, jherrero
