
ldshrink's Introduction

ldshrink

a one-stop R package for shrinkage estimation of linkage disequilibrium

install.packages("devtools")
devtools::install_github("stephenslab/ldshrink")

NOTE: This package is still under very active development.

ldshrink's People

Contributors

crerecombinase, pcarbo, xiangzhu


ldshrink's Issues

Rcpp error: no matching function for call to 'R_useDynamicSymbols', shown in macOS 10.14 + R 3.5.1

I came across the following error in my mac laptop:

> devtools::load_all(".")

[omitted]

RcppExports.cpp:242:5: error: no matching function for call to 'R_useDynamicSymbols'
    R_useDynamicSymbols(dll, FALSE);
    ^~~~~~~~~~~~~~~~~~~
/Library/Frameworks/R.framework/Resources/include/R_ext/Rdynload.h:84:10: note: candidate function not viable: no known conversion from 'int' to 'Rboolean' for 2nd argument
Rboolean R_useDynamicSymbols(DllInfo *info, Rboolean value);
         ^
4 warnings and 1 error generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package 'ldshrink'
* removing '/private/var/folders/dm/39__1gf52_s_9xbqrfd_46lc0000gp/T/RtmpjSVD5F/devtools_install_60b7e017afc/ldshrink'
Error: Command failed (1)

Here is my laptop info:

# xiangzhu @ stanford in ~ [22:27:51]
$ system_profiler SPSoftwareDataType
Software:

    System Software Overview:

      System Version: macOS 10.14 (18A391)
      Kernel Version: Darwin 18.0.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: STA-C02V26FCHTD5
      User Name: Xiang Zhu (xiangzhu)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 5 days 22:05

Here is my R session info:

> devtools::session_info()
Session info --------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.456)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2018-10-04                  

Packages ------------------------------------------------------------------------------------------------------------------------------------------
 package      * version   date       source        
 base         * 3.5.1     2018-07-05 local         
 BH             1.66.0-1  2018-02-13 CRAN (R 3.5.0)
 commonmark     1.6       2018-09-30 CRAN (R 3.5.0)
 compiler       3.5.1     2018-07-05 local         
 datasets     * 3.5.1     2018-07-05 local         
 devtools       1.13.6    2018-06-27 CRAN (R 3.5.0)
 digest         0.6.17    2018-09-12 CRAN (R 3.5.1)
 graphics     * 3.5.1     2018-07-05 local         
 grDevices    * 3.5.1     2018-07-05 local         
 grid           3.5.1     2018-07-05 local         
 lattice        0.20-35   2017-03-25 CRAN (R 3.5.1)
 magrittr       1.5       2014-11-22 CRAN (R 3.5.0)
 Matrix         1.2-14    2018-04-13 CRAN (R 3.5.1)
 memoise        1.1.0     2017-04-21 CRAN (R 3.5.0)
 methods      * 3.5.1     2018-07-05 local         
 R6             2.3.0     2018-10-04 CRAN (R 3.5.1)
 Rcpp           0.12.19   2018-10-01 CRAN (R 3.5.0)
 RcppEigen      0.3.3.4.0 2018-02-07 CRAN (R 3.5.0)
 RcppParallel   4.4.1     2018-07-19 CRAN (R 3.5.0)
 RcppProgress   0.4.1     2018-05-11 CRAN (R 3.5.0)
 rlang          0.2.2     2018-08-16 CRAN (R 3.5.0)
 roxygen2       6.1.0     2018-07-27 CRAN (R 3.5.0)
 stats        * 3.5.1     2018-07-05 local         
 stringi        1.2.4     2018-07-20 CRAN (R 3.5.0)
 stringr        1.3.1     2018-05-10 CRAN (R 3.5.0)
 testthat       2.0.0     2017-12-13 CRAN (R 3.5.0)
 tools          3.5.1     2018-07-05 local         
 utils        * 3.5.1     2018-07-05 local         
 withr          2.1.2     2018-03-15 CRAN (R 3.5.0)
 xml2           1.2.0     2018-01-24 CRAN (R 3.5.0)
 yaml           2.2.0     2018-07-25 CRAN (R 3.5.0)
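The error comes from generated code in RcppExports.cpp: the C macro FALSE is an int, and C++ (unlike C) refuses the implicit conversion to the Rboolean enum, exactly as the compiler note says. A possible workaround, assuming the generated registration code is the culprit, is to regenerate it with an up-to-date Rcpp:

```r
## Possible fix (untested on this exact setup): newer Rcpp releases generate
## registration code that compiles cleanly, so update Rcpp and regenerate
## src/RcppExports.cpp before retrying the build.
install.packages("Rcpp")       # get the latest Rcpp from CRAN
Rcpp::compileAttributes(".")   # rewrite src/RcppExports.cpp
devtools::load_all(".")        # retry
## Manual alternative: edit the offending line to
##   R_useDynamicSymbols(dll, (Rboolean) FALSE);
```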

L2-penalty LD shrinkage

I wonder how hard it would be to implement the following L2-penalty (i.e. "ridge") LD shrinkage estimator: Sigma_ridge = Sigma + lambda * I, where Sigma is the sample covariance matrix for SNPs.

For simplicity we can fix the value of lambda (e.g. lambda=0.01) for now. (It might be conceptually straightforward to tune this parameter by cross-validation, but this definitely complicates the software implementation.)

This approach was used in ImpG-Summary: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4184260/

(Screenshot of the relevant equation from the ImpG-Summary paper omitted.)
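A minimal sketch of the proposal, with a hypothetical function name `ridge_ld` and `lambda` fixed as suggested:

```r
## Ridge ("L2-penalty") LD shrinkage: Sigma_ridge = Sigma + lambda * I.
## X is the n-by-p SNP genotype matrix; lambda is fixed for now (no tuning).
ridge_ld <- function(X, lambda = 0.01) {
  Sigma <- cov(X)                 # sample covariance matrix for SNPs
  Sigma + lambda * diag(ncol(X))  # inflate the diagonal by lambda
}
```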

Including external data

@pcarbo Hi Peter -- I just had a great meeting with Nick today, and we have the following question about including published external data in the LDshrink package. We would appreciate your input.

To make the package user-friendly, we would like to include the following external data files:

  1. Genetic maps from 1000 Genomes: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates/

  2. Approximately independent LD blocks: https://bitbucket.org/nygcresearch/ldetect-data/src

These data files have fairly simple structure. The genetic maps are data frames with the following headers:

[id] [physical position] [genetic position (cumulative)]

The LD blocks are data frames with the following headers:

[chromosome name] [region start] [region stop]

These data files also have formal publications available. I wonder if we could include these external files in the LDshrink package, provided that we explicitly cite these publications?

Do we have to worry about licensing issues?

An alternative plan is to provide users with the data preprocessing scripts/functions, but I think this will make the package harder to use.

External sources of genetic maps

Below are links to several genetic maps that I have used in my past work. After discussing this with Peter (see Issue #3), it seems fine to include these external data in LDshrink as long as we cite them properly. To make LDshrink a one-stop package for LD-related calculations, I think we should include all these files in the package (if possible).

Genetic maps listed in IMPUTE v1

Genetic maps created by Pickrell

Note that these maps have a very straightforward format:

$ head genetic_map_CEU_chr16.txt
position CEU_rate(cM/Mb) Genetic_Map(cM)
24045 0 0
24170 0.3020482 0
25057 0.3020482 0.0002679168
25065 0.2846052 0.0002703331
25561 0.2782453 0.0004114973
25658 0.2782453 0.0004384871
28165 0.2782453 0.001136048
29475 0.2794324 0.001500549
31010 0.2794324 0.001929478
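For reference, a sketch of reading such a map into R; the header contains characters like `(cM/Mb)` that `read.table` would mangle, so skip it and supply clean column names:

```r
## Read a genetic map in the three-column format shown above.
map <- read.table("genetic_map_CEU_chr16.txt", skip = 1,
                  col.names = c("position", "rate_cM_Mb", "map_cM"))
head(map$map_cM)  # cumulative genetic position in cM
```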

Return SNP info along with LD matrix

It seems the main function only returns an estimated LD matrix at this point?
https://github.com/stephenslab/LDshrink/blob/32b4ad3942f7cb429f23c529b86ab72cfbb1b257/R/LDshrink.R#L6

Ideally we want to have some basic SNP info available (e.g. position, allele), which is essential in combining LD with GWAS summary statistics in analyses.

I think the emeraLD package gives us a good example: https://github.com/statgen/emeraLD

> source('emeraLD2R.r');
Loading required package: data.table
data.table 1.11.4  Latest news: http://r-datatable.com
emeraLD v0.1 (c) 2018 corbin quick (corbinq@gmail.com)

reading from m3vcf file...

processed genotype data for 5008 haplotypes...

calculating LD for 60 SNPs...

done!! thanks for using emeraLD

> names(ld_data)
[1] "Sigma" "info"

> head(ld_data$info)
   chr   pos          id ref alt
1:  20 83061 rs549711487   C   T
2:  20 83196  rs62190472   A   T
3:  20 83252   rs6137896   G   C
4:  20 83570   rs6048967   T   G
5:  20 83611 rs114000219   C   A
6:  20 83792 rs529518485   A   G

> head(ld_data$Sigma[, 1:5], 5)
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,]  1.00000 -0.00602  0.03989 -0.00824 -0.00331
[2,] -0.00602  1.00000 -0.14013 -0.03102 -0.01245
[3,]  0.03989 -0.14013  1.00000 -0.05714 -0.04400
[4,] -0.00824 -0.03102 -0.05714  1.00000 -0.01704
[5,] -0.00331 -0.01245 -0.04400 -0.01704  1.00000
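An emeraLD-style return value could be a list bundling the matrix with per-SNP metadata (the `Sigma`/`info` names follow emeraLD; the helper below is hypothetical):

```r
## Return SNP info alongside the LD matrix, emeraLD-style.
## X: n-by-p genotype matrix; info: data frame with one row per SNP
## (e.g. chr, pos, id, ref, alt).
ld_with_info <- function(X, info) {
  stopifnot(ncol(X) == nrow(info))  # one metadata row per SNP column
  list(Sigma = cor(X),              # placeholder for the shrinkage estimate
       info  = info)
}
```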

Option to output traditional LD statistics

Although the main purpose of LDshrink is to provide a one-stop place for various LD shrinkage estimators that have been recently used in GWAS summary statistics literature, I think perhaps we should also provide the option to output the traditional LD statistics in population genetics literature.

  • For phased haplotypes: r2, D, and D’ statistics
  • For unphased genotypes: sample correlation or r2

Below are two potential use cases for this option.

  • Use case 1: a data analyst doesn't need or care about LD shrinkage estimators and just needs standard LD statistics computed from a dataset.
  • Use case 2: a method developer wants to assess the improvement of an LD shrinkage estimator over a standard LD estimator.

This issue may be redundant since Nick has already provided the option useLDshrink = TRUE.
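For phased haplotypes, the classical statistics can be computed directly from haplotype frequencies; a sketch (the function name is hypothetical):

```r
## Two-locus LD statistics from phased haplotypes.
## hapA, hapB: 0/1 vectors with one entry per haplotype.
ld_stats <- function(hapA, hapB) {
  pA  <- mean(hapA)
  pB  <- mean(hapB)
  pAB <- mean(hapA == 1 & hapB == 1)
  D   <- pAB - pA * pB                     # coefficient of disequilibrium
  Dmax <- ifelse(D >= 0,
                 min(pA * (1 - pB), (1 - pA) * pB),
                 min(pA * pB, (1 - pA) * (1 - pB)))
  c(D      = D,
    Dprime = D / Dmax,                     # D' in [-1, 1]
    r2     = D^2 / (pA * (1 - pA) * pB * (1 - pB)))
}
```

For unphased genotypes, `cor(g1, g2)^2` on the 0/1/2 dosage vectors gives the usual r2.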

How to use this package?

Hi Xiang,
I have read your AOAS paper, and I think the ldshrink package would be very helpful to me. But when I downloaded the package, it was hard for me to find the main function for calculating the shrinkage LD structure. Could you give me a toy example showing how to use the package?
Many thanks for your kind help.

                                              Sheng
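A hedged toy example (the exact exported function name and signature are assumptions here and may differ; check the package help after installation):

```r
## Hypothetical toy example: simulate genotypes and a genetic map, then
## compute the shrinkage LD matrix. Function name/signature are assumptions.
library(ldshrink)
set.seed(1)
n <- 100; p <- 50
X   <- matrix(rbinom(n * p, 2, 0.3), n, p)  # n-by-p genotype matrix
map <- sort(runif(p, 0, 50))                # cumulative genetic map (cM)
S   <- ldshrink(X, map)                     # p-by-p shrinkage LD estimate
```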

Segmentation Fault when merging ldm files with GCTB

Hello,
I just started using the GCTB software but am running into an issue when I try to merge sparse ldm files from a single chromosome into one ldm file. I get the following error:
/var/spool/slurmd.cn1605/job55094972/slurm_script: line 13: 82612 Segmentation fault

Here are the basics of the code I ran:
gctb --mldm chr1.mldmlist --make-sparse-ldm --out chr1 --thread 16

Thank you,
Alexis

Possible ways of exploiting the structure of LD matrix

Here we simply call R built-in function eigen().

https://github.com/stephenslab/LDshrink/blob/32b4ad3942f7cb429f23c529b86ab72cfbb1b257/R/block_utils.R#L118

According to the R documentation, this call does not exploit the banded structure of the Wen-Stephens estimator:

Source
eigen uses the LAPACK routines DSYEVR, DGEEV, ZHEEV and ZGEEV.
LAPACK is from http://www.netlib.org/lapack and its guide is listed in the references.

Nick also has a version for sparse LD matrix, which is not included here:

I also have a routine that computes and stores a sparse symmetric matrix for the whole chromosome
it's faster than the dense block-wise routine

Confirm that EVs are actually real numbers

During my recent whole-genome simulations I found that some trailing eigenvalues of LD matrices are (numerically) complex, e.g. 3.28 + 1e-14i.

This seems to be a numerical issue, since the magnitude of the imaginary part is so close to zero.

However, Nick hasn't noticed this pattern in his whole-genome data analyses.
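One plausible explanation: `eigen()` without `symmetric = TRUE` can route a numerically asymmetric matrix to the general (complex) solver. A sketch of a guard, assuming the LD matrix is meant to be symmetric:

```r
## Force real eigenvalues for a (nominally) symmetric LD matrix S.
real_evals <- function(S) {
  S  <- (S + t(S)) / 2                     # symmetrize away floating-point noise
  ev <- eigen(S, symmetric = TRUE)$values  # symmetric solver returns reals
  stopifnot(is.numeric(ev))                # never complex
  ev
}
```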

LD shrinkage estimation in GCTB

It seems that GCTB software has implemented Wen-Stephens shrinkage LD estimator (see also https://github.com/stephenslab/rss-private/issues/64):

Shrunk LD matrix
The shrinkage estimator for the LD correlation matrix was originally proposed by Wen and Stephens (2010). The estimator shrinks the off-diagonal entries of the sample LD matrix toward zero. Zhu and Stephens (2017) used the shrinkage estimator in their Regression with Summary Statistics (RSS) methodology and showed empirically that it can provide improved inference. The shrinkage estimator overcomes some of the issues arising from approximating the full LD matrix in the summary-data based model using a subset of the full LD matrix and constructing the matrix from a reference. The GCTB implementation is a C++ port from that provided with the RSS software and has been adapted for use with the GCTB software.

GCTB software doc page: http://cnsgenomics.com/software/gctb/#SummaryBayesianAlphabet

Revise object documentation based on Wickham book

Below is a check list of guidelines on files in man/ from Wickham's package book: http://r-pkgs.had.co.nz/man.html

Even as a total newbie to R package development, I know I should not manually edit files in the man/ folder most of the time ... Hence, this check list should be used together with #22.

Instead of writing these files by hand, we’re going to use roxygen2 which turns specially formatted comments into .Rd files.

  • Follow the four-step documentation workflow.
  • Roxygen comments start with #' and come before a function.
  • Each line should be wrapped in the same way as your code, normally at 80 characters.
  • Because @ has a special meaning in roxygen, you need to write @@ if you want to add a literal @ to the documentation.
  • The first sentence becomes the title of the documentation.
  • The second paragraph is the description: this comes first in the documentation and should briefly describe what the function does.
  • The third and subsequent paragraphs go into the details: this is a (often long) section that is shown after the argument description and should go into detail about how the function works.
  • All objects must have a title and description. Details are optional.
  • You can add arbitrary sections to the documentation with the @section tag. This is a useful way of breaking a long details section into multiple chunks with useful headings. Section titles should be in sentence case, must be followed by a colon, and they can only be one line long.
  • @seealso allows you to point to other useful resources, either on the web, \url{http://www.r-project.org}, in your package \code{\link{functioname}}, or another package \code{\link[packagename]{functioname}}.
  • If you have a family of related functions where every function should link to every other function in the family, use @family. The value of @family should be plural.
  • @aliases alias1 alias2 ... adds additional aliases to the topic. An alias is another name for the topic that can be used with ?.
  • @param name description describes the function’s inputs or parameters.
  • The description should start with a capital letter and end with a full stop. It can span multiple lines (or even paragraphs) if necessary. All parameters must be documented.
  • Can document multiple arguments in one place: @param x,y Numeric vectors.
  • @examples provides executable R code showing how to use the function in practice.
  • Example code must work without errors as it is run automatically as part of R CMD check.
  • \dontrun{} allows you to include code in the example that is not run.
  • Can put them in separate files and use @example path/relative/to/package/root to insert them into the documentation. (Note that the @example tag here has no ‘s’.)
  • @return description describes the output from the function.
  • Indent the second and subsequent lines of a tag so that when scanning the documentation it’s easy to see where one tag ends and the next begins.
  • Can use roxygen to provide a help page for your package as a whole.
  • Document S3, S4 and RC.
  • Special characters: @@ for @, %% for % and \\ for \.
  • There is a tension between the DRY (don’t repeat yourself) principle of programming and the need for documentation to be self-contained.
  • Inherit parameter descriptions from other functions using @inheritParams source_function and or @inheritParams package::function.
  • Document multiple functions in the same file by using either @rdname or @describeIn.
  • Within roxygen tags, you use .Rd syntax to format text: https://cran.r-project.org/doc/manuals/R-exts.html#Marking-text

Use `data.table` package to parse vcf

Currently ldshrink assumes the input genotype/haplotype data are stored in an n-by-p numerical matrix, which is convenient from statisticians' perspective. However, public genotype/haplotype data from 1000 Genomes are stored in vcf format.

In the past I first used vcftools to convert vcf data to IMPUTE2 format (which is indeed a p-by-n matrix), and then transpose IMPUTE2-formatted data in R. See https://github.com/stephenslab/rss/blob/master/misc/import_1000g_vcf.sh.

This two-step workflow is not so convenient (at least for statisticians): one has to learn a new program like vcftools before performing any LD-related operations in ldshrink.

It seems that now we can use data.table (https://cran.r-project.org/web/packages/data.table) to directly convert vcf data to the n-by-p matrix in R. Here is an example: https://gist.github.com/cfljam/bc762f1d7b412df594ebc4219bac2d2b.

Here is my own example.

> suppressPackageStartupMessages(library(data.table))
> my_dt <- data.table::fread(cmd="zcat B_CELL_NAIVE.vcf.gz")

|--------------------------------------------------|
|==================================================|
>
> dim(my_dt)
[1] 215708754         8
> head(my_dt)
   #CHROM       POS          ID REF ALT QUAL FILTER
1:  chr15 101094127  rs12904576   T   C    .   PASS
2:  chr15 101114640  rs36053285   T   C    .   PASS
3:  chr15  45685616 rs796721871  TA   T    .   PASS
4:  chr15  45670316   rs2114501   G   A    .   PASS
5:  chr15  45618730  rs59889118   G   A    .   PASS
6:  chr15  45620612   rs1980288   T   C    .   PASS
                                                                                                               INFO
1:   Gene=ENSG00000270127.1;GeneSymbol=ENSG00000270127.1;Pvalue=3.714e-27;Beta=-1.10;Statistic=-15.72;FDR=2.681e-20
2:   Gene=ENSG00000270127.1;GeneSymbol=ENSG00000270127.1;Pvalue=2.649e-24;Beta=-1.09;Statistic=-14.16;FDR=3.412e-18
3: Gene=ENSG00000171766.11;GeneSymbol=ENSG00000171766.11;Pvalue=2.614e-23;Beta=-1.10;Statistic=-13.63;FDR=3.412e-18
4:  Gene=ENSG00000171766.11;GeneSymbol=ENSG00000171766.11;Pvalue=3.007e-23;Beta=-1.09;Statistic=-13.6;FDR=3.412e-18
5:  Gene=ENSG00000171766.11;GeneSymbol=ENSG00000171766.11;Pvalue=3.64e-23;Beta=-1.09;Statistic=-13.56;FDR=3.412e-18
6:  Gene=ENSG00000171766.11;GeneSymbol=ENSG00000171766.11;Pvalue=3.64e-23;Beta=-1.09;Statistic=-13.56;FDR=3.412e-18

The benefit of using data.table here is two-fold: i) users don't have to leave R and use vcftools to get an n-by-p genotype matrix from vcf data; ii) data.table is a well-maintained and constantly updated package that can handle large datasets efficiently (at least in my past experience).

Hence, we can either add a wrapper that uses data.table to parse vcf for ldshrink users, or at minimum, we can simply provide a vignette showing how to use data.table to parse vcf.

Finally there exists a package vcfR (https://cran.r-project.org/web/packages/vcfR) that might be relevant (but I have not used it much).
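A sketch of such a wrapper (hypothetical helper; it assumes a standard genotype VCF with phased `0|1`-style fields starting at column 10, unlike the summary-level VCF shown above):

```r
## Read a genotype VCF into the n-by-p dosage matrix ldshrink expects.
library(data.table)
vcf_to_matrix <- function(path) {
  dt <- fread(cmd = paste("zcat", path), skip = "#CHROM")  # skip meta lines
  gt <- as.matrix(dt)[, -(1:9), drop = FALSE]  # drop CHROM..FORMAT columns
  dos <- apply(gt, 2, function(g)              # "0|1" -> dosage 0/1/2
    as.integer(substr(g, 1, 1)) + as.integer(substr(g, 3, 3)))
  t(dos)  # transpose variants-by-samples to samples-by-variants (n-by-p)
}
```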

C++17 requirement is limiting

@CreRecombinase @xiangzhu Can you find a way to replace the C++17 requirement (in the Makevars file) with the C++11 requirement, for example? The C++17 standard is much less widely supported; compare this and this. In the interest of making the LDshrink package as accessible as possible, can you try to find a way to avoid using C++17 features?

Thanks to @jean997 for identifying this issue.

Revise package metadata based on Wickham book

Below is a check list of guidelines on the DESCRIPTION file from Wickham's package book: http://r-pkgs.had.co.nz/description.html

  • Every package must have a DESCRIPTION.
  • DESCRIPTION uses a simple file format called DCF, the Debian control format.
  • Each line consists of a field name and a value, separated by a colon.
  • When values span multiple lines, they need to be indented.
  • Both Imports and Suggests take a comma separated list of package names.
  • Put one package on each line, and keep them in alphabetical order. That makes it easy to skim.
  • Imports: packages listed here must be present for your package to work. In fact, any time your package is installed, those packages will, if not already present, be installed on your computer.
  • Explicitly refer to external functions using the syntax package::function().
  • If you use a lot of functions from other packages this is rather verbose. There’s also a minor performance penalty associated with :: (on the order of 5µs, so it will only matter if you call the function millions of times).
  • Suggests: your package can use these packages, but doesn’t require them. You might use suggested packages for example datasets, to run tests, build vignettes, or maybe there’s only one function that needs the package.
  • Packages listed in Suggests are not automatically installed along with your package. This means that you need to check if the package is available before using it (use requireNamespace(x, quietly = TRUE)).
  • The easiest way to add Imports and Suggests to your package is to use devtools::use_package().
  • If you need a specific version of a package, specify it in parentheses after the package name: MASS (>= 7.3.0).
  • Generally, it’s always better to specify the version and to be conservative about which version to require. Unless you know otherwise, always require a version greater than or equal to the version you’re currently using.
  • You should almost always use Imports, not Depends.
  • You can also use Depends to require a specific version of R, e.g. Depends: R (>= 3.0.1).
  • LinkingTo: packages listed here rely on C or C++ code in another package.
  • You can also list things that your package needs outside of R in the SystemRequirements field.
  • Title is a one line description of the package, and is often shown in package listing. It should be plain text (no markup), capitalised like a title, and NOT end in a period. Keep it short: listings will often truncate the title to 65 characters.
  • Description is more detailed than the title. You can use multiple sentences but you are limited to one paragraph. If your description spans multiple lines (and it should!), each line must be no more than 80 characters wide. Indent subsequent lines with 4 spaces.
  • Include a README.md file that goes into much more depth and shows a few examples.
  • To identify the package’s author, and whom to contact if something goes wrong, use the Authors@R field.
  • If your package includes code that you didn’t write, you need to make sure you’re in compliance with its license.
  • If you want to release your package to CRAN, you must pick a standard license.
  • Formally, an R package version is a sequence of at least two integers separated by either . or -. For example, 1.0 and 0.9.1-10 are valid versions.
  • R uses version numbers to determine whether package dependencies are satisfied.
  • A released version number consists of three numbers, <major>.<minor>.<patch>.
  • An in-development package has a fourth component: the development version. This should start at 9000. For example, the first version of the package should be 0.0.0.9000.
  • Increment the development version, e.g. from 9000 to 9001 if you’ve added an important feature that another development package needs to depend on.
  • Collate controls the order in which R files are sourced. This only matters if your code has side-effects; most commonly because you’re using S4.
  • LazyData makes it easier to access data in your package. Because it’s so important, it’s included in the minimal description created by devtools.

Warnings from package installation

In my version of the clang compiler (version 4.0.1), I get several dozen warnings, which are attached. Some of these are in the Eigen source, but others are in our own code, and perhaps some of these can be easily addressed. I've labeled this as a "bug" but perhaps should be read as "potential bug".

ldshrink.install.err.gz

2018-06-14 meeting agenda

Below is a draft of meeting agenda.

  • Time: 2018-06-14, 10 am PST
  • Location: Skype
  • Attendees: Nick, Xiang

Review items

  • What is the current state of LDshrink development?
  • Can current LDshrink reliably produce results for one chromosome (i.e. Xiang's use case)?
  • Can current LDshrink reliably produce results for whole genome (i.e. Nick's use case)?
  • How far is the current LDshrink from an accepted Bioconductor package?

Action items

  • Freeze a set of features for the first public release.
  • Complete LDshrink package development so that it can be accepted by Bioconductor.
  • Draft a short manuscript on the LDshrink package (closely related to package vignettes).
  • Compare with related work: https://doi.org/10.1101/301366.
  • Split work by category (one for coding-related work, the other for writing-related work).
  • Set a clear time frame, to avoid Xiang's research "down time" when he has to teach.

Future items

  • Find public domains to share precomputed LD data for other projects (e.g. I vaguely remember that mashr revision used Nick's precomputed LD data on Midway).
  • Communicate with others (e.g. Matthew, Xin, Peter) once the package is near mature.

Revise vignettes based on Wickham book

Below is a check list of guidelines on files in vignettes/ from Wickham's package book: http://r-pkgs.had.co.nz/vignettes.html

This part seems straightforward since the main tools are rmarkdown and knitr, with which I have some experience. However, preparing and organizing the contents of vignettes may be as hard as writing articles.

  • Start with devtools::use_vignette("my-vignette").
  • Can build all vignettes from the console with devtools::build_vignettes(), but this is rarely useful. Instead use devtools::build() to create a package bundle with the vignettes included.
  • Need to put yourself in the readers’ shoes, and adopt a “beginner’s mind”.
  • Any packages used by the vignette must be declared in the DESCRIPTION.
  • Need to watch the file size.

Interface to export LD matrix for requested region

Here is a feature request for an export interface: given a chromosome and start and end positions, extract the complete LD matrix (or a sparse matrix in R, which can trivially be made complete) for that region from a given blocked LD database file.
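A sketch of what this interface could look like (all names are hypothetical, assuming the database is loaded as a per-chromosome list of `Sigma`/`info` pairs):

```r
## Extract the LD submatrix for [start, end] on one chromosome from a
## block-wise LD store.
extract_ld <- function(ld_db, chr, start, end) {
  blk <- ld_db[[chr]]                            # per-chromosome entry
  hit <- blk$info$pos >= start & blk$info$pos <= end
  list(Sigma = as.matrix(blk$Sigma[hit, hit]),   # densify if stored sparse
       info  = blk$info[hit, ])
}
```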

Revise R codes based on Wickham book

Below is a check list of guidelines on files in R/ from Wickham's package book: http://r-pkgs.had.co.nz/r.html

  • Two extremes are bad: don’t put all functions into one file and don’t put each function into its own separate file.
  • File names should be meaningful and end in .R.
  • Avoid capitalisation problems by never using filenames that differ only in capitalisation.
  • You don’t have to use my style, but I strongly recommend that you use a consistent style and you document it.
  • If you’re working on someone else’s code, don’t impose your own style. Instead, read their style documentation and follow it as closely as possible.
  • Variable and function names should be lowercase. Use an underscore _ to separate words within a name.
  • Generally, variable names should be nouns and function names should be verbs.
  • Where possible, avoid using names of existing functions and variables.
  • Format suggestions on spacing, curly braces, line lengths, assignment, and commenting.
  • Never run code at the top-level of a package: package code should only create objects, mostly functions.
  • Don’t use library() or require(). Use the DESCRIPTION to specify your package’s requirements.
  • Never use source() to load code from a file. Rely on devtools::load_all() which automatically sources all files in R/.
  • Clean up after yourself with on.exit(): options(), par() and setwd()
  • Isolate creating plots and printing output to the console in functions that only produce output.
  • Use .onAttach() for startup message.
  • To display startup messages, always use packageStartupMessage(), and not message().
  • Set custom options for your package with options().
  • If you use .onLoad(), consider using .onUnload() to clean up any side effects.
  • If you’re planning on submitting your package to CRAN, you must use only ASCII characters in your .R files.
  • The easiest way to use the special unicode escape "\u1234" format is to use stringi::stri_escape_unicode().

MHC region exclusion

We often exclude MHC region when estimating LD and performing further downstream analyses.

It would be nice to do this MHC exclusion for users.

The idea is to have the following simple data frame inside ldshrink, and a flag like remove_mhc=TRUE:

> HLA.hg19
    chrom start.base end.base
HLA     6   29719561 32883508
> is(HLA.hg19)
[1] "data.frame" "list"       "oldClass"   "vector"

Similarly we may want to remove centromeres as well.

Both MHC and centromeres data frames can be found here: https://github.com/smgogarten/GWASTools/tree/master/data.

This issue is also related to Issue #14, since they share the same idea: if we cannot estimate something reliably, it is probably better not to estimate it at all.

One more detailed note: pay attention to the sources of these regions -- e.g. if these regions are pulled from UCSC, then they are 0-based.
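The proposed flag could be implemented as a simple interval filter over the SNP info table (helper name hypothetical; the same function would serve for centromeres):

```r
## Drop SNPs falling inside an excluded region such as HLA.hg19.
remove_region <- function(info, region) {
  keep <- !(info$chr == region$chrom &
            info$pos >= region$start.base &
            info$pos <= region$end.base)
  info[keep, , drop = FALSE]
}
## With remove_mhc = TRUE: info <- remove_region(info, HLA.hg19)
```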
