LDmergeFM

This script contains an implementation in R of a weighted average procedure to generate consensus locus-specific LD matrices from multiple single-cohort correlation files. Together with FINEMAP, it has been used in the generation of the PGC3-SCZ fine-mapping results.

Input files

LDmergeFM requires the readr, dplyr, purrr, reshape2 and Matrix packages to be available for your local R installation. It is designed to be run from the command-line as:

Rscript --vanilla LDmergeFM.R $LOCUS $COR_FORMAT $ESS_FORMULA

Where the argument $LOCUS is the locus identifier for the LD matrix being calculated, present in all of the input files:

Filename	Contents
`$LOCUS.ref`	Two-column whitespace-delimited file. Column 1: SNP name. Column 2: Effect allele. Equivalent to columns 1 and 4 of a `FINEMAP` .z file. No header.
`$COHORT_$LOCUS.fam`	1x `PLINK v1.07+` .fam file for each cohort being analysed (with case/control phenotype). Individuals with missing phenotypes not used to compute the pairwise correlations should be excluded from this file.
`$COHORT_$LOCUS.cor.gz`	1x `LDSTORE v1.1` .cor file (compressed output of –table flag) for each cohort being analysed. Output of the `PLINK v1.9+` –r inter-chr gz flag (`$COHORT_$LOCUS.ld.gz`) is also acceptable if the `$COR_FORMAT` argument is changed as described below.

Single-cohort names ($COHORT) should be unique but can contain any non-whitespace characters. The underscore ("_") separation with $LOCUS is mandatory.

Changing cohort weights and correlation file format

The other two arguments of the script are optional:

$COR_FORMAT indicates whether the input correlations have been computed with “LDSTORE” or “PLINK”, allowing the script to correctly process these files. Defaults to “LDSTORE” if not explicit.

$ESS_FORMULA indicates how to compute the effective sample size used as weight of each LD matrix. Options are “METAL” for the formula used in Willer et al. 2010 or “NCP” for the definition of Matti Pirinen and Vukcevic et al. 2011. Defaults to “METAL” if not explicit.

Note that if these last two arguments are used, they have to be used in the order above. This implies that to change $ESS_FORMULA one needs to be explicit and state the value of $COR_FORMAT as well (but the converse is not true).

Output files

Filename	Contents
`$LOCUS.ld`	Square consensus LD matrix. SNPs are given on the same order as `$LOCUS.ref`.
`$LOCUS.snps.log`	SNPs used in the computation of the consensus LD matrix. Should match those on `$LOCUS.ref`.
`$LOCUS.samples.log`	Cohorts used in the computation of the consensus LD matrix. Should match all of those provided as input files.
`$LOCUS.heatmap.png`	Basic illustration of the consensus LD structure at the locus. Intended for troubleshooting or to identify regions that could be problematic for fine-mapping. Only generated if R installation has PNG capability.

Testing

The ./test/ folder contains some input/output files that can be used to conduct a reproducible run. For illustration purposes, these files include the region around exon 12 of the EDAR gene, which contains some very strong linkage as previously discussed by Sabeti et al. 2007. Genotypes were derived from polymorphic SNPs from four subpopulations (Europeans, Sub-Saharan Africans, East Asians and Native Americans) of the public HGDP dataset. Please reference Bergström et al. 2020 if you find this data useful for other purposes.

Assumptions

LDmergeFM has been designed with fine-mapping in a meta-analytic case-control GWAS setting in mind, so one of its implicit requirements (in line with FINEMAP) is that the reference allele for the correlations is the same in all cohorts. Given that inconsistent criteria are currently used to decide effect/reference alleles, it can help to set these explictly using the PLINK –a1-allele/–ref-allele flag, which in fact can accept the format of the $LOCUS.ref file.

For a similar reason, LDmergeFM expects that all SNPs in the $LOCUS.ref file will be uniquely named and that each of them can be found in the correlation file of at least one cohort. Duplicated or missing SNPs might cause the script to fail silently, returning erroneous output, so please ensure these are not present.

Notes

LDmergeFM has not been tested with correlation table files from LDSTORE v2+, please conduct a test run before using those in important analyses.

LDmergeFM can work with an arbitrary number of input matrices but in its current state is not optimised to take advantage of multicore environments or matrix sparsity, and thus can be potentially resource-hungry. If working on systems with resource quotas, please check .log files to make sure the computation of the consensus LD matrix has used all available data.

LDmergeFM is not ancestry-aware. If ancestry-specific consensus matrices are needed (e.g. for trans-ancestry fine-mapping purposes) you should run the script separately for each group of single-ancestry inputs.

Major version history

2021-03-09 => Added some internal checks for better error reporting. Introduced arguments to accommodate other correlation file formats and change the calculation for effective sample size weights if desired. New basic heatmap output.

2020-11-13 => Upload of initial version with essential functionality.

Additional software

FINEMAP/LDSTORE: http://www.christianbenner.com/

PLINK: https://www.cog-genomics.org/plink/1.9/

Citation

If this script is helpful for your work, just reference the main PGC3-SCZ paper. If it ends up being very helpful, please let me know so I can keep fighting impostor syndrome one day at a time 😌.

Contact

Please submit suggestions and bug-reports at https://github.com/pintaius/LDmergeFM/issues.

pintaius / ldmergefm Goto Github PK

ldmergefm's Introduction

LDmergeFM

Input files

Changing cohort weights and correlation file format

Output files

Testing

Assumptions

Notes

Major version history

Additional software

Citation

Contact

ldmergefm's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent