mesusie's People

Contributors

borangao

mesusie's Issues

a few questions about MESuSiE

Hi,
Thanks for the new software and congratulations on the new publication!
We plan to use MESuSiE for multi-ancestry QTL analysis and have a few questions.

  1. LD mismatch check
    We have individual-level data for each ancestry and will generate the QTL associations and the LD from exactly the same genotype data. So there should be no LD "mismatch", as the LD information is not derived from an outside reference panel. I am wondering if the LD mismatch check is still necessary in this case.

  2. "Non-significant SNPs correlation with GWAS signals" in LD mismatch check
    This is still related to LD mismatch but is more of a general question. From the tutorial, I think the LD mismatch indicators z, z_std_diff and logLR are available from kriging_rss(). Do you have recommendations for the indicator "non-significant SNPs correlation (abs(r)>0.8) with GWAS signals"? Would pval>0.05 as non-significant and pval<1e-06 as a significant signal be reasonable, or pval>0.05 as non-significant and pval<5e-08 as significant? I guess there is no direct output for this indicator, so we will need to derive it ourselves.

  3. more than 2 ancestries
    The MESuSiE examples all seem to use 2 ancestries, but I guess the software can be run on 3 ancestries as well. We have 3 ancestries (White/Black/Hispanic), and I am wondering if I can simply run MESuSiE on all 3. I guess it would report variants shared among all 3 ancestries, variants shared between any subset of 2 ancestries, and single-ancestry specific variants. This is preferred as it would give more complete info per snp, but I don't know whether a third ancestry may interfere with detecting variants shared between 2 ancestries. Any advice on this is appreciated.
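
One way the indicator in question 2 could be derived from the z-scores and the LD matrix is sketched below. This is only an illustration: `flag_ld_inconsistent` is a hypothetical helper, not a function from MESuSiE or susieR, and the default thresholds are the ones floated in the question rather than recommendations.

```r
# Hypothetical helper: flag SNPs that are non-significant themselves but in
# high LD (|r| > r_cut) with at least one significant SNP.
# z: numeric vector of z-scores; R: LD correlation matrix in the same SNP order.
flag_ld_inconsistent <- function(z, R, p_nonsig = 0.05, p_sig = 5e-8, r_cut = 0.8) {
  stopifnot(length(z) == nrow(R), nrow(R) == ncol(R))
  p <- 2 * pnorm(-abs(z))                 # two-sided p-values from z-scores
  sig <- which(p < p_sig)                 # indices of significant SNPs
  nonsig <- which(p > p_nonsig)           # indices of non-significant SNPs
  if (length(sig) == 0 || length(nonsig) == 0) return(integer(0))
  # |r| between non-significant SNPs (rows) and significant SNPs (columns)
  r_abs <- abs(R[nonsig, sig, drop = FALSE])
  nonsig[apply(r_abs, 1, max) > r_cut]
}

# Toy example: SNP 2 is in high LD with the lead SNP 1 but has z near 0
z <- c(8.0, 0.5, 0.3)
R <- matrix(c(1.00, 0.90, 0.10,
              0.90, 1.00, 0.05,
              0.10, 0.05, 1.00), nrow = 3, byrow = TRUE)
flagged <- flag_ld_inconsistent(z, R)
print(flagged)
```

The function would be run once per ancestry, with that ancestry's own z-scores and LD matrix; changing `p_sig` to 1e-6 gives the other threshold mentioned in the question.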

Again congratulations on your new publication and thanks a lot for your advice!

Best,
Yue

test gene result from 3 population MESuSiE run

Hi,
Thanks very much for the software and the timely reply! We have run MESuSiE on our QTL data from 3 populations: NHW (non-Hispanic White), AA (African American) and Hispanic, with sample size >1000 for NHW and <200 each for AA and Hisp. Here are some questions that we hope you could advise on:

We did some extensive testing on one of our genes that we know contains strong QTL signals. Our default setting is ("AA","Hisp","NHW"), L=10. We found:

  1. the order of the input pop groups affects the results. E.g. our default ("AA","Hisp","NHW") gives 10 cs, while ("NHW","Hisp","AA") gives 8 cs.

  2. test of the effect of L on the number of cs: 10 cs when L=10; 17 cs when L=20, with L20 max; 34 cs when L=50, with L50 max. So the number of cs didn't saturate even when L=50. Is this ok?

  3. compare original QTL vs MESuSiE configuration: The MESuSiE configuration seems to be different from what I would expect from original QTL:

e.g.: L2 contains a single snp, 552 (rs637571), configured as AA_Hisp (pip_config: 6.810229e-01). The Z of snp 552: NHW Z=11.64; Hisp Z=4.97; AA Z=0.757. I would not have expected AA_Hisp to be strong, since it includes AA, where the snp is not a significant QTL. All configs for snp 552:

    config         pip_config
    AA             1.142324e-113
    Hisp           6.823896e-02
    NHW            1.848905e-96
    AA_Hisp        6.810229e-01
    AA_NHW         2.554795e-98
    Hisp_NHW       2.283364e-02
    AA_Hisp_NHW    2.279045e-01

e.g.: L5 contains a single snp, 509 (rs667555), configured as Hisp (pip_config: 5.410980e-01). The Z of snp 509: NHW Z=15.50; Hisp Z=5.7; AA Z=1.2. I would have expected the snp not to be Hisp-specific, as it is quite a strong QTL in NHW. All configs for snp 509:

    config         pip_config
    AA             1.257299e-113
    Hisp           5.410980e-01
    NHW            2.995648e-79
    AA_Hisp        2.089015e-01
    AA_NHW         4.555976e-81
    Hisp_NHW       1.803665e-01
    AA_Hisp_NHW    6.963401e-02

  4. lead QTL in MESuSiE cs when L changes: the 4 top QTLs in the gene are in high LD and have pval<1e-100 in NHW, with lead QTL Z>25.
    When L=5 or L=10, the lead QTL is the single-snp cs L1, with config "AA_Hisp_NHW".
    When L=20 or L=50, the 4 top QTLs are not in any cs, with pip near 0 for all 4. For L=20, the "AA_Hisp_NHW" cs contains no snp in strong LD with the lead QTL; for L=50, the "AA_Hisp_NHW" cs contains a snp in LD (r=0.875) with the lead QTL.

So overall it seems that L=5 or L=10 gives better results in terms of keeping the lead QTL signals.

  5. test on two pops: we tested ("AA","NHW"), ("AA","Hisp") and ("Hisp","NHW").
    ("AA","NHW") gives 2 cs when L=10.
    ("Hisp","NHW") gives 10 cs when L=10; 15 cs when L=20, with L18 max.
    ("AA","Hisp") gives 8 cs when L=10, with L10 max; 16 cs when L=20, with L20 max; 24 cs when L=50, with L48 max.

So in the runs with two populations, we see stable cs for ("AA","NHW"), but perhaps not for ("AA","Hisp") and ("Hisp","NHW").

  6. pip_config sum >1: Not in this test gene, but in other genes, we see cases where the sum of all pip_config for a snp is more than 1.
    e.g.:

        config         pip_config
        AA             0.0000000
        Hisp           0.5004338
        NHW            0.0000000
        AA_Hisp        0.7353165
        AA_NHW         0.0000000
        Hisp_NHW       0.1798680
        AA_Hisp_NHW    0.2646835

    For this snp, the QTL is strong in all pops: NHW Z=22.8; Hisp Z=8.3; AA Z=8.4.
    Can the sum of all pip_config be more than 1? In this example the sum is 1.68.

  7. strong effect of ancestry_weight on results: we tested how changing ancestry_weight affects the results. We changed ancestry_weight from the default to ancestry_weight=rep(1/7,7). The results are quite different. The new result contains 10 cs, all categorized as "AA_Hisp_NHW" except for two "AA_Hisp", while the default result contains 4 "Hisp", 4 "AA_Hisp", 1 "AA_Hisp_NHW" and 1 "Hisp_NHW". I didn't expect that changing the prior would affect the result this much.
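
For reference, three ancestries give 2^3 - 1 = 7 non-empty configurations, which is why the uniform prior above has length 7. The base R sketch below enumerates them and re-adds the pip_config values quoted in the pip_config question. `enumerate_configs` is a hypothetical helper, and the ordering it produces is an assumption that should be checked against the names in the MESuSiE output before using it to build `ancestry_weight`.

```r
# Hypothetical helper: all non-empty subsets of the ancestry labels,
# ordered by subset size and joined with "_".
enumerate_configs <- function(ancestries) {
  unlist(lapply(seq_along(ancestries), function(k) {
    combn(ancestries, k, FUN = paste, collapse = "_")
  }))
}

configs <- enumerate_configs(c("AA", "Hisp", "NHW"))
print(configs)

# Uniform prior over the configurations, i.e. rep(1/7, 7) for 3 ancestries
uniform_weight <- rep(1 / length(configs), length(configs))

# pip_config values copied from the example in the thread
pip_config <- c(AA = 0, Hisp = 0.5004338, NHW = 0, AA_Hisp = 0.7353165,
                AA_NHW = 0, Hisp_NHW = 0.1798680, AA_Hisp_NHW = 0.2646835)
total <- sum(pip_config)
print(total)
```

The total is about 1.68, as reported. AA_Hisp and AA_Hisp_NHW add up to 1 on their own in this example.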

The following are not questions per se but just want to confirm the results are ok:

  1. snps in more than one cs: some snps are in more than one cs, i.e. the cs overlap. E.g. L4 (snp 483 486 492) is contained in L9 (snp 467 483 486 492).
  2. max_iter on result: We found that with L=10, max_iter=500 gives a different result (in terms of the number of snps with pip>0.5) from the default max_iter=100. When we increase L, the max_iter=500 result also differs from max_iter=100. When we set L=5, the max_iter=500 result seems to be the same as max_iter=100.
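
Overlap between credible sets can be tabulated with a few lines of base R. The sketch below assumes the credible sets are available as a named list of SNP index vectors (an assumption about layout, not a statement about MESuSiE's output format); `overlap_counts` is a hypothetical helper, and the two sets are the L4 and L9 from the example above.

```r
# Hypothetical helper: number of SNPs shared by each pair of credible sets.
# The diagonal holds the size of each set.
overlap_counts <- function(cs_list) {
  k <- length(cs_list)
  m <- matrix(0L, k, k, dimnames = list(names(cs_list), names(cs_list)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      m[i, j] <- length(intersect(cs_list[[i]], cs_list[[j]]))
    }
  }
  m
}

cs_list <- list(L4 = c(483, 486, 492), L9 = c(467, 483, 486, 492))
m <- overlap_counts(cs_list)
print(m)

# nested[i, j] is TRUE when every SNP of set i is also in set j
nested <- m == diag(m)
print(nested)
```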

Sorry for the lengthy questions. Thanks a lot for your time and advice on this. I attached an RDat for the test gene (test_gene.MESuSiE_res.RDat.zip), which contains LD_list, summ_stat_list and MESuSiE_res when L=10. We would really like to work out the potential problems. For now, it seems that the two-pop run of ("AA","NHW") gives the most stable result.

test_gene.MESuSiE_res.RDat.zip

Best,
Yue

would MAF difference in different ancestries affect the MESuSiE config?

Hi,
We have finished the preliminary run of MESuSiE, and as expected, most of the snps with high pip_config belong to shared-ancestry configs. We would like to understand the ancestry-specific configs better. In particular, I am wondering whether MAF differences between ancestries would have a large effect on the MESuSiE config.

E.g:
A biologically causal snp X is present in both race A and race B with the same effect, i.e. the true config is shared "A_B". However, X has a large MAF difference between A and B; let's say MAF is 0.06 in A and 0.48 in B. Then even with the same sample size, the association would give equal effect sizes but a much larger Z-score in B than in A. In this case, would MESuSiE possibly categorize X as specific "B" instead of shared "A_B"? Supposing we only consider common alleles (MAF>0.05), should we check the MAF difference when determining ancestry-specific snps?
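
To put rough numbers on this scenario: under Hardy-Weinberg equilibrium, with a small per-allele effect on a standardized phenotype, the expected z-score is approximately beta * sqrt(2 * N * MAF * (1 - MAF)). The sketch below uses that textbook approximation, which is not taken from MESuSiE. The effect size and sample size are hypothetical values chosen only for illustration; the two MAFs are the ones in the example.

```r
# Approximate expected z-score for a per-allele effect `beta` on a
# standardized phenotype, with sample size `n` and minor allele frequency `maf`.
expected_z <- function(beta, n, maf) {
  beta * sqrt(2 * n * maf * (1 - maf))
}

beta <- 0.2    # hypothetical shared effect size
n <- 1000      # hypothetical sample size, same in both ancestries

z_A <- expected_z(beta, n, maf = 0.06)
z_B <- expected_z(beta, n, maf = 0.48)
ratio <- z_B / z_A
print(round(c(z_A = z_A, z_B = z_B, ratio = ratio), 2))
```

The ratio depends only on the two MAFs, not on beta or N, and comes out at about 2.1. So with equal effects and equal sample sizes, the z-score in B would be expected to be roughly twice that in A.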

Your advice is greatly appreciated. Thanks!

Best,
Yue

Error while running MESuSiE with 4 ancestries

Hello,
Great work with the software!

We tried running fine-mapping with MESuSiE using 4 different ancestries, and created the summ_stat and LD lists as shown below:

summ_stat_listB<- list("A"=A_example, "B"=B_example, "C"=C_example, "D"=D_example)
LD_listB<- list("A"=A_cov, "B"=B_cov, "C"=C_cov, "D"=D_cov)

However, we encountered the following error when we ran the MESuSiE function as follows:

MESuSiE_res <- meSuSie_core(LD_listB, summ_stat_listB, L = 10, residual_variance = NULL, prior_weights = NULL, ancestry_weight = NULL)


Start data processing for sufficient statistics

Error in diag(sqrt(XtX.diag[[x]])) %*% R[[x]] : non-conformable arguments

What would you suggest as the best way to work around this? Do we need to indicate or change the default number of ancestries somewhere in the function?
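
The `non-conformable arguments` message is raised by R's matrix product `%*%` when the number of columns on the left does not equal the number of rows on the right. One possible cause is an LD matrix whose dimension does not match the number of SNPs in the corresponding summary-statistics table, so a pre-flight check along these lines may help localize the problem. This is a sketch, not part of MESuSiE: `check_inputs` is a hypothetical helper, the `SNP` column name is an assumption about how the summary statistics are laid out, and the toy data are made up.

```r
# Hypothetical helper: per-ancestry report of SNP counts and LD dimensions.
check_inputs <- function(summ_stat_list, LD_list, snp_col = "SNP") {
  stopifnot(identical(names(summ_stat_list), names(LD_list)))
  report <- data.frame(
    ancestry = names(summ_stat_list),
    n_snp    = unname(sapply(summ_stat_list, nrow)),
    ld_rows  = unname(sapply(LD_list, nrow)),
    ld_cols  = unname(sapply(LD_list, ncol))
  )
  # LD must be square and match the number of SNPs in the summary statistics
  report$dims_ok <- report$n_snp == report$ld_rows &
    report$ld_rows == report$ld_cols
  # Do all ancestries list the same SNPs in the same order as the first one?
  ref <- summ_stat_list[[1]][[snp_col]]
  report$same_snps <- unname(sapply(summ_stat_list, function(d) {
    identical(d[[snp_col]], ref)
  }))
  report
}

# Toy data: ancestry B deliberately has an LD matrix of the wrong dimension
snps <- paste0("rs", 1:3)
toy_stat <- list(A = data.frame(SNP = snps, Z = c(1, 2, 3)),
                 B = data.frame(SNP = snps, Z = c(0, 1, 2)))
toy_LD <- list(A = diag(3), B = diag(2))
report <- check_inputs(toy_stat, toy_LD)
print(report)
```

Applied to the lists in the post, the call would be `check_inputs(summ_stat_listB, LD_listB)`; any row with `dims_ok` or `same_snps` equal to FALSE points at the ancestry to inspect first.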

Thank you.
