Code Monkey home page Code Monkey logo

scca-acat_twas's People

Contributors

fenghelian avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

sudhanshu1610

scca-acat_twas's Issues

Interpreting the results

I sent Helian an email to ask some questions about how to interpret the ACAT p-values resulting from compute_acatP.r script, and I am sharing here her answers. Helian, thanks once more for helping me sort out my code and for helping me interpret the results!!

1. How do I interpret the ACAT p-value (generated through analysis of multiple single tissues + the sCCA models), considering the results I obtained from multiple single tissues?
Similar to multiple testing, a gene identified as significant by ACAT (p<alpha level) indicates that the gene is significantly associated with the trait in at least one tissue used in the analysis. There is no test for which tissue it is significant in, however, the single tissue test results could serve as an indicator. If the signal is mostly from sCCA weights, then it is mostly an across tissue signal.

[For further information, check the published manuscript.]

2. For example, if gene X has an acat P value of 10^-9, driven by sCCA1 alone, does this mean that we were able to identify gene X as a risk gene, but we cannot identify the specific tissue involved, likely because the single tissue weights are underpowered to detect an effect?
Yes, you may phrase it as this.

3. Can I include the TWAS results calculated from sCCA weights when I am plotting the multi-tissue TWAS results with FUSION?
Yes, you may. Below is the supplementary figure 6 in our paper, we plotted the zscore for the single tissue test on the y axis and zscore from sCCA1 results on the x axis, the red line is y=x diagonal line. We can see it is expected for sCCA TWAS results to have a stronger effect.
image

4. I don't know what P val cut-off I could use [for the acat P values].
So, the general principle here is that we wanted to adjust for the number of test we have conducted to control the type I error rate at desired level. For example, if we want to control type I error rate at 0.05. When you are doing single tissue test in the standard FUSION pipeline, you should adjust for the total number of genes been tested. For example, if you conducted tests for Whole blood tissue (4,701 features in total), Peripheral Blood (2,454), the valid threshold you should use is 0.05/(4701+2454). And the multiple adjustment burden arises as you increase more tissues in the analysis, because you have to adjust for the same gene multiple times as it has been tested in multiple tissues. However, when you are doing the multi-tissue test with sCCA+ACAT, the ACAT combined test has already adjusted for the number of tissues you have tested for. So the threshold is just p divided by the number of unique genes you have tested for (number of rows in the ACAT results).

5. FUSION performs the conditional analysis in the second step, in their traditional pipeline. Is there a need/way for doing a similar conditional analysis with the genes identified by the sCCA-ACAT TWAS?
You can still perform conditional analysis for the significant genes. However, the conditional analysis is performed in one tissue weight at a time. So you got to pick one tissue for the conditional analysis. You can use the sCCA test results as well, but not on the ACAT combined results.

6. Should I apply a bonferroni correction to the ACAT P-value, considering the number of rows in that resulting file?
See response for question 4

7. Why are there 3 sCCA models/weights files?
When computing sCCA weights, we require the gene to be heritable in the gene expression model to control for False discovery rate. It is observed that most of the gene expression signal is captured by the top 3 canonical vectors, thus we include gene expression models for up to 3 heritable canonical vectors.

Lastly, if you are suffer from low power for your TWAS analysis, another approach you can try out is to conduct multi-phenotype analysis if you have GWAS results for multiple phenotypes that is genetically correlated, such as multiple cancer types or lipids traits. See our EpiGenetics paper for more details.

Error: Each row of output must be identified by a unique combination of keys

Dear @fenghelian, thanks to you and your group for developing this tool, I am really keen to apply it to a trait I am interested!

I am trying to run an sCCA-ACAT TWAS following your tutorial, but I am unable to run compute_acatP.r.

I noticed a few bugs:

  1. the script wasn't loading optparse (because it wasn't recognizing the command make_option) and readr (because it wasn't recognizing read_delim), so I included library(optparse) and library(readr)
  2. there was an unnecessary comma after the options following the second make_option, which made optparse think there was gonna be a third argument input by the user. I removed that comma.

Even after rectifying these, I am getting the following error:

-- Column specification -------------------------------------------------------------------------------------------------------------
cols(
  .default = col_double(),
  PANEL = col_logical(),
  FILE = col_character(),
  ID = col_character(),
  BEST.GWAS.ID = col_character(),
  EQTL.ID = col_character(),
  MODEL = col_character()
)
i<U+00A0>Use `spec()` for the full column specifications.

[.... this repeats another four times - corresponding to the number of weights tested: 5 in total ("GTEx.Whole_Blood" "GTEx.Brain_Caudate_basal_ganglia" "sCCA1" "sCCA2" "sCCA3").... ]

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 1126 rows:
* 9, 46
* 12, 48
* 7, 43
* 22, 64
* 16, 50
* 17, 52
* 25, 82
* 26, 84
[....]
* 24, 80
* 27, 85
* 21, 59
* 3, 31
* 29, 92

Backtrace:
    x
 1. +-dat %>% spread(key = PANEL, value = TWAS.P, drop = T)
 2. +-tidyr::spread(., key = PANEL, value = TWAS.P, drop = T)
 3. \-tidyr:::spread.data.frame(., key = PANEL, value = TWAS.P, drop = T)
Execution halted

The full log can be found here.

Could you help me, please?

[Please see how I got here, and my sessionInfo() below]

ACAT output: gene ID annotated with highly significant associations to multiple single-tissue - but not found in output from single-tissue analysis.

Dear Helian
Thank you for your contribution to the community with this addition to TWA-studies.
I've found myself wondering about a bit confusing matter. When I order the ACAT results by lowest P-value, I get the following results:
Skærmbillede 2023-09-05 kl  10 43 23

However, in a dataframe where i have rbind'ed all single-tissues (including sCCA1-3), the gene ID with lowest P-value from ACAT results is non existent.
Skærmbillede 2023-09-05 kl  10 43 50

I have even tried to look up specific P-values from the ACAT results:
Skærmbillede 2023-09-05 kl  10 47 19

Have you encountered a similar problem before? And how can this possibly be explained, given the ACAT result for this gene-ID is annotated with P-values from the single-tissue analysis?

Best of wishes
Soren

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.