rivas-lab / public-resources

Repository for resources we'd like to share with the community.

License: MIT License

R 0.18% Jupyter Notebook 99.22% Shell 0.03% Python 0.57% Makefile 0.01%

public-resources's Introduction

Rivas Lab Public Resources

This repository contains resources that the Rivas Lab has created to share with the community.

Cloning

Some of the resources are large and therefore tracked with git LFS. It's best to clone this repository using

git lfs clone --recursive https://github.com/rivas-lab/public-resources.git

If you don't use --recursive, after cloning, use

git submodule update --init --recursive

to pull in the submodules.
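
If a clone was made without git LFS available, the large files are left as small text stubs (LFS pointers) rather than real data. The sketch below is one way to spot that; it relies only on the fact that an LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`. The function names `is_lfs_pointer` and `find_unfetched` are illustrative and not part of this repository.

```python
from pathlib import Path

# First bytes of every git LFS pointer file, per the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` is still an LFS pointer stub."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched(root):
    """List files under `root` (skipping .git) that are still LFS pointers."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

Calling `find_unfetched("public-resources")` from the parent directory of the clone should return an empty list once the LFS content has been downloaded.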

Directories

`gene_lists`

Useful lists of genes. See README in that directory for more info.

`uk_biobank`

`submodules`

Useful repos from other sources.

public-resources's Issues

rivas_decomposition.py

I have followed all the steps to clone this repository in order to use DeGAs. However, when I import rivas_decomposition_py as decomposition, I get this error:

File "/home/gsd818/.conda/envs/default/lib/python3.7/site-packages/rivas_decomposition_py-0.0.19-py3.7.egg/rivas_decomposition_py/enrichr.py", line 18, in
ModuleNotFoundError: No module named 'enrichr_py'

Thanks

ModuleNotFoundError: No module named 'yt_misc_py'

Running `import yt_misc_py as yt_misc` in the notebooks yields an error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-21-0702ff55796a> in <module>
----> 1 import yt_misc_py as yt_misc
      2 
      3 import rivas_decomposition_py as decomposition

ModuleNotFoundError: No module named 'yt_misc_py'
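
A quick way to see which of the modules named in these tracebacks are missing from the active environment is to ask the import system without actually importing anything. This is only a diagnostic sketch: the helper name `missing_modules` is made up for illustration, and it says nothing about where the missing packages should be installed from.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the error messages reported in the issues.
    wanted = ["rivas_decomposition_py", "enrichr_py", "yt_misc_py"]
    print("Not importable:", missing_modules(wanted))
```

Running this inside the same conda environment and Jupyter kernel as the notebook matters, since each environment has its own site-packages.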

incomplete summary_stats data

Dear Biobank Engine developers,
Thank you very much for making the summary_stats data accessible. I am a PhD student specializing in venous disease. Recently I have been searching for GWAS summary data on varicose ulcers for my PhD project. This is not easy because venous disease receives little attention from researchers. Fortunately, I found that Biobank Engine provides part of the summary results for varicose ulcer (HC268) on the web, but these data were not included in the downloaded "icds.sorted.tsv.gz" files. Would you mind sharing the summary data for HC268 to help me finish my project?
Thank you for your kind help, and please contact me if it is convenient for you. My email address is [email protected]
I look forward to your reply!

Phenotype Code

Hi there, thanks for making the summary statistics available. Would you mind providing some instructions on how to translate a Phenotype Code (e.g. BROADBIN1000011) into its definition?

Thank you,
Zhihao
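
One possible starting point, pending an answer from the maintainers: another issue on this page mentions an "icdinfo.txt" file whose first column holds the phenotype codes. The sketch below indexes such a file by its first column and returns the whole row for a code. It assumes the file is tab-delimited and makes no claim about what the remaining columns mean; `load_phenotype_rows` is a hypothetical helper name.

```python
import csv


def load_phenotype_rows(path, delimiter="\t"):
    """Map each phenotype code (first column) to its full row of fields."""
    rows = {}
    with open(path, newline="") as handle:
        # QUOTE_NONE: treat quote characters in the data as ordinary text.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if row:  # skip blank lines
                rows[row[0]] = row
    return rows
```

With the table loaded as `rows = load_phenotype_rows("icdinfo.txt")`, `rows.get("BROADBIN1000011")` returns every field recorded for that code, or `None` if the code is not listed.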

Questions about UK Biobank "summary_stats" data

I'm trying to extract phenotype associations for variants in selected genes (~300) from the precomputed UK Biobank GWAS results that you've kindly made available here:
https://github.com/rivas-lab/public-resources/tree/master/uk_biobank/summary_stats
(Thank you for this!)

I have the following questions that I hope you can answer:

1. In the "icdinfo.txt" file, there are different prefixes in the phenotype codes (first column) that seem to refer to different categories (e.g. "BIN", "INI", "HC", ...). Is there any documentation about these prefixes/categories? (I've seen the information in the FAQ here, but it doesn't explain everything.)
2. For some of these phenotype codes, the number at the end identifies the UK Biobank data field for the phenotype, but sometimes "10" or "100" is added in front of the number, and sometimes there's no obvious correspondence to a data field. Again, where can I find more information about the system that was used there?
3. In the same file, what is the meaning of the recurring numbers in columns 2, 4 and 5?
4. Many of the phenotypes listed in "icdinfo.txt" (or visible through the Global Biobank Engine website) are not represented in any of the "icds.sorted.part[n].tsv.gz" files. In particular, phenotypes with codes starting in "BIN", "BIN_FC", "MED" and "QT_FC" are missing. Are the "icds..." files from an older version of the analysis? Could they be updated?

Thanks in advance for your help.
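
To make the comparison described in the last question reproducible, the codes can be split into a leading prefix and a trailing number, and the prefixes seen in one list compared against another. This is a rough sketch: it assumes a code is a run of letters or underscores followed by digits, which matches the examples quoted on this page (HC268, BROADBIN1000011) but is not a documented format. All function names are illustrative.

```python
import re
from collections import Counter

# Assumed shape of a code: letters/underscores, then trailing digits.
CODE_PATTERN = re.compile(r"^([A-Za-z_]+)(\d+)$")


def split_code(code):
    """Split a phenotype code into (prefix, number); number is '' if no match."""
    match = CODE_PATTERN.match(code)
    if match is None:
        return code, ""
    return match.group(1), match.group(2)


def prefix_counts(codes):
    """Count how many codes fall under each prefix."""
    return Counter(split_code(code)[0] for code in codes)


def missing_prefixes(listed, present):
    """Prefixes that occur in `listed` but never in `present`."""
    return sorted(set(prefix_counts(listed)) - set(prefix_counts(present)))
```

Here `listed` would be the codes read from "icdinfo.txt" and `present` the codes actually found in the summary statistics files; which column of those files holds the code is not stated here and would need to be checked first.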