Comments (4)
Hey Wenjun,
I don't believe Bioconductor versions of dbSNP 156 have been created yet - @hpages may have more information but I know it took time to create dbSNP 155 so I'm not sure of the timeline for this. Sorry Hervé, do you have any thoughts on this?
Thanks,
Alan.
from mungesumstats.
Hi Alan, Wenjun,
It might take a while before I get to produce the SNPlocs.Hsapiens.dbSNP156.* packages. The approach I'm currently using for generating the SNPlocs packages has reached its limits and doesn't scale well with the ever-increasing size of dbSNP. So it would need to be revisited, e.g. by splitting the whole thing into smaller packages or by moving the data to AnnotationHub or both. It might take a while before I get to this.
In the meantime, if you really need SNPlocs.Hsapiens.dbSNP156.GRCh37 now, you can try to forge it by using the scripts provided in the SNPlocsForge package here. The package lacks documentation, sorry. The scripts for dbSNP156 are in inst/scripts/dbSNP156/. You first need to manually create the shell of the SNPlocs.Hsapiens.dbSNP156.GRCh37 package (use the SNPlocs.Hsapiens.dbSNP155.GRCh37 package as a template). Then run the following scripts in that order: download_json.sh, extract_snvs_from_RefSNP_json_files.sh, select_GRCh37_snvs.sh, build_GRCh37_OnDiskLongTable.sh.
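The run order described above could be wrapped in a small driver so that a failing step stops the chain instead of feeding incomplete output to the next script. This is only a sketch: the function name run_forge_steps is made up for illustration, and it assumes the scripts take no arguments and are meant to be run from inside the scripts directory, neither of which is documented here.

```shell
#!/usr/bin/env bash
# Sketch only: run a list of scripts in the given order, stopping at the first failure.
# ASSUMPTIONS (not confirmed by SNPlocsForge): the scripts take no arguments
# and are run with the scripts directory as the working directory.

run_forge_steps() {
    local scripts_dir="$1"
    shift
    local step
    for step in "$@"; do
        echo "Running ${step}"
        # Subshell keeps the cd local to this step.
        ( cd "$scripts_dir" && bash "./${step}" ) || {
            echo "Step ${step} failed; stopping." >&2
            return 1
        }
    done
}

# Example invocation (the checkout path is hypothetical):
# run_forge_steps /path/to/SNPlocsForge/inst/scripts/dbSNP156 \
#     download_json.sh \
#     extract_snvs_from_RefSNP_json_files.sh \
#     select_GRCh37_snvs.sh \
#     build_GRCh37_OnDiskLongTable.sh
```

Since each step may run for days, it is probably worth launching this inside something like tmux or screen so that a dropped SSH session does not kill it.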
Note that you'll need a powerful Linux machine to run these scripts (I used a machine with 80 logical CPUs and 384 GB of RAM to forge the SNPlocs.Hsapiens.dbSNP155.* packages, and the scripts took about 1 week for each package). You'll also need a lot of disk space (300 to 400 GB or something like that).
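Given the disk-space figure mentioned above, a quick pre-flight check before starting a week-long run might look like the following. It is a rough sketch: has_free_gb is a hypothetical helper name, and the threshold you pass in is your own guess based on the approximate numbers in this comment.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if the filesystem holding DIR has at least MIN_GB free.
# Uses POSIX `df -P -k`, where column 4 of the data line is available space in KiB.

has_free_gb() {
    local dir="$1" min_gb="$2"
    df -Pk "$dir" | awk -v need="$min_gb" '
        NR == 2 { ok = ($4 >= need * 1024 * 1024) }  # convert GiB to KiB
        END     { exit !ok }
    '
}

# Example (the work directory path and the 400 threshold are illustrative):
# has_free_gb /path/to/workdir 400 || echo "Not enough free disk space" >&2
```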
Let me know if you decide to give it a try and I'll do my best to help.
Best,
H.
Hey Herve,
Thanks very much for the explanation. This is not something I have the time or resources to do right now, but I do believe it's important to find a more manageable way to produce these packages for subsequent releases. I'll get in touch with any suggestions on how to do this in the future.
Cheers,
Alan.
Let's leave this open for now since it has not been addressed in any meaningful way.