Comments (2)
Thank you for fixing this bug so quickly! The updated version now works correctly on my example GWAS file.
As for rearranging the QC checks, I can see why doing so would be more complicated than I initially imagined. I'll let you know if I run into any other GWAS formats that break get_genome_build().
Hey! Thanks for the detailed report. Some things to note:
> Moreover, I think that get_genome_build() should fail if no matches are found, rather than defaulting to GRCh38 (which is the current behavior).
I agree, and I have added this fix: if no match is found in either genome build, MSS will now fail.
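To illustrate the "fail instead of defaulting" behaviour, here is a minimal Python sketch. It is not the MungeSumstats R code: the function name `infer_genome_build` and the `match_counts` input are hypothetical, and the real inference logic may differ.

```python
def infer_genome_build(match_counts):
    """Pick the build with the most matching SNPs, or fail if none match.

    match_counts is a hypothetical mapping from a build name to the number
    of SNPs whose positions matched that build's reference.
    """
    best_build = max(match_counts, key=match_counts.get, default=None)
    # Fail loudly rather than silently falling back to a default build.
    if best_build is None or match_counts[best_build] == 0:
        raise ValueError("No SNPs matched any genome build; cannot infer the build.")
    return best_build
```

With counts such as `{"GRCh37": 120, "GRCh38": 3}` the sketch picks GRCh37, and with all-zero counts it raises an error instead of returning GRCh38.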
> incorrectly concludes the reference genome for a GWAS summary statistics file that has chromosome numbers coded with the "chr" prefix
I have added a check for this in the function that checks the genome build, so it should work as expected now, but let me know if it doesn't.
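For readers unfamiliar with the problem, the kind of normalisation involved can be sketched as follows. This is a Python illustration under the assumption that the fix amounts to stripping a leading "chr" before comparing chromosomes against the reference. The helper name `strip_chr_prefix` is hypothetical and is not part of MungeSumstats.

```python
import re

# Matches a leading "chr" in any letter case, e.g. "chr1", "CHRX".
_CHR_PREFIX = re.compile(r"^chr", flags=re.IGNORECASE)


def strip_chr_prefix(chrom):
    """Return the chromosome label without a leading 'chr' prefix."""
    return _CHR_PREFIX.sub("", str(chrom).strip())
```

Applied to every value in the chromosome column, this makes "chr1" and "1" compare equal, so position lookups against the reference no longer miss purely because of the prefix.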
> Instead, I suggest rearranging the QC steps in format_sumstats() to put all QC steps that don't rely on the GWAS assembly before the call to get_genome_build()
I had a look through all our 60+ checks, and the vast majority require a genome reference. Moreover, these checks are interdependent, so even if a check itself doesn't require a genome reference, a check it relies on might. This gets quite complicated, so I have avoided moving any checks for SNP, CHR and BP above the inference of the genome build for now. However, do let me know if you come across any examples other than the 'chr' prefix that cause an issue with inferring the genome build, and I'll be happy to update.
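The interdependence point can be made concrete with a small Python sketch. The check names and the dependency map below are hypothetical, not the actual MungeSumstats checks. The sketch only shows why a check that never touches the reference can still be tied to the genome build through a check it relies on.

```python
def checks_needing_build(depends_on, uses_build_directly):
    """Return every check that needs the genome build, directly or transitively.

    depends_on maps a check name to the checks it relies on;
    uses_build_directly is the set of checks that query the reference themselves.
    """
    needing = set(uses_build_directly)
    changed = True
    # Keep propagating until no new check is pulled in.
    while changed:
        changed = False
        for check, deps in depends_on.items():
            if check not in needing and needing.intersection(deps):
                needing.add(check)
                changed = True
    return needing


# Hypothetical example: check_c never touches the reference itself,
# but it relies on check_b, which relies on check_a, which does.
deps = {"check_b": ["check_a"], "check_c": ["check_b"], "check_d": []}
print(sorted(checks_needing_build(deps, {"check_a"})))
```

Only the checks outside the returned set (here `check_d`) could safely run before the genome build is inferred.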
These changes have been added to MSS v1.9.9. You can install this from GitHub now or wait a few days for it to go up on the devel version of Bioconductor. Let me know if your issue persists (feel free to reopen this issue if so).
Cheers,
Alan.