belowlab / immerge
Tool to merge VCF genotype files at scale
License: MIT License
This follows up on my post about the updated TOPMed. In addition to the changes in the format of the info files, the TOPMed dosage files contain duplicate SNPs that IMMerge mishandles.
Our attempt to merge VCF files from TOPMed crashed with the following error: 'reached end of file...but SNP chr9:205964:G:A is not found'. In fact, both chr9:205964:G:A and chr9:205964:A:G are present in the info files and the dosage files. Here is the order of the SNPs in the info and dosage files:
23519 chr9 205964 9:205964 G A
23520 chr9 205964 rs478882 A G
But in the variants retained file, the order of the SNPs is reversed.
SNP REF.0. ALT.1.
175 chr9:205964:A:G A G
176 chr9:205964:G:A G A
The order of the SNPs is also reversed in the index file.
I believe this occurred because the SNPs are sorted by position and then by SNP ID when the retained and excluded lists are created. As a result, when IMMerge walked down the retained SNP list, it found the A:G version on line 23520 of the dosage file. It then started searching for the next SNP in the retained list, the G:A version, from line 23521 of the dosage file and searched to the bottom of the file. It missed the SNP because that SNP sits on line 23519, above where the search started.
Do you have any suggestions for a quick fix of this problem? IMMerge has been very useful to us despite this glitch. We would like to continue to use it with the new TOPMed files.
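The failure described above can be reproduced in a few lines of Python. This is only a sketch of the suspected mechanism, not IMMerge's actual code: `forward_scan` and `reorder_to_file` are hypothetical names, and the two-element lists stand in for the dosage file and the retained list. It also shows one possible workaround, which is to put the retained list back into file order before scanning.

```python
def forward_scan(file_ids, wanted_ids):
    """Single forward pass over the file: each wanted ID must appear
    after the previously matched one, or the scan runs off the end."""
    remaining = iter(file_ids)
    found = []
    for wanted in wanted_ids:
        for snp in remaining:
            if snp == wanted:
                found.append(snp)
                break
        else:
            # The iterator was exhausted without a match.
            raise LookupError(f"reached end of file but SNP {wanted} is not found")
    return found


def reorder_to_file(file_ids, wanted_ids):
    """Return the wanted IDs in the order they occur in the file."""
    wanted = set(wanted_ids)
    return [snp for snp in file_ids if snp in wanted]


# Order reported for the info and dosage files.
dosage_order = ["chr9:205964:G:A", "chr9:205964:A:G"]
# Sorting by ID puts A:G before G:A, as in the retained file.
retained_order = sorted(dosage_order)

try:
    forward_scan(dosage_order, retained_order)
except LookupError as err:
    print("sorted list fails:", err)

print("file-ordered list works:",
      forward_scan(dosage_order, reorder_to_file(dosage_order, retained_order)))
```

Reordering keeps the single-pass design, so it should cost little even on large files, but whether it fits IMMerge's index format is something the maintainers would need to confirm.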
Hello,
I was wondering if there would be a way to add a flag allowing you to keep the old R2 values stored in the output VCF as another column for each variant. Perhaps this could be a flag that defaults to off but can be turned on when running the merge.
AttributeError: 'DataFrame' object has no attribute 'isna'
open: No such file or directory
It appears that TOPMed has changed the format of the info files it produces. For example, rather than a single Genotyped field, there are now two fields, IMPUTED and TYPED. Do you plan to update IMMerge to be compatible with the new info file format?
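Until the tool supports the new format, one stopgap might be to map the new fields back to a single status before merging. The sketch below rests on assumptions that are not confirmed in this thread: that the new fields show up as flags in a semicolon-separated INFO-style string, and that the old Genotyped column held the labels 'Genotyped' and 'Imputed'. The function name `legacy_genotyped` is hypothetical.

```python
def legacy_genotyped(info: str) -> str:
    """Collapse IMPUTED/TYPED flags into one old-style status label.

    Assumes `info` looks like 'AF=0.12;R2=0.98;IMPUTED' (an assumption).
    """
    # Keep only the key of each 'key=value' item; bare flags have no '='.
    keys = {item.split("=", 1)[0] for item in info.split(";") if item}
    if "TYPED" in keys:
        return "Genotyped"
    if "IMPUTED" in keys:
        return "Imputed"
    raise ValueError(f"neither TYPED nor IMPUTED found in: {info!r}")


print(legacy_genotyped("AF=0.12;R2=0.98;IMPUTED"))
print(legacy_genotyped("AF=0.30;TYPED"))
```

If a variant can carry both flags, the order of the two checks decides the label, so that choice would need to match how the old files treated such variants.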
I am having an issue where the final merged output VCF file does not carry the proper flag for whether each SNP was IMPUTED or TYPED. The flags are all listed correctly in the input info file and in the individual input VCFs, but every SNP is marked IMPUTED in the final output VCF. I have tried this with '--mixed_genotype_status' turned off (to output IMPUTED/TYPED based on the first input VCF) and with it turned on (to output ALL/SOME/NONE). Either way, IMMerge reports every SNP as IMPUTED or NONE.
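For reference, here is a minimal sketch of how an ALL/SOME/NONE status could be derived from per-file flags. It assumes that ALL means the variant is typed in every input, SOME in at least one, and NONE in none, which is my reading of the option and not something taken from the IMMerge source. `combined_status` is a hypothetical name.

```python
def combined_status(typed_flags):
    """Summarise per-input 'typed' booleans as ALL, SOME or NONE."""
    flags = list(typed_flags)
    if not flags:
        raise ValueError("need at least one input file")
    if all(flags):
        return "ALL"
    if any(flags):
        return "SOME"
    return "NONE"


print(combined_status([True, True]))    # typed in both inputs
print(combined_status([True, False]))   # typed in only one input
print(combined_status([False, False]))  # imputed in both inputs
```

Under this reading, an output in which every variant is NONE would mean that no input was ever seen as typed, so it may be worth checking whether the per-file flags are being read at all.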
Hi there,
Has the optional argument '--mixed_genotype_status' been disabled? I tried to include '--mixed_genotype_status true' and got the error message 'merge_files.py: error: unrecognized arguments: --mixed_genotype_status true'.
I am using IMMerge version 0.0.3.
Thank you!
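The wording of that message may help narrow this down. Assuming merge_files.py builds its options with Python's argparse (the error format suggests so, but that is an assumption), the sketch below compares two hypothetical parsers: one that does not define the option at all, and one that defines it as a bare on/off flag. Neither is IMMerge's real parser.

```python
import argparse


def unknown_args(parser, argv):
    """Return the arguments that the parser does not recognise."""
    _, unknown = parser.parse_known_args(argv)
    return unknown


argv = ["--mixed_genotype_status", "true"]

# Hypothetical parser in which the option is not defined.
without_flag = argparse.ArgumentParser(prog="merge_files.py")

# Hypothetical parser in which the option is a bare flag taking no value.
bare_flag = argparse.ArgumentParser(prog="merge_files.py")
bare_flag.add_argument("--mixed_genotype_status", action="store_true")

print(unknown_args(without_flag, argv))  # ['--mixed_genotype_status', 'true']
print(unknown_args(bare_flag, argv))     # ['true']
```

The reported error lists the option name itself as unrecognized, which matches the first case. That would point to the installed version not defining the option, not to a problem with the value 'true'.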