
Comments (10)

zhanxw commented on May 27, 2024

from rvtests.

zishang30 commented on May 27, 2024

Hi, Xiaowei
Thank you very much for your reply!
How can we validate our VCF file?


zhanxw commented on May 27, 2024

You can write a script to check the number of columns in each line of your VCF file. If you see an inconsistency, you will need to fix it manually.

You may also want to use a script to check VCF file: https://github.com/zhanxw/checkVCF
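
For illustration, a minimal sketch of such a column-count check in Python. It is not part of rvtests or checkVCF; the function names `open_vcf` and `find_column_mismatches` are hypothetical, and it assumes a tab-separated VCF whose `#CHROM` header line comes before any data line.

```python
import gzip
from typing import Iterable, List, Tuple


def open_vcf(path: str):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_column_mismatches(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (line_number, fields_found, fields_expected) for every data
    line whose tab-separated field count differs from the #CHROM header."""
    expected = None
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip meta-information lines (##...) and empty lines.
        if not line or line.startswith("##"):
            continue
        n_fields = len(line.split("\t"))
        if line.startswith("#CHROM"):
            # The header defines how many columns every record should have.
            expected = n_fields
            continue
        if expected is None:
            raise ValueError(f"line {lineno}: data line before #CHROM header")
        if n_fields != expected:
            problems.append((lineno, n_fields, expected))
    return problems
```

To run it on a file, pass the handle returned by `open_vcf("your.vcf.gz")` to `find_column_mismatches` and print the returned tuples; an empty list means every record has as many columns as the header.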


zishang30 commented on May 27, 2024

Dear Xiaowei

Thank you very much for your help!
I have already downloaded checkVCF and started running it.
If there is an inconsistency, how can I fix it manually?
Thanks!


zhanxw commented on May 27, 2024

In my view, manually fixing a VCF file is not good practice.
You probably need to check the previous analysis steps and find out which step introduced the inconsistency. Then you can fix that step.


zishang30 commented on May 27, 2024

Dear Xiaowei

I have already checked chr22 using the checkVCF tool. Here is the report:
--------------- REPORT ---------------
Total [ 252148 ] lines processed
Examine [ 10 ] VCF header lines, [ 252138 ] variant sites, [ 3788 ] samples
[ 0 ] duplicated sites
[ 0 ] NonSNP site are outputted to [ test.check.nonSnp ]
[ 0 ] Inconsistent reference sites are outputted to [ test.check.ref ]
[ 0 ] Variant sites with invalid genotypes are outputted to [ test.check.geno ]
[ 19110 ] Alternative allele frequency > 0.5 sites are outputted to [ test.check.af ]
[ 0 ] Monomorphic sites are outputted to [ test.check.mono ]
--------------- ACTION ITEM ---------------
There seems to be no inconsistency in this VCF file. But when I use this VCF to run rvtests with the kinship matrix, we still get the same segmentation fault, so I assume the problem is due to the kinship matrix. When I re-checked, I remembered an important point: we have two datasets, A and B, to run through rvtests. When I generated the kinship matrix for dataset A, it reported "Expected 4397 individual but only have 2242 individual" and "Report 'VCF header have LESS people than VCF content!'". When the kinship matrix generation finished, the log file also contained a warning: "Warning: Specified parameter --ped has no effect."
The earlier command for the kinship matrix was:
vcf2kinship --inVcf N3788_chr1.vcf.gz --ped phenotypes.ped --bn --minMAF 0.050000 --thread 8 --out kinship_matrix
We did not use --xHemi because we have no chromosome X imputation data. I searched on Google, and it seems this may cause the warning.
Do you think this is the reason for the inconsistency when generating the kinship matrix?
I have already started to regenerate the kinship matrix using the full command:
vcf2kinship --inVcf N3788_chr1.vcf.gz --ped phenotypes.ped --bn --xHemi --minMAF 0.050000 --thread 8 --out kinship_matrix

Thank you very much for your help!

Best regards

Zishan


zishang30 commented on May 27, 2024

Dear Xiaowei

This is the new log file from generating the kinship matrix.

###################################
[INFO] Program version: 20170210
[INFO] Git Version
[INFO] 584dea4
[INFO] Parameters BEGIN

ParameterList created by zishan.gao on 128055 at Mon Apr 24 17:14:09 2017

--inVcf "N3788_chrALL.vcf.gz" --out "hemi_kinship_matrix.kinship" --xHemi --ped "phenotype.ped" --bn --minMAF 0.050000 --thread 8
[INFO] Parameters END
[INFO] Analysis started at: Mon Apr 24 17:14:09 2017
[INFO] Multiple ( 8 ) threads will be used.
[INFO] Empiricial kinship will be calculated.
[INFO] Start creating empirical kinship from VCF file.
[INFO] Using default maximum missing rate = 0.05
[INFO] Exclude [ 6 ] samples from VCF files because they do not exist in pedigree file or do not have sex:
[INFO] Total [ 3782 ] individuals from VCF are used.
[INFO] Total [ 19921449 ] VCF records have been processed.
[INFO] Kinship [ hemi_kinship_matrix.kinship.kinship ] has been generated.
[ERROR] There are not enough variants to create kinship matrix.
[ERROR] Failed to create hemizygous-region kinship file [ S4F4hemi_kinship_matrix.kinship.xHemi.kinship ].
[INFO] Skipped [ 14591821 ] sites due to MAF or high misssingness
[INFO] Total [ 5329628 ] variants are used to calculate autosomal kinship matrix.
[INFO] Total [ 0 ] variants are used to calculate chromosome X kinship matrix.
[INFO] Analysis ends at: Wed Apr 26 04:53:26 2017
[INFO] Analysis took 128357 seconds.

###############################################

According to this log, it seems "There are not enough variants to create kinship matrix". Also, we still get the inconsistency "VCF header have LESS people than VCF content!" when generating this new kinship matrix, even though we used the --xHemi option this time.
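
One way to confirm that the input really contains no chromosome X records is to count records per chromosome. The following is a rough sketch, not part of rvtests; the function names are hypothetical, and the chromosome X labels it looks for ("X", "chrX", "23") are an assumption that may not match how a given file names that chromosome.

```python
from collections import Counter
from typing import Iterable


def count_records_by_chrom(lines: Iterable[str]) -> Counter:
    """Count VCF data records per value of the first (CHROM) column."""
    counts = Counter()
    for line in lines:
        # Skip header lines (## and #CHROM) and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def has_chrom_x(counts: Counter, names=("X", "chrX", "23")) -> bool:
    """True if any record uses one of the assumed chromosome X labels."""
    return any(counts.get(name, 0) > 0 for name in names)
```

Feeding it the lines of the VCF (for a .vcf.gz, a handle from `gzip.open(path, "rt")`) gives a per-chromosome tally; if `has_chrom_x` returns False, a count of zero chromosome X variants in the vcf2kinship log is what one would expect.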

Thanks!

Best regards

Zishan


zhanxw commented on May 27, 2024

Maybe you can skip "--xHemi" option in vcf2kinship program?
Then you will not create ".xHemi" kinship files, and that may help rvtests to run smoothly.
Can you please try again? Thanks.


zishang30 commented on May 27, 2024

Dear Xiaowei
We tried running the vcf2kinship command with and without "--xHemi" and got the following results:
###########################
Run kinship without xHemi:
[INFO] Program version: 20170210
[INFO] Git Version
[INFO] 584dea4
[INFO] Parameters BEGIN

ParameterList created by zishan on 55 at Wed Apr 26 17:33:05 2017

--inVcf "chrall.vcf.gz" --out "0426_kinship_matrix" --ped "phenotype_all.ped" --bn --minMAF 0.050000 --thread 12
[INFO] Parameters END
[INFO] Analysis started at: Wed Apr 26 17:33:05 2017
[INFO] Multiple ( 12 ) threads will be used.
[INFO] Empiricial kinship will be calculated.
[WARN] Warning: Specified parameter --ped has no effect.
[INFO] Start creating empirical kinship from VCF file.
[INFO] Using default maximum missing rate = 0.05
[INFO] Exclude [ 6 ] samples from VCF files because they do not exist in pedigree file or do not have sex:
[INFO] Total [ 3782 ] individuals from VCF are used.
[INFO] Total [ 20023742 ] VCF records have been processed.
[INFO] Kinship [ 0426_kinship_matrix.kinship ] has been generated.
[INFO] Skipped [ 14665592 ] sites due to MAF or high misssingness
[INFO] Total [ 5358150 ] variants are used to calculate autosomal kinship matrix.
[INFO] Analysis ends at: Fri Apr 28 06:46:49 2017
[INFO] Analysis took 134024 seconds.
###################################
Run kinship with xHemi
[INFO] Program version: 20170210
[INFO] Git Version
[INFO] 584dea4
[INFO] Parameters BEGIN

ParameterList created by zishan on 55 at Wed Apr 26 23:23:02 2017

--inVcf "chrall.vcf.gz" --out "0427_kinship_matrix" --xHemi --ped "phenotypes_all.ped" --bn --minMAF 0.050000 --thread 12
[INFO] Parameters END
[INFO] Analysis started at: Wed Apr 26 23:23:02 2017
[INFO] Multiple ( 12 ) threads will be used.
[INFO] Empiricial kinship will be calculated.
[INFO] Start creating empirical kinship from VCF file.
[INFO] Using default maximum missing rate = 0.05
[INFO] Exclude [ 6 ] samples from VCF files because they do not exist in pedigree file or do not have sex:
[INFO] Total [ 3782 ] individuals from VCF are used.
[INFO] Total [ 20023742 ] VCF records have been processed.
[INFO] Kinship [ 0427_kinship_matrix.kinship ] has been generated.
[ERROR] There are not enough variants to create kinship matrix.
[ERROR] Failed to create hemizygous-region kinship file [ 0427_kinship_matrix.xHemi.kinship ].
[INFO] Skipped [ 14665592 ] sites due to MAF or high misssingness
[INFO] Total [ 5358150 ] variants are used to calculate autosomal kinship matrix.
[INFO] Total [ 0 ] variants are used to calculate chromosome X kinship matrix.
[INFO] Analysis ends at: Fri Apr 28 11:53:45 2017
[INFO] Analysis took 131443 seconds.
##################################
Comparing these two log files, it seems we can generate the kinship matrix without --xHemi, but we get the warning: Specified parameter --ped has no effect.
When we generate the kinship matrix with --xHemi, we get the errors "There are not enough variants to create kinship matrix." and "Failed to create hemizygous-region kinship file [ 0427_kinship_matrix.xHemi.kinship ]."
The two generated kinship files are the same size, 158M.

I assume that both the warning and the error are due to the lack of X chromosome data. In this situation, generating the kinship matrix without --xHemi may be better, because it completes with only a warning instead of the error we get with --xHemi.

Also, this time we did not receive the error "VCF header have LESS people than VCF content", because I used an unsorted vcf.gz without a tabix index to generate the kinship matrix.
So maybe we cannot use a sorted, tabix-indexed vcf.gz to make the kinship matrix, because sorting and indexing might change the VCF file content or the header?
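
A quick way to test this suspicion would be to compare the sample IDs in the `#CHROM` header line of the two files. This is only a sketch with hypothetical function names; it assumes the standard VCF layout in which the first nine columns are fixed and every later column is a sample.

```python
from typing import Dict, Iterable, List


def read_sample_ids(lines: Iterable[str]) -> List[str]:
    """Return the sample IDs listed on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            # Columns 1-9 are CHROM..FORMAT; samples start at column 10.
            return fields[9:]
        if not line.startswith("#"):
            break  # reached data without seeing a header line
    raise ValueError("no #CHROM header line found")


def compare_samples(before: List[str], after: List[str]) -> Dict[str, object]:
    """Summarise how two sample lists differ."""
    return {
        "same_order": before == after,
        "missing_after": sorted(set(before) - set(after)),
        "extra_after": sorted(set(after) - set(before)),
    }
```

Reading the header of the unsorted file and of the sorted, tabix-indexed file with `read_sample_ids` and passing both lists to `compare_samples` would show whether the sample columns actually differ between them.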

In addition, we received the X chromosome imputation data yesterday. I will try to generate the kinship matrix with the X chromosome this weekend.


zhanxw commented on May 27, 2024

Thanks for reporting back here.
It seems your problem was solved, right?
It is unlikely that sorting and tabix indexing would change the VCF file content...

