wglab / penncnv

Copy number variation detection from SNP arrays

Home Page: http://penncnv.openbioinformatics.org

License: Other

Perl 45.66% Makefile 0.05% C 53.79% Python 0.25% SWIG 0.25%

penncnv's Introduction

Introduction to the PennCNV software

PennCNV is a free software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. Currently it can handle signal intensity data from Illumina and Affymetrix arrays. With appropriate preparation of file format, it can also handle other types of SNP arrays and oligonucleotide arrays.

Any edit to this repository will be reflected at http://penncnv.openbioinformatics.org instantly.

If you like this repository, please click on the "Star" button on top of this page, to show appreciation to the repository maintainer. If you want to receive notifications on changes to this repository, please click the "Watch" button on top of this page.

Reference

Wang K, Li M, Hadley D, Liu R, Glessner J, Grant S, Hakonarson H, Bucan M. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research 17:1665-1674, 2007.
Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, Bucan M, Maris JM, Wang K. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Research 36:e126, 2008.
Wang K, Chen Z, Tadesse MG, Glessner J, Grant SFA, Hakonarson H, Bucan M, Li M. Modeling genetic inheritance of copy number variations. Nucleic Acids Research 36:e138, 2008.

penncnv's Issues

No chromosome Y data for genocluster and LRR/BAF- Affy Axiom data

Hi there

I am generating a genocluster file followed by LRR/BAF values from Axiom PMDA data. It works for all chromosomes except Y, and we are particularly interested in chromosome Y.
The pfb file has annotation for all chr Y probes ?

any idea why this is happening?

Generate genotyping calls from CEL files

Hello there,

Thanks for the nice tutorial!

I have calvin CEL files that were converted to text files using apt-cel-convert.
Having the text CEL files, I tried to run the command and got an error:
apt-probeset-genotype -c lib/GenomeWideSNP_6.cdf -a birdseed --read-models-birdseed lib/GenomeWideSNP_6.birdseed.models --special-snps lib/GenomeWideSNP_6.specialSNPs --out-dir apt --cel-files listfile

FATAL ERROR:TsvFile.cpp:2672: This file appears to be binary. (filename='../data/test.cel/1.CEL')

It's clear that the CEL files are not binary.
It seems the error is from birdseed, but I was wondering whether you have any thoughts on this.

Thanks,
Segun

Unable to locate HMM File for Axiom UKB Array

Hello,

I am running PennCNV on about 9K samples and noticed that no HMM file is available for the Axiom UKB array.
Would it be possible to use "affygw6.hmm" instead, or does the model file have to be generated with "train"?

Best,
Nick

PennCNV [clean_cnv.pl] WARNING: 3342 lines were skipped due to unrecognizable formats

Hi, I can't get clean_cnv.pl to work in PennCNV. The command line I used is:
./clean_cnv.pl combineseg --verbose --fraction 0.2 --signalfile ./lib/CCS_cleaned_10042020.pfb CCS_filtered.annotated3.cnv --out CCS_gapmerged.cnv

The error message is here:

NOTICE: Total of 659184 records are read from ./lib/CCS_cleaned_10042020.pfb
WARNING: 3342 lines were skipped due to unrecognizable formats

The cnvfile (space-delimited) format is shown in the examples below:

chr1:110228436-110232335 numsnp=23 length=3900 state=2,cn=1 sample.split1 startsnp=1:110228436_CNV_GSTM1 endsnp=1:110232335_CNV_GSTM1
chr4:69403991-69404631 numsnp=7 length=641 state=2,cn=1 sample.split1 startsnp=4:69403991_CNV_UGT2B17 endsnp=4:69404631
chr6:29782970-29783064 numsnp=14 length=95 state=2,cn=1 sample.split1 startsnp=6:29782970 endsnp=rs1077433

Please kindly help me as I have tried to combine the calls for the whole day to no avail.
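
One way to narrow down which lines are being skipped is to check each line against the .rawcnv layout quoted elsewhere on this page (for example `state2,cn=1` in the "-validate" issue, where the pasted lines above carry `state=2,cn=1`). The sketch below is a minimal check under that assumption; the pattern and the function name `unrecognized_lines` are illustrative and are not taken from clean_cnv.pl, so a mismatch here is only a hint, not a confirmed cause.

```python
import re

# Layout modelled on the .rawcnv lines quoted on this page, e.g.
#   chr20:10511631-10583260 numsnp=10 length=71,630 state2,cn=1 father.txt startsnp=... endsnp=...
# ASSUMPTION: this pattern is illustrative; it is not the pattern used by clean_cnv.pl.
CNV_LINE = re.compile(
    r"^(?:chr)?(\w+):(\d+)-(\d+)\s+"   # region, e.g. chr20:10511631-10583260
    r"numsnp=(\d+)\s+"                 # number of markers
    r"length=([\d,]+)\s+"              # length, with optional thousands separators
    r"state(\d),cn=(\d)\s+"            # note: no '=' after 'state' in the quoted output
    r"(\S+)\s+"                        # signal file name
    r"startsnp=(\S+)\s+"
    r"endsnp=(\S+)"
)


def unrecognized_lines(lines):
    """Return (line_number, text) pairs for non-empty lines that do not match CNV_LINE."""
    bad = []
    for number, text in enumerate(lines, start=1):
        stripped = text.strip()
        if stripped and not CNV_LINE.match(stripped):
            bad.append((number, stripped))
    return bad
```

Calling `unrecognized_lines(open("CCS_filtered.annotated3.cnv"))` would list the lines that differ from the quoted layout, which can then be compared by eye with a line that PennCNV itself wrote.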

P-values to de novo CNV calls

Hi,
After identifying the new CNVs with the joint-calling step (Joint), I assigned P-values to the new CNV calls (validating predicted new CNVs).

But some of the resulting P-values are strange: they are greater than 1. My command and the result obtained (a P-value of 1.5) are shown below. I hope you can help me.

infer_snp_allele.pl -pfb allsamples.pfb -hmm Test.hmm -denovocn 3 FATHER.txt MOTHER.txt SON.txt -start SNPX1 -end SNPX2 -out tempfile

...
[7] "NOTICE: For the region chr1:38730376-38750794, 12 markers were identified from FATHER.txt"
[8] "NOTICE: For the region chr1:38730376-38750794, 12 markers were identified from MOTHER.txt"
[9] "NOTICE: For the region chr1:38730376-38750794, 12 markers were identified from SON.txt"
[10] "NOTICE: Analyzing trio FATHER.txt MOTHER.txt SON.txt"
[11] "NOTICE: Evidence for parental origin for the putative de novo CNVs (de novo CN=3 in trio FATHER.txt MOTHER.txt SON.txt ): Marker= 12 Paternal_origin(F)= 1 Maternal_origin(M)= 1 P-value= 1.5"

tempfile
   Name              LRR_F       LRR_M       LRR_O      BAF_F     BAF_M      BAF_O GENO_F GENO_M GENO_O Origin
 1 AX-75442315  1.22384240 -0.40021763  0.83387941 0.10771760 0.9991463 0.28033960 AA     BB     AAB    ?
 2 AX-80739365 -0.05270831 -0.17584350  0.47600501 0.00000000 0.0000000 0.01227067 AA     AA     AAA    ?
 3 AX-75442328 -0.37651942 -0.03768277  0.28612970 1.00000000 0.5252890 0.35981647 BB     AB     AAB    M
 4 AX-80782291 -0.01382187 -0.06273745  0.38581987 0.01677153 0.4654819 0.00000000 AA     AB     AAA    ?
 5 AX-75442339  0.13877110  0.14623184  0.62283328 1.00000000 0.9839616 0.98646505 BB     BB     BBB    ?
 6 AX-80822384 -0.11190325 -0.45343429 -0.08698980 1.00000000 0.1148214 0.25401103 BB     AA     AAB    ?
 7 AX-75442344 -0.08249386 -0.14573546  0.07661979 1.00000000 0.5163739 0.71945969 BB     AB     ABB    F
 8 AX-75442345 -0.22875525 -0.19777528  0.28132614 0.95536751 1.0000000 1.00000000 BB     BB     BBB    ?
 9 AX-75442350  0.01422247 -0.08666986  0.17382427 0.00000000 1.0000000 0.34468498 AA     BB     AAB    ?
10 AX-75442353 -0.02782679 -0.12583260  0.09268115 0.97720128 0.9691261 0.95098485 BB     BB     BBB    ?
11 AX-80762445  0.04548795 -0.51386653  0.54591703 0.99233125 0.1648574 0.45089455 BB     NC     AAB    ?
12 AX-75442357  0.06765128  0.07259621  0.41891134 0.99409324 1.0000000 0.96811810 BB     BB     BBB    ?

I look forward to hearing from you.
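
A value above 1 can appear whenever a two-sided P-value is formed by doubling a one-sided binomial tail without capping the result at 1. The sketch below only illustrates that arithmetic for the counts reported above (Paternal_origin(F)= 1, Maternal_origin(M)= 1, so 1 success out of 2 informative markers); whether infer_snp_allele.pl computes its P-value this way is an assumption, and the function names are hypothetical.

```python
from math import comb


def doubled_tail(k, n, p=0.5):
    """Twice the smaller one-sided binomial tail for k successes in n trials, NOT capped at 1."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])   # P(X <= k)
    upper = sum(pmf[k:])        # P(X >= k)
    return 2 * min(lower, upper)


def two_sided_p(k, n, p=0.5):
    """Same statistic, capped at 1 so that it remains a valid probability."""
    return min(1.0, doubled_tail(k, n, p))
```

For k = 1 and n = 2, P(X <= 1) = 0.25 + 0.5 = 0.75, so the uncapped doubled tail is 1.5 while the capped version returns 1.0. Under this assumption, a reported value of 1.5 would simply mean there is no evidence favoring either parental origin.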

Suggestion for running compile_pfb.pl with 8K samples

Hello Dr. Wang,

I have about 8K intensity files, and generating the PFB file from all 8K samples takes quite a long time.
I am planning to try two different approaches to generate the PFB file faster:

Split the SNP position file into 100 parts and run them in parallel using the argument '-snpposfile'
Run with a smaller number (2K or 4K) of intensity files

Would either of these significantly affect the results, or have any obvious downside that I am not considering?

Best,
Nick
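
The first approach, splitting the SNP position file into parts, could be sketched as follows. This is a minimal illustration that assumes the position file has exactly one header line which every part should repeat; the function names, the output naming scheme and the one-header assumption are hypothetical and should be checked against the actual file before use.

```python
def split_lines(lines, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def write_parts(snpposfile, parts, prefix):
    """Write each chunk to '<prefix>.NNN.txt', repeating the header line (ASSUMED to be line 1)."""
    with open(snpposfile) as handle:
        header, *rows = handle.read().splitlines()
    paths = []
    for index, chunk in enumerate(split_lines(rows, parts), start=1):
        path = f"{prefix}.{index:03d}.txt"
        with open(path, "w") as out:
            out.write("\n".join([header, *chunk]) + "\n")
        paths.append(path)
    return paths
```

Each resulting file could then be passed to a separate compile_pfb.pl run through '-snpposfile', and the per-part outputs concatenated afterwards. Because every marker is still summarized over all samples, this split changes only how the work is divided, whereas using fewer intensity files changes the sample the frequencies are estimated from.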

How to get a gcmodel without the file gc5Base.txt.gz

Dear Dr. Wang,

Thank you very much for your excellent software. I have a question that needs your help.

I use the Porcine SNP50 Beadchip from Illumina (50,703 SNP), and now I need gcmodel to adjust the signal.
I know that the pig's gcmodel can be obtained with the script cal_gc_snp.pl and requires a special file gcfile.
Gcfile can be downloaded from the UCSC website, but the file gc5Base.txt.gz (10 columns) of the latest version of Sus_scrofa 11.1 has not been found on this website. The similar file is only gc5BaseBw.txt.gz (ftp://hgdownload.soe.ucsc.edu/goldenPath/susScr11/database/gc5BaseBw.txt.gz) and the size is only 67 B.
I also found the file susScr11.gc5Base.wig.gz (ftp://hgdownload.soe.ucsc.edu/goldenPath/susScr11/bigZips/susScr11.gc5Base.wig.gz) on bigZips, which is very similar to gc5Base.txt.gz, with a size of 9Mb, but with only 9 columns.

I want to know if the file susScr11.gc5Base.wig can be converted to gc5BaseBw.txt? Or can the file susScr11.gc5Base.wig be used to get gcmodel?

Thank you so much!
Rongrong Ding.

CNV calling on Illumina Omni 2.5M array

Thanks in advance for your time. I just want to confirm whether this is the correct way to use detect_cnv.pl on Illumina Omni data:

./detect_cnv.pl -test -hmm lib/hh550.hmm -pfb [ generated pfb file from annotation /signal files ] [ signal files] -log ../sample.log -out sampleall.rawcnv

Best,
Tom

the output file generated from runex.pl 1 is empty

Hi,

I was trying to run detect_cnv.pl, however, the generated output file was always empty, even when I tried to use the example provided in the guide.

Hope you can help me with this.

Thanks!

compile_pfb.pl not working

I am trying to compile my own pfb file from my signal files.
Here is the command I'm passing it:
compile_pfb.pl -listfile /gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/signal_file_list_02.txt -output /gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/pfb_files/pop_b_freq.pfb

The list file looks as follows:
/gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/4770-MA-1651_202566700099_R01C01_formatted_02.txt
/gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/4770-MA-1635_202524980055_R06C01_formatted_02.txt
/gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/4770-MA-1659_202566700099_R08C01_formatted_02.txt

The files it is trying to read look as follows:
SNP Name Chr Pos BFreq LogR
1:100292476 1 100292476 0.0 -1.02807
1:101064936 1 101064936 0.0728731 0.0350392
1:103380393 1 103380393 1.0 0.0676645

The following are the outputs/errors that are produced when I attempt to run the command:
NOTICE: A total of 1686 input signal files is specified in /gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/signal_file_list_02.txt
WARNING: Skipping the file /gpfs/gpfs2/home/lsheffield/AGHI/formatted_reports/4770-MA-0212_201131650106_R08C01_formatted_02.txt that cannot be read by the current program
Use of uninitialized value $baf_index in addition (+) at /gpfs/gpfs2/software/PennCNV-1.0.4/compile_pfb.pl line 99.
NOTICE: The B Allele Freq information is annotated as column 1 in input files
NOTICE: A total of 0 input files will be used for compiling PFB values
Use of uninitialized value $snp in pattern match (m//) at /gpfs/gpfs2/software/PennCNV-1.0.4/compile_pfb.pl line 128.
Use of uninitialized value $snp in hash element at /gpfs/gpfs2/software/PennCNV-1.0.4/compile_pfb.pl line 140.
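
The "uninitialized value $baf_index" warning suggests that no B Allele Freq column was located in the header. A quick way to test that idea is to look for the column names seen in the GSA issue further down this page ("B Allele Freq", "Log R Ratio"), for which compile_pfb.pl reported a column number. The sketch below is a minimal header check; the tab separator and the suffix-matching rule are assumptions for illustration, not the logic of compile_pfb.pl.

```python
def find_column(header_line, suffix, sep="\t"):
    """Return the 0-based index of the first column whose name ends with `suffix`, else None.

    ASSUMPTION: columns are tab-separated and matched by name suffix, so that both
    'B Allele Freq' and 'sample1.B Allele Freq' would be found.
    """
    for index, name in enumerate(header_line.rstrip("\n").split(sep)):
        if name.strip().endswith(suffix):
            return index
    return None
```

With the header `Name, Chr, Position, GType, B Allele Freq, Log R Ratio` this returns 4 (the fifth column), matching the "annotated as column 5" notice in the GSA issue. With the header `SNP Name, Chr, Pos, BFreq, LogR` it returns None, which would be consistent with the warnings above if the script expects the longer column names.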

PennCNV-Affy for Genome-wide 6.0

Hello,
I am trying to call CNVs from a Genome-Wide 6.0 array, which has data for 906,600 SNPs and 946,000 CN probes. What is the calling based on: SNPs or CN probes? In the .rawcnv file, startsnp and endsnp contain both SNPs and CN probes. Can the PennCNV-Affy package call CNVs based on CN probes alone?

I also found that the BAF values of the CN probes are all 2 in the individual signal files. Could this affect CNV calling? The following is an example from an individual signal file:

CN_496308 1 836746 0.3908 2
CN_522419 1 818586 -0.1996 2
CN_489691 1 824136 0.3437 2

Thanks in advance!

hh550.hg18.pfb is missing

Hi there,

I just tried to run the example, but I got an error message:

ERROR: cannot read from pfb file ../lib/hh550.hg18.pfb: No such file or directory

Then I checked the ../lib folder; there are only two files, hh550.hmm and hhall.hmm, but no .pfb files.

Could you please help me to figure out how to solve this problem?

Thanks!

"detect_cnv.pl -validate" shows probability instead of state

$ runex.pl --path_detect_cnv=../detect_cnv.pl 5

ex5.rawcnv:
chr3:3957986-4054960          numsnp=50     length=96,975      state86.4199950337373,cn=84.4199950337373 father.txt startsnp=rs11716390 endsnp=rs17039742
chr20:10511631-10583260       numsnp=10     length=71,630      state11.5231778900509,cn=9.5231778900509 father.txt startsnp=rs8114269 endsnp=rs682562

state and cn have strange values...

#assign the score for each region (similar to assignConfidence subroutine)
sub validateRegion {
                       :
  for my $nextstate (1, 2, 3, 5, 6) {   #do not consider LOH
                       :
    push @logprob, [$logprob + log ($prior_prob->[$stateindex]), $nextstate];
                       : 
  }
                       :
  @logprob = sort {$b->[0]<=>$a->[0]} @logprob;
  my $beststate = $logprob[0]->[0];

$logprob[0]->[0] (the log-probability) is assigned to $beststate, but the state value ($logprob[0]->[1]) should be assigned instead:

  my $beststate = $logprob[0]->[1];

After rerunning, the output changed as follows.

chr20:10511631-10583260       numsnp=10     length=71,630      state2,cn=1 father.txt startsnp=rs8114269 endsnp=rs682562
chr3:3957986-4054960          numsnp=50     length=96,975      state2,cn=1 offspring.txt startsnp=rs11716390 endsnp=rs17039742
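
The same pattern, restated as a small Python sketch for clarity: each candidate is a (log-probability, state) pair, the list is sorted by the first element in descending order, and the answer is the second element of the top pair. The numbers in the usage note are made up for illustration and do not come from PennCNV.

```python
def best_state(scored_states):
    """Given (log_probability, state) pairs, return the state with the highest log-probability."""
    ranked = sorted(scored_states, key=lambda pair: pair[0], reverse=True)
    best_logprob, state = ranked[0]
    # Returning best_logprob here would reproduce the bug described above.
    return state
```

For example, `best_state([(-120.5, 1), (-86.4, 2), (-99.0, 3)])` returns 2, whereas taking element 0 of the top pair would return -86.4, a score rather than a state.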

detect_cnv.pl resulting in "Segmentation fault: 11"

Just installed PennCNV 1.0.3 on my system (OSX 10.11.5, perl 5.14.2). Functions can be called without a problem, but when calling detect_cnv.pl, I receive a "Segmentation fault: 11" error. I heard that it might be a memory problem, but there seem to be many possible causes.

I installed perl 5.14.2 into a custom folder (only 1 test out of ~2000 failed) and linked it to my PATH. Then I made the suggested changes to Makefile, compiled and also added the PennCNV folder to PATH.

Any idea what this could be?

In a virtual machine running Linux Mint and perl 5.14.2, I had a similar problem. I could call detect_cnv.pl, but after:
WARNING: Sample from /home/VM/Downloads/gw6.1_tumor does not pass default quality control criteria due to its large SD for LRR (0.473134003738753)!
WARNING: Sample from /home/VM/Downloads/gw6.2_tumor does not pass default quality control criteria due to its drifting BAF values (drift=0.0067873951858485)!
WARNING: Sample from /home/VM/Downloads/gw6.3_tumor does not pass default quality control criteria due to its waviness factor values (wf=-0.0841)!
WARNING: Small-sized CNV calls may not be reliable and should be interpreted with caution!
Segmentation fault

Error when trying to compile

Compiling under perl 5.22 gives me this error from handy.h:
/usr/lib/x86_64-linux-gnu/perl/5.22/CORE/handy.h:113:34: error: ‘bool’ undeclared (first use in this function)
/usr/lib/x86_64-linux-gnu/perl/5.22/CORE/handy.h:113:39: error: expected ‘:’ before numeric constant

I have added
#include <stdio.h>
#include <stdbool.h>
but the same errors persist.

Run example issue

I am trying to get PennCNV working on my linux computing server. Here's what I've done, per this document: http://penncnv.openbioinformatics.org/en/latest/user-guide/install/

-Downloaded + decompressed
-loaded perl (5.14.2) & R/2.15.2 modules (global modules)
-went to kext folder and used make
-went to penncnv folder and typed "perl ./detect_cnv.pl". The proper usage information appears
-added to my export PATH = statement in my bashrc file, "$HOME/opt/penncnv" where the files are located

I then navigated to the example folder to try to run those, using this document as a guide: http://penncnv.openbioinformatics.org/en/latest/user-guide/startup/

-Ran "perl runex.pl", got proper usage information
-Ran "perl runex.pl 1", and got Error: "Can't exec detect_cnv.pl: (null) at example/runex.pl line 81"
-Ran "perl runex.pl -path_detect_cnv ../detect_cnv.pl", same error occurs.
-Also tried going up to the main penncnv folder and using "perl example/runex.pl 1 -path_detect_cnv detect_cnv.pl", but get the same error.

Not sure where to start diagnosing the issue here. I have already contacted support for my server, but haven't heard back yet. I'm hoping this is something I can fix easily. Thank you in advance for any help.
Gaius

Axiom binary to text CELS

Hi there, I am having a hard time converting Axiom raw CEL files to text CELs to use it in PennCNV-Affy protocol. APT fails to do the trick as Axiom files are multi-channel. Since some users here appear to have successfully used Axiom data in PennCNV I feel like I am missing something. Please advise. Thank you for your time,

can not install PennCNV in Ubuntu 18.04.1 LTS (Bionic Beaver)

I tried to install PennCNV on Ubuntu 18.04. I followed the steps and still get these error messages:

PennCNV compilation error: Your system architecture is 'x86_64-linux-gnu-thread-multi', which is not compatible with pre-compiled executables in PennCNV package.
PennCNV compilation error: Please download source code from http://www.openbioinformatics.org/penncnv and compile executable program.

PennCNV error

Hello, this is my first time using PennCNV and I have a problem: no matter which version of perl I install, detect_cnv.pl does not work. The error information is as follows:
PennCNV error: Can't load './kext/5.8.8/x86_64-linux/khmm.so' for module khmm: ./kext/5.8.8/x86_64-linux/khmm.so: undefined symbol: PL_thr_key at /home/my/perl5/perlbrew/perls/perl-5.8.8-PIC/lib/5.8.8/x86_64-linux/DynaLoader.pm line 230.
at kext/khmm.pm line 11
Compilation failed in require at detect_cnv.pl line 10.
PennCNV compilation error: Please download source code from http://www.openbioinformatics.org/penncnv/ and compile executable program.
What can I do?
Thanks, and I hope for your reply! Sincerely

PennCNV affy X chromosome missing

Hi There,
I was following PennCNV affy tutorial, after "Step 2: Split the signal file into individual files for CNV calling by PennCNV", I did "wc -l file.split1" to check the number of lines and found all the individual files had 1,401,380 lines instead of around 1.8 million lines for Affy 6.

I then used "tail file.split1" and found the last line was
CN_922408 22 49578524 -0.0502 2
and I did not see X chromosome.

I used "wc -l gw6.lrr_baf.txt" and found there were only 1,401,380 lines too.
What could I have done wrong to lose about 400,000 probesets for each sample?
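
To see exactly which chromosomes are present, the markers can be tallied per chromosome. The sketch below assumes the five-column layout shown in the `tail` output above (name, chromosome, position, LRR, BAF, separated by whitespace) and skips any line whose third field is not an integer, such as a header; the function name is hypothetical.

```python
from collections import Counter


def markers_per_chromosome(lines):
    """Count markers per chromosome; ASSUMES columns are: name, chr, position, LRR, BAF."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # header or malformed line
        counts[fields[1]] += 1
    return counts
```

Running this on both gw6.lrr_baf.txt and the location file used in the earlier steps would show whether chromosome X (and the roughly 400,000 missing probesets) is already absent from the inputs.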

[clean_cnv.pl] Error: the index for SNPs (...) are not found from signalfile

Hi,

I keep getting this error when running the clean_cnv.pl script to merge large CNVs.
Error: the index for SNPs (kgp21634490 and kgp12494942) are not found from signalfile

I double checked my signal file and found those SNPs do exist so I'm not quite sure how to interpret this warning.

Thanks!

GCmodel not in /lib

Hi,

I would like to run the detect_cnv.pl script and adjust by GCmodel, as in the example 3

$ perl runex.pl  3
Exercise 3: individual-based calling with GCmodel signal adjustment, write to ex3.rawcnv
	Running command <detect_cnv.pl -test -hmm example.hmm -pfb example.pfb -log ex3.log -out ex3.rawcnv -gcmodel ../lib/hh550.hg18.gcmodel -conf -list inputlist>

In the FAQ http://penncnv.openbioinformatics.org/en/latest/misc/faq/ it is written that

gcmodel files have been provided in the lib/ folder.

However, this is not the case. Could you clarify where I can find these GCmodel files for Illumina arrays? I found the ones for Affymetrix arrays at:
http://www.openbioinformatics.org/penncnv/download/gw6.tar.gz

Thanks,
Yann

PFB file for Illumina GSA array

Hi Dr Wang,
Thanks for your time. I am generating a PFB file for Illumina GSA array from my signal files. Here's the command:

compile_pfb.pl -listfile sampleID.txt -output try.pfb

The files look as follows:

Name Chr Position GType B Allele Freq Log R Ratio
1:100292476 1 100292476 AA 0.00 0.1172143
1:101064936 1 101064936 AA 0.01691524 0.06694419

The following output is produced when I run the command:

NOTICE: A total of 2267 input signal files is specified in sampleID.txt
NOTICE: File handle cannot be created by the operating system after reading 1021 files
NOTICE: The B Allele Freq information is annotated as column 5 in input files
NOTICE: A total of 1021 input files will be used for compiling PFB values

Are there too many input files? I have thousands of signal files. How should I handle this?
BTW, should I use only the controls' signal files, or pool cases and controls together, when I compile my own PFB file?

Thank you!
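
The notice about file handles after 1021 files is what one would expect if the per-process limit on open files were 1024, a common default, with three descriptors already taken by standard input, output and error. That the limit on this system is 1024 is an assumption; the sketch below shows one way to inspect the actual limit and to check whether a given number of files fits under it. The helper names are hypothetical, and the `resource` module is available on Unix-like systems only.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_in_limit(n_files, soft_limit, already_open=3):
    """True if n_files can be held open at once on top of `already_open` descriptors.

    ASSUMPTION: stdin, stdout and stderr account for the 3 descriptors already in use.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return n_files + already_open <= soft_limit
```

With a soft limit of 1024, `fits_in_limit(1021, 1024)` is True and `fits_in_limit(2267, 1024)` is False, in line with the notice above. If the hard limit is higher, raising the soft limit in the shell before running compile_pfb.pl may allow all 2267 files to be opened.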

WF gc model

Dear Dr Wang,

I am working with an Infinium Oncoarray-500k dataset (hg19 reference).
I was considering applying a correction on the waviness factor, as my WF values exceed 0.04.
Do you know if there are gc models available which might be suitable for this array type?

Kind regards,
Mailie

hmm file for chicken custom designed array

Hi Dr. Wang,

I want to use PennCNV to call CNVs from a custom-designed chicken Axiom array (96 CEL files). I followed the supplementary file from the study "Cognitive Performance Among Carriers of Pathogenic Copy Number Variants: Analysis of 152,000 UK Biobank Subjects". My problem is that in the CNV calling step, I do not have an HMM file. Could you please suggest how to create an HMM file?

Best,
Hongen

PFB missing markers

Hi,
I have generated a PFB file for Illumina GSA from 1000 samples.
Illumina GSA has approximately 600 000 markers. When the PFB is compiled, it only has 123 000.
Then when I call detect_cnv I get this in the log:
NOTICE: Done with 122965 records in 24 chromosomes (495575 records are discarded due to lack of PFB information for the markers)

I wonder why I don't get the entire set of markers in the PFB. Is there something missing in my data or is this expected behavior? Thanks
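
A first diagnostic step is to list which marker names from a signal file are absent from the PFB file, since their names often show a pattern (for example a particular chromosome or missing positions). The sketch below assumes both files are tab-separated, start with one header line, and carry the marker name in the first column; the function names are hypothetical.

```python
def first_column_names(lines, skip_header=True):
    """Return the first tab-separated field of each non-empty line (header skipped by default)."""
    names = []
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue
        name = line.split("\t")[0].strip()
        if name:
            names.append(name)
    return names


def markers_missing_from_pfb(signal_lines, pfb_lines):
    """Marker names present in the signal file but absent from the PFB file, in file order."""
    in_pfb = set(first_column_names(pfb_lines))
    return [name for name in first_column_names(signal_lines) if name not in in_pfb]
```

Looking up a handful of the returned names in the SNP position or annotation input used to build the PFB would indicate whether those markers were dropped when the PFB was compiled or were never present in that input.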

Error: Found probeset A allele but not B allele in generate_affy_geno_cluster.pl

Hello,

I am running into an issue in generate_affy_geno_cluster.pl using the latest PennCNV to perform Affymetrix-based analysis:

/usr/bin/perl /usr/local/bin/generate_affy_geno_cluster.pl \
	ax_output/AxiomGT1.calls.txt \
	ax_output/AxiomGT1.confidences.txt \
	ax_output/quant-norm.pm-only.med-polish.expr.summary.txt \
	-locfile penncnv_output/Run1_AxiomGT1.cleaned.pfb \
	-sexfile clean_gender_file.txt \
	-out ax_output/AxiomGT1.genocluster

This is the command output:

NOTICE: The --confidence_threshold argument is automatically set as 0.01 (default for Affymetrix Power Tools calling)
NOTICE: Reading marker-location-file penncnv_output/Run1_AxiomGT1.cleaned.pfb ... Done with 19981 markers!
NOTICE: A total of 119 males and 71 females are found in the signal file ax_output/quant-norm.pm-only.med-polish.expr.summary.txt
Use of uninitialized value $_ in substitution (s///) at /usr/local/bin/generate_affy_geno_cluster.pl line 138, line 1303830.
Use of uninitialized value $_ in split at /usr/local/bin/generate_affy_geno_cluster.pl line 139, line 1303830.
Use of uninitialized value $sig_psid in string eq at /usr/local/bin/generate_affy_geno_cluster.pl line 141, line 1303830.
Use of uninitialized value $sig_psid in concatenation (.) or string at /usr/local/bin/generate_affy_geno_cluster.pl line 141, line 1303830.
Use of uninitialized value $_ in concatenation (.) or string at /usr/local/bin/generate_affy_geno_cluster.pl line 141, line 1303830.
Error: Found probeset A allele but not B allele in signal file ax_output/quant-norm.pm-only.med-polish.expr.summary.txt line 1303355 (genofile line 1) for marker AX-18000609: <> at /usr/local/bin/generate_affy_geno_cluster.pl line 141
main::generateAffyGenoCluster('ax_output/AxiomGT1.calls.txt', 'ax_output/AxiomGT1.confidences.txt', 'ax_output/quant-norm.pm-only.med-polish.expr.summary.txt', 'HASH(0xc59738)', 'HASH(0xc59270)') called at /usr/local/bin/generate_affy_geno_cluster.pl line 41
done.

I don't know under which circumstances this behavior is expected.
Any idea what's happening?

This is my perl environment

$ perl -V
Summary of my perl5 (revision 5 version 14 subversion 2) configuration:

Platform:
osname=linux, osvers=4.9.0-2-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux tatooine 4.9.0-2-amd64 #1 smp debian 4.9.13-1 (2017-02-27) x86_64 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.14 -Darchlib=/usr/lib/perl/5.14 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.14.2 -Dsitearch=/usr/local/lib/perl/5.14.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dusesitecustomize -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.14.2 -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='4.7.2', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=, so=so, useshrplib=true, libperl=libperl.so.5.14.2
gnulibc_version='2.13'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
PERL_PRESERVE_IVUV USE_64_BIT_ALL USE_64_BIT_INT
USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_PERL_ATOF
USE_REENTRANT_API USE_SITECUSTOMIZE
Locally applied patches:
DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
DEBPKG:fixes/respect_umask - Respect umask during installation
DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
DEBPKG:debian/prefix_changes - Fiddle with PREFIX and variables written to the makefile
DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/517938 http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/580034 Point users to Debian packages of deprecated core modules
DEBPKG:fixes/hurd-ccflags - [a190e64] http://bugs.debian.org/587901 [perl #92244] Make hints/gnu.sh append to $ccflags rather than overriding them
DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
DEBPKG:fixes/extutils-cbuilder-cflags - [011e8fb] http://bugs.debian.org/624460 [perl #89478] Append CFLAGS and LDFLAGS to their Config.pm counterparts in EU::CBuilder
DEBPKG:fixes/module-build-home-directory - http://bugs.debian.org/624850 [rt.cpan.org #67893] Fix failing tilde test when run under a UID without a passwd entry
DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.14.2-21+deb7u5 in patchlevel.h
DEBPKG:fixes/h2ph-multiarch - [e7ec705] http://bugs.debian.org/625808 [perl #90122] Make h2ph correctly search gcc include directories
DEBPKG:fixes/index-tainting - [3b36395] http://bugs.debian.org/291450 [perl #64804] RT 64804: tainting with index() of a constant
DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
DEBPKG:fixes/sys-syslog-socket-timeout-kfreebsd.patch - http://bugs.debian.org/627821 [rt.cpan.org #69997] Use a socket timeout on GNU/kFreeBSD to catch ICMP port unreachable messages
DEBPKG:fixes/hurd-hints - http://bugs.debian.org/636609 Improve general GNU hints, needed for GNU/Hurd.
DEBPKG:fixes/pod_fixes - [7698aed] http://bugs.debian.org/637816 Fix typos in several pod/perl.pod files
DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
DEBPKG:fixes/digest_eval_hole - http://bugs.debian.org/644108 Close the eval "require $module" security hole in Digest->new($algorithm)
DEBPKG:fixes/hurd-ndbm - [f0d0a20] [perl #102680] http://bugs.debian.org/645989 Add GNU/Hurd hints for NDBM_File
DEBPKG:fixes/sysconf.t-posix - [8040185] [perl #102888] http://bugs.debian.org/646016 Fix hang in ext/POSIX/t/sysconf.t on GNU/Hurd
DEBPKG:fixes/hurd-largefile - [1fda587] [perl #103014] http://bugs.debian.org/645790 enable LFS on GNU/Hurd
DEBPKG:debian/hurd_test_todo_syslog - http://bugs.debian.org/650093 Disable failing GNU/Hurd tests in cpan/Sys-Syslog/t/syslog.t
DEBPKG:fixes/hurd_skip_itimer_virtual - [rt.cpan.org #72754] http://bugs.debian.org/650094 Skip interval timer tests in Time::HiRes on GNU/Hurd
DEBPKG:debian/hurd_test_skip_sigdispatch - http://bugs.debian.org/650188 Disable failing GNU/Hurd tests op/sigdispatch.t
DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
DEBPKG:debian/hurd_test_skip_pipe - http://bugs.debian.org/650187 Disable failing GNU/Hurd tests io/pipe.t
DEBPKG:debian/hurd_test_skip_io_pipe - http://bugs.debian.org/650096 Disable failing GNU/Hurd tests dist/IO/t/io_pipe.t
DEBPKG:fixes/manpage_name_CPAN - http://bugs.debian.org/650448 [rt.cpan.org #73396] cpan/CPAN: add NAME headings in modules with POD
DEBPKG:fixes/manpage_name_CPANPLUS - http://bugs.debian.org/650450 [rt.cpan.org #73398] cpan/CPANPLUS: add NAME headings in modules with POD
DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD
DEBPKG:fixes/manpage_name_Term-UI - http://bugs.debian.org/650452 [rt.cpan.org #73400] cpan/Term-UI: add NAME headings in modules with POD
DEBPKG:fixes/podlators_ae_ligature_fallback - http://bugs.debian.org/652851 Fix the ASCII fallback string for AE
DEBPKG:fixes/fsf_postal_address - [de89470] Update references to the FSF's postal address
DEBPKG:fixes/cpan_module_pod_fixes - [perl #106870] [rt.cpan.org #73447] [rt.cpan.org #73446] Fix POD formatting in Term-Cap and Pod-Parser
DEBPKG:fixes/cgi_no_shellwords_pl - Use Text::ParseWords instead of shellwords.pl
DEBPKG:fixes/path_max_fallback - [perl #109262] http://bugs.debian.org/656869 Don't use _POSIX_PATH_MAX as a fallback PATH_MAX
DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
DEBPKG:fixes/propagate_tainted_errors.patch - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=663158 [perl #111654] properly propagate tainted errors
DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
DEBPKG:fixes/socket_cache_propagate - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=659075 [rt.cpan.org #61577] [perl #112736] sockdomain and socktype undef on newly accepted sockets
DEBPKG:fixes/ipc_open3 - [perl #114454] http://bugs.debian.org/683894 IPC::Open3::open3(..., '-') broken [37/162]
DEBPKG:fixes/string_repeat_overrun - http://bugs.debian.org/689314 [b675304] avoid calling memset with a negative count
DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
DEBPKG:fixes/kfreebsd-overrides - http://bugs.debian.org/689713 [perl #115324] [7dc6565] Remove unnecessary overrides in gnukfreebsd and gnuknetbsd hints.
DEBPKG:fixes/tainted-smartmatch - [be88a5c] http://bugs.debian.org/690571 [perl #93590] $tainted ~~ [...] failing
DEBPKG:fixes/regexp-matching-starter - [6e634c5] http://bugs.debian.org/690975 [perl #101710] Regression with /i, latin1 chars.
DEBPKG:fixes/regexp-matching-fold - [399fb9c] http://bugs.debian.org/690976 regexec.c: Fix "\x{FB01}\x{FB00}" =~ /ff/i
DEBPKG:fixes/regexp-matching-opposite-case - [dc91d5a] http://bugs.debian.org/690979 [perl #101970] /[[:lower:]]/i matches upper case
DEBPKG:fixes/reading-glob-copy-handle - [fd1564b] http://bugs.debian.org/629363 [perl #92258] <$fh> hangs on a glob copy
DEBPKG:fixes/smartmatch-rhs-precedence - http://bugs.debian.org/691102 [011be0b] Enforce Any ~~ Object smartmatch precedence
DEBPKG:fixes/perlcheat-update - http://bugs.debian.org/691112 [ab0ae0a] Update PerlCheat to 5.14
DEBPKG:fixes/cgi-cr-escaping - http://bugs.debian.org/693420 CR escaping for P3P and Set-Cookie headers
DEBPKG:fixes/maketext-code-execution - [1735f6f] http://bugs.debian.org/695224 Fix misparsing of maketext strings.
DEBPKG:fixes/storable-security-warning - [664f237] http://bugs.debian.org/695223 add a note about security concerns in Storable
DEBPKG:fixes/digest-sha-doublefree - [rt.cpan.org #82655] http://bugs.debian.org/698172 [a8c6ff7] Fix a double-free bug in Digest::SHA
DEBPKG:fixes/64bitint-signedness-wraparound - http://bugs.debian.org/698320 [94e529c] Avoid wraparound when casting unsigned size_t to signed ssize_t.
DEBPKG:fixes/stdin-sigchld - http://bugs.debian.org/700171 [perl #116621] [be48bbe] add a couple missing LEAVEs in perlio_async_run()
DEBPKG:fixes/hsplit-rehash - [d59e31f] http://bugs.debian.org/702296 Prevent premature hsplit() calls, and only trigger REHASH after hsplit()
DEBPKG:fixes/encode-memleak - http://bugs.debian.org/702416 [5814803] Encode: Fixed a memory leak that occurred in the UTF-8 encoding.
DEBPKG:fixes/threads_shared_elements_crash - [perl #119089] http://bugs.debian.org/718438 threads::shared should not crash if shared elements outlive their aggregate.
DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time
DEBPKG:fixes/digest_sha_double_free - [ee8c6f4] [rt.cpan.org #86295] http://bugs.debian.org/711206 maint-5.18: Digest-SHA crash fix in 5.85 [15/162]
DEBPKG:fixes/pl_eval_start_use_after_free - [eae139f] [perl #115992] PL_eval_start use-after-free
DEBPKG:fixes/regcomp_fix_segv - [ebb390a] [perl #115994] fix segv in regcomp.c:S_join_exact()
DEBPKG:fixes/list_util_off_by_two - [623a911] fix off-by-two error in List::Util
DEBPKG:fixes/sdbm_off_by_one - [7f5f08b] [perl #111586] sdbm.c: fix off-by-one access to global ".dir"
DEBPKG:fixes/socket_unpack_sockaddr_un_heap_buffer_overflow - [e508642] [perl #111594] Socket::unpack_sockaddr_un heap-buffer-overflow
DEBPKG:fixes/data_dump_infinite_recurse - [19be3be] don't recurse infinitely in Data::Dumper
DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 =?UTF-8?q?kFreeBSD=2010=20(possibly=20only=20with=20softupdates?= =?UTF-8?q?=20enabled)=20may=20defer=0Acalculating=20the=20mtime=20for=20m?=
DEBPKG:fixes/CVE-2016-2381_duplicate_env - remove duplicate environment variables from environ
DEBPKG:fixes/CVE-2016-1238/remove-dot-when-loading - [perl #127834] (perl #127834) remove . from the end of @INC if complex modules are loaded
DEBPKG:fixes/CVE-2016-1238/remove-dot-in-padwalker - [perl #127834] perl5db.pl: ensure PadWalker is loaded from standard paths
DEBPKG:fixes/CVE-2016-1238/remove-dot-in-dist - [perl #127834] dist/: remove . from @INC when loading optional modules
DEBPKG:fixes/CVE-2016-1238/remove-dot-in-cpan - [perl #127834] cpan/: remove . from @INC when loading optional modules
DEBPKG:debian/CVE-2016-1238/test-suite-without-dot - [perl #127810] Patch unit tests to explicitly insert "." into @INC when needed.
DEBPKG:debian/CVE-2016-1238/eumm-without-dot - [perl #127810] Add PERL_USE_UNSAFE_INC support to EU::MM for fortify_inc support.
DEBPKG:debian/CVE-2016-1238/cpan-without-dot - [perl #127810] Set PERL_USE_UNSAFE_INC for cpan usage
DEBPKG:debian/CVE-2016-1238/sitecustomize-in-etc - Look for sitecustomize.pl in /etc/perl rather than sitelib on Debian systems
DEBPKG:debian/CVE-2016-1238/mb-without-dot - Make Module::Build set PERL_USE_UNSAFE_INC
DEBPKG:debian/CVE-2016-1238/remove-inc-test - Remove test for '.' in @INC as it might not be
DEBPKG:fixes/xsloader-eval - [rt.cpan.org #115808] https://bugs.debian.org/829578 =?UTF-8?q?Don=E2=80=99t=20let=20XSLoader=20load=20relative=20path?= =?UTF-8?q?s?=
DEBPKG:fixes/file_path_chmod_race - https://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack.
DEBPKG:fixes/extutils_file_path_compat - Correct the order of tests of chmod(). (#294)
Built under linux
Compiled at Jun 5 2017 18:24:09
@INC:
/etc/perl
/usr/local/lib/perl/5.14.2
/usr/local/share/perl/5.14.2
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.14
/usr/share/perl/5.14
/usr/local/lib/site_perl
.

Error: cannot read from HMM file

Hi, I tried to run PennCNV with the command line below and got the error message shown underneath. The message says it cannot read from the HMM file, but I am fairly sure the HMM file is in my current folder.
My software versions are: PennCNV 1.04, Perl 5.8.7.
Thank you

command line:
detect_cnv.pl -test -hmm affygw6.hmm -pfb affy_LAT_cnv_v0.pfb --chrx --gcmodelfile file.gcmodel -conf -log affy_lat_cnv_chrX.log -out affy_lat_rawcnv_chrX.txt --listfile cnv_filelist.txt

the output error is shown below:
NOTICE: All program notification/warning messages that appear in STDERR will be also written to log file affy_lat_cnv_chrX.log
NOTICE: Reading marker coordinates and population frequency of B allele (PFB) from affy_LAT_cnv_v0.pfb ... Done with 816889 records (115 records in chr MT were discarded)
NOTICE: Reading LRR and BAF values for from affy_cnv/261_0261-1_V3_D270_8000000628_F09_LAT.cnv.txt ... Done with 816889 records in 24 chromosomes (115 records are discarded due to lack of PFB information for the markers)
NOTICE: Adjusting LRR by GC model: WF changes from -0.0513 to -0.0291, GCWF changes from -0.0459 to -0.0165
NOTICE: Data from chromosome 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,Y will not be used in analysis
NOTICE: quality summary for affy_cnv/261_0261-1_V3_D270_8000000628_F09_LAT.cnv.txt: LRR_Xmean=0.0174 LRR_Xmedian=0.0325 LRR_XSD=0.4295 BAF_Xhet=0.2941
NOTICE: Sample sex for affy_cnv/261_0261-1_V3_D270_8000000628_F09_LAT.cnv.txt is predicted as 'female' based on BAF heterozygosity rate (this is different from genotype heterozygosity rate!) for chrX (0.294122300450914)
WARNING: Sample from affy_cnv/261_0261-1_V3_D270_8000000628_F09_LAT.cnv.txt does not pass quality control criteria due to its large SD for LRR (0.429494920336252)!
WARNING: Small-sized CNV calls may not be reliable and should be interpreted with caution!
kc.pm module run-time error (use eval{} in Perl to catch the error) ...
Error: cannot read from HMM file at detect_cnv.pl line 850.

a few lines in the result file .rawcnv

Dear Dr. Wang
I'm using PennCNV to call CNVs from the Affymetrix 600K Axiom Chicken Genotyping Array. The command is as below:
perl detect_cnv.pl -test -hmm affygw6.hmm -pfb ly.pfb -list filelist.txt -lastchr 28 -log ly.log -out ly.rawcnv
It runs successfully. However, I'm a little confused that the result file ly.rawcnv contains only about 70 lines, which means only about 70 CNVs were detected, far fewer than in other reports. I'm not sure about this result; do you have any suggestions? One more question: if I want to use a GC model, how can I get the GCModel file?
Many thanks!
Dong

Batch effects?

Dear Penn developers,
I am working on an Illumina array dataset. I have two batches of data which were run using same SNP chip chemistry (OncoArray) and sequencing centre, but processed a year apart.

I have identified strong batch effects in my data, which show up as differences in the range of the LRR values between the two batches. For instance, for some true/known CNVs, the LRR values shift in a way that suggests a CNV is present, but the shift is not large enough to cross the CN2 threshold required for PennCNV to call it.

The LRR values were generated in GenomeStudio and then exported for analysis in PennCNV.

Is this something you have encountered and do you have any suggestions for how I might 'train' the HMM to account for these effects?

Thanks.

missing lib/hh550.hg18.gcmodel

Hi,
The examples folder has a number of tests to validate a build.
Example 3 needs lib/hh550.hg18.gcmodel, which is missing.

FYI: compiler compatibility issue

FYI, building under RH7/CentOS7 completes without errors but produces a build that does not work. The issue seems to be related to the default compiler (gcc 4.8.5). Using the compatibility compiler (gcc44 4.4.7) instead produces a working build (tested by comparing the results of examples 1, 2, 4-6 and 13-16, for which reference output is provided).

Note that this is using Perl 5.16.3 (default on RH7) and may be related to the so-called "Perl compatibility issue" reported elsewhere.

Obtain Population frequency of B allele Axiom_UKB_WCSG

Hello Dr. Wang,

I am trying to use PennCNV with the Axiom_UKB_WCSG array, but I am not sure which PFB file is the right one to use in step 1.3 of http://penncnv.openbioinformatics.org/en/latest/user-guide/affy/.

I checked the file "Axiom_GW_HumanOrigin.hg38" provided with PennCNV, but when I compared its SNP IDs with those in the annotation files from Axiom, only about 70K IDs overlapped.

I have also downloaded a file (Axiom_UKB_WCSG.na35.af_supp.tab) from the Affymetrix website that contains allele frequencies from "1KGp1_mar2012 and Affymetrix_internal_screen". The file contains frequencies for various populations. Below are a few column names from the header:

Probe Set ID Affy SNP ID Freq_A ASW Freq_B ASW Heterozygosity ASW Number ASW Minor_Allele ASW MAF ASW Freq_A CEU Freq_B CEU

Is this the right file from which to obtain B allele frequencies for the PFB file?

Best,
Nick

Unable to adjust LRR values by GC model due to lack of GCWF measure

Dear Dr. Wang,

I am using PennCNV to call CNVs for Illumina BovineHD datasets as below:
detect_cnv.pl -test -hmm $hmm -pfb $pfb --lastchr 29 -gcmodel $gc $input -log sampleall.log -out sampleall.rawcnv

where,
$hmm <- hhall.hmm file provided by PennCNV
$pfb <- generated for my own population, based on 'compile_pfb.pl' script
$gc <- gc model generated using 'cal_gc_snp.pl'
$input <- input data

Unfortunately, I encountered an error message: Unable to adjust LRR values by GC model due to lack of GCWF measure

I don't know whether this information will help, but 'gc5Base.txt', the input file for 'cal_gc_snp.pl', is not currently available for the reference genome I am using (UCD_ARS1.2), so I made the input file myself, with some help from UCSC bioinformaticians. I can add details of how I did this if need be.

Could you please explain how I can solve this problem?
Many thanks in advance! :)

Lim

PennCNV for PIG DATA

Dear Dr. Wang,
I used PennCNV on my pig data and I have some questions. My SNP chip (GGP_Porcine_HD_E, Illumina platform) has 50,915 SNPs. Could you help me, please?

I used genomic_wave.pl to create my adjusted list, and then ran the command below:

./detect_cnv.pl -test -hmm lib/hhall.hmm -pfb hernia_escrotal.pfb -confidence -minlength 1000 -log detect_cnv_com_minlength.log -list lista_adj.txt -out allsample_adjusted_com_minlength.rawcnv

My questions are:

I do not need to use -gcmodel to adjust the signal files in the command above, right?

Is my HMM file correct? I used the hhall.hmm file for CNV calling.

Thank you so much!

Missing files in lib folder?

I am trying to use PennCNV for SNP array analysis and CNV detection, but some files mentioned in the documentation are missing.
The documentation says:
"In PennCNV, the hhall.hg18.pfb, hhall.hmm and hhall.hg18.gcmodel files have been provided in the lib/ folder. "

I assume the three files for hh550 should be there as well, but I can only find the .hmm files. Were the others removed? Where can I download them?

different CNVs between two execution

Hello,
I ran the detect_cnv.pl script several months ago and got CNVs with PennCNV-1.0.4. Now I have added the -confidence argument and run the same samples again with PennCNV-1.0.5, but I got completely different CNVs. Is this because of the PennCNV version, or do I need to set a seed as in R?
Thank you in advance!
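Before looking for a cause, it can help to quantify how different the two runs really are. The sketch below is only an illustration in Python: it assumes each .rawcnv line starts with a region such as chr11:55603545-55669650, followed by fields including "cn=" and then the sample file name. The numsnp values and SNP names in the example lines are made up, and `compare_runs` is a hypothetical helper, not part of PennCNV.

```python
import re

# Assumed layout of a .rawcnv line:
#   region  numsnp=...  length=...  stateN,cn=K  sample_file  startsnp=...  endsnp=...  [conf=...]
LINE_RE = re.compile(r"^(chr\w+:\d+-\d+)\s.*?cn=(\d+)\s+(\S+)")


def load_calls(lines):
    """Return a set of (sample, region, copy_number) tuples from rawcnv lines."""
    calls = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            region, cn, sample = match.groups()
            calls.add((sample, region, int(cn)))
    return calls


def compare_runs(old_lines, new_lines):
    """Split calls into those shared by both runs and those unique to each."""
    old, new = load_calls(old_lines), load_calls(new_lines)
    return {"shared": old & new, "only_old": old - new, "only_new": new - old}


# Made-up example lines; only the regions are taken from this page.
run_a = [
    "chr11:55603545-55669650 numsnp=10 length=66,106 state1,cn=0 offspring.txt startsnp=s1 endsnp=s2",
    "chr20:10511631-10583260 numsnp=12 length=71,630 state2,cn=1 offspring.txt startsnp=s3 endsnp=s4",
]
run_b = [
    "chr11:55603545-55669650 numsnp=10 length=66,106 state1,cn=0 offspring.txt startsnp=s1 endsnp=s2 conf=20.5",
    "chr3:3957986-4054960 numsnp=9 length=96,975 state2,cn=1 offspring.txt startsnp=s5 endsnp=s6 conf=15.0",
]

result = compare_runs(run_a, run_b)
for key in ("shared", "only_old", "only_new"):
    print(key, sorted(result[key]))
```

Matching on exact coordinates is deliberately strict; calls whose boundaries moved by a marker or two would show up as different here, so an overlap-based comparison may be more informative in practice.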

RuntimeError Usage: fisher_exact_2sided

Hi,

I'm trying to run a CNV case-control comparison with detect_cnv.pl -cctest and I've got the following error:
RuntimeError Usage: fisher_exact_2sided(a,b,c,d); at detect_cnv.pl line 1246, line 576

Can you help me with this error?
Thanks.
Vanessa.

custom pfb - duplicate values

Hey,

I am trying to create my own PFB file for the Illumina GSA. I compiled the PFB using 500 samples. The output is quite large, and I noticed that it contains many duplicates with different BAF values. How should I deal with the duplicates? Do they affect the next step, detect_cnv? Can I remove them?
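To see how many duplicates there are and how far their values disagree, a small script can list them. This is a minimal Python sketch, assuming the PFB file is tab-delimited with the columns Name, Chr, Position, PFB and that "duplicate" means the same marker name appearing on more than one row; the marker names in the example are hypothetical.

```python
from collections import defaultdict


def find_duplicate_markers(pfb_lines):
    """Return {marker_name: [pfb_value, ...]} for names that occur more than once."""
    seen = defaultdict(list)
    for line in pfb_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "Name":
            continue  # skip the header row and malformed rows
        name, _chrom, _pos, pfb = fields[:4]
        seen[name].append(float(pfb))
    return {name: values for name, values in seen.items() if len(values) > 1}


# Hypothetical rows for illustration only.
example = [
    "Name\tChr\tPosition\tPFB\n",
    "rs1\t1\t1000\t0.25\n",
    "rs2\t1\t2000\t0.50\n",
    "rs1\t1\t1000\t0.75\n",
]
duplicates = find_duplicate_markers(example)
print(duplicates)
```

With a real file, pass an open file handle instead of the example list. Whether the duplicates come from repeated marker names or from different names at the same position is worth checking first, since the two cases would call for different handling.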

gender prediction

Given a list of samples, some of which have unknown gender, it should predict gender automatically and produce calls for all samples.

Visualization error

Hi,
I am trying to visualize cnv calls from example1 of PennCNV but I have two problems. First, when I run visualize_cnv.pl -format plot -signal offspring.txt ex1.rawcnv I get this output

NOTICE: Signal values for 4 CNV regions are found in offspring.txt
NOTICE: Processing sample offspring.txt CNV chr11:55603545-55669650 with copy number of 0 ... written to offspring.txt.chr11.55603545.PDF
NOTICE: Processing sample offspring.txt CNV chr11:81792950-81806219 with copy number of 1 ... Use of uninitialized value in numeric gt (>) at C:\Users\andsav\Desktop\PennCNV-1.0.5\PennCNV-1.0.5\visualize_cnv.pl line 204.
written to offspring.txt.chr11.81792950.PDF
NOTICE: Processing sample offspring.txt CNV chr20:10511631-10583260 with copy number of 1 ... written to offspring.txt.chr20.10511631.PDF
NOTICE: Processing sample offspring.txt CNV chr3:3957986-4054960 with copy number of 1 ... written to offspring.txt.chr3.3957986.PDF

What does the message "Use of uninitialized value in numeric gt (>) at C:\Users\andsav\Desktop\PennCNV-1.0.5\PennCNV-1.0.5\visualize_cnv.pl line 204." mean, and how does it affect me?

Second, when I went to look at the generated plots, I was under the impression that they would be in JPG format in the folder. Instead I have this

If I open one of them using RStudio, it gives me the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: file ‘.RData’ has magic number 'RDX3'
Use of save versions prior to 2 is deprecated
2: In file(file, "rt") :
cannot open file 'offspring.txt.chr11.81792950.signal': No such file or directory

Any ideas on what could be wrong and what I need to do to fix this?

System INFO
I use Windows 10 x64 with ActivePerl 5.8 (32-bit) and R 3.6.0.

Segmentation fault-detect_cnv.pl

Hi, I want to call CNVs from Affymetrix SNP 6.0 data with PennCNV, but when I run the final step, detect_cnv.pl, I hit the problem shown below.
the code: ./detect_cnv.pl -test -hmm ./gw6/lib/affygw6.hmm -pfb ./gw6/lib/affygw6.hg19.pfb NR.B01_181507A_NR1_R1 -log NR.log -out NR.rawcn

the log: "NOTICE: All program notification/warning messages that appear in STDERR will be also written to log file NR.log
NOTICE: Reading marker coordinates and population frequency of B allele (PFB) from /pnas/liuxin_group/moshl/software/gw6/lib/affygw6.hg19.pfb ... Done with 1878054 records
NOTICE: Reading LRR and BAF values for from NR.B01_181507A_NR1_R1 ... Done with 1847835 records in 24 chromosomes
NOTICE: Data from chromosome X,Y will not be used in analysis
NOTICE: Median-adjusting LRR values for all autosome markers from NR.B01_181507A_NR1_R1 by -0.0606
NOTICE: Median-adjusting BAF values for all autosome markers from NR.B01_181507A_NR1_R1 by 0.0008
NOTICE: quality summary for NR.B01_181507A_NR1_R1: LRR_mean=0.0174 LRR_median=0.0000 LRR_SD=0.4211 BAF_mean=0.4993 BAF_median=0.5000 BAF_SD=0.0870 BAF_DRIFT=0.009043 WF=-0.0390 GCWF=-0.0141
WARNING: Sample from NR.B01_181507A_NR1_R1 does not pass default quality control criteria due to its large SD for LRR (0.421119804166937)!
WARNING: Sample from NR.B01_181507A_NR1_R1 does not pass default quality control criteria due to its drifting BAF values (drift=0.00904313006810173)!
WARNING: Small-sized CNV calls may not be reliable and should be interpreted with caution!
Segmentation fault "
I tried to solve the problem by following the advice in "detect_cnv.pl resulting in "Segmentation fault: 11" #7", but the problem persists. Looking forward to your reply.
P.S. My Perl version is v5.16.3.

Best wishes,
mo

Is it necessary to train HMM file for exome arrays

Hi Dr. Wang,

I noticed that the help message of 'detect_cnv.pl' says training optimized .hmm model parameters is not recommended. But I guess it would be better to train the model when the data come from exome arrays?

I downloaded from your website the HMM file for the Illumina HumanCoreExome_v12-A BeadChip constructed by Szatkiewicz et al., and ran CNV detection with both HMM files (i.e. hhall.hmm and exome.hmm).

With exome.hmm, more CNVs were detected.

The only difference I noticed between the two files lies in the B1 block, which holds the LRR mean and SD.

The data I'm analysing are from InfiniumCoreExome-24v1-1_A, so I'm not sure whether I should re-train the HMM or whether the exome.hmm shown below is good enough.

Thank you very much

Best regards,
Ruqian

hhall.hmm
M=6
N=6
A:
0.936719716 0.006332139 0.048770575 0.000000001 0.008177573 0.000000001
0.000801036 0.949230924 0.048770575 0.000000001 0.001168245 0.000029225
0.000004595 0.000047431 0.999912387 0.000000001 0.000034971 0.000000621
0.000049998 0.000049998 0.000049998 0.999750015 0.000049998 0.000049998
0.000916738 0.001359036 0.048770575 0.000000001 0.948953653 0.000000002
0.000000001 0.000000001 0.027257213 0.000000001 0.000000004 0.972742785
B:
0.950000 0.000001 0.050000 0.000001 0.000001 0.000001
0.000001 0.950000 0.050000 0.000001 0.000001 0.000001
0.000001 0.000001 0.999995 0.000001 0.000001 0.000001
0.000001 0.000001 0.050000 0.950000 0.000001 0.000001
0.000001 0.000001 0.050000 0.000001 0.950000 0.000001
0.000001 0.000001 0.050000 0.000001 0.000001 0.950000
pi:
0.000001 0.000500 0.999000 0.000001 0.000500 0.000001
B1_mean:
-3.527211 -0.664184 0.000000 100.000000 0.395621 0.678345
B1_sd:
1.329152 0.284338 0.159645 0.211396 0.209089 0.191579
B1_uf:
0.010000
B2_mean:
0.000000 0.250000 0.333333 0.500000 0.500000
B2_sd:
0.016372 0.042099 0.045126 0.034982 0.304243
B2_uf:
0.010000
B3_mean:
-2.051407 -0.572210 0.000000 0.000000 0.361669 0.626711
B3_sd:
2.132843 0.382025 0.184001 0.200297 0.253551 0.353183
B3_uf:
0.010000

exome.hmm

M=6
N=6
A:
0.936719716 0.006332139 0.048770575 0.000000001 0.008177573 0.000000001
0.000801036 0.949230924 0.048770575 0.000000001 0.001168245 0.000029225
0.000004595 0.000047431 0.999912387 0.000000001 0.000034971 0.000000621
0.000049998 0.000049998 0.000049998 0.999750015 0.000049998 0.000049998
0.000916738 0.001359036 0.048770575 0.000000001 0.948953653 0.000000002
0.000000001 0.000000001 0.027257213 0.000000001 0.000000004 0.972742785
B:
0.950000 0.000001 0.050000 0.000001 0.000001 0.000001
0.000001 0.950000 0.050000 0.000001 0.000001 0.000001
0.000001 0.000001 0.999995 0.000001 0.000001 0.000001
0.000001 0.000001 0.050000 0.950000 0.000001 0.000001
0.000001 0.000001 0.050000 0.000001 0.950000 0.000001
0.000001 0.000001 0.050000 0.000001 0.000001 0.950000
pi:
0.000001 0.000500 0.999000 0.000001 0.000500 0.000001
B1_mean:
-2.051407 -0.5 0.000000 100.000000 0.32 0.62
B1_sd:
1.329152 0.17 0.159645 0.211396 0.25 0.30
B1_uf:
0.010000
B2_mean:
0.000000 0.250000 0.333333 0.500000 0.500000
B2_sd:
0.016372 0.042099 0.045126 0.034982 0.304243
B2_uf:
0.010000
B3_mean:
-2.051407 -0.572210 0.000000 0.000000 0.361669 0.626711
B3_sd:
2.132843 0.382025 0.184001 0.200297 0.253551 0.353183
B3_uf:
0.010000
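The claim that the two files differ only in the B1 block can be checked mechanically rather than by eye. The following is a rough Python sketch, assuming the layout shown above: "key=value" lines, block names ending in a colon, and rows of whitespace-separated numbers. `parse_hmm` and `differing_blocks` are hypothetical helper names, and the example uses only the B1 excerpts from the two listings.

```python
def parse_hmm(text):
    """Parse HMM text into {block_name: [[float, ...], ...]}."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "B1_mean:" opens a new block
            current = line[:-1]
            blocks[current] = []
        elif "=" in line:
            # Scalar settings such as "M=6"
            key, value = line.split("=", 1)
            blocks[key] = [[float(value)]]
        elif current is not None:
            blocks[current].append([float(x) for x in line.split()])
    return blocks


def differing_blocks(a, b):
    """Names of blocks whose values differ between two parsed HMM files."""
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))


hhall_excerpt = """B1_mean:
-3.527211 -0.664184 0.000000 100.000000 0.395621 0.678345
B1_sd:
1.329152 0.284338 0.159645 0.211396 0.209089 0.191579
B1_uf:
0.010000
"""

exome_excerpt = """B1_mean:
-2.051407 -0.5 0.000000 100.000000 0.32 0.62
B1_sd:
1.329152 0.17 0.159645 0.211396 0.25 0.30
B1_uf:
0.010000
"""

changed = differing_blocks(parse_hmm(hhall_excerpt), parse_hmm(exome_excerpt))
print(changed)
```

Running the same comparison on the complete files (read with `open(path).read()`) would show whether any block outside B1_mean and B1_sd differs.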

Error: NO SIGNAL DATA FOUND IN INPUTFILE

Hi there,
I'm trying to execute the detect_cnv.pl script and I keep coming across the same error. It reads:
NOTICE: Reading LRR and BAF values for from gw6.H11_IB58_Axiom_Char_P040_1051729031 ... Done with 0 records in 0 chromosomes (13664 records are discarded due to lack of PFB information for the markers)
ERROR: NO SIGNAL DATA FOUND IN INPUTFILE gw6.H11_IB58_Axiom_Char_P040_1051729031
WARNING: Skipping gw6.H11_IB58_Axiom_Char_P040_1051729031 since no signal values can be retrieved from the file
My pfb and signal files have values for pfb, log r ratio, and b allele freq, but the files aren't being read properly. I'm wondering if you've come across this issue before?
Thank you in advance!
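Since the log reports every record as discarded "due to lack of PFB information", one quick check is whether the marker names in the signal file match those in the PFB file at all. The sketch below is in Python and assumes that both files are tab-delimited with the marker name in the first column and a header row starting with "Name"; the marker IDs in the example are hypothetical.

```python
def marker_names(lines):
    """Collect first-column marker names, skipping a header row that starts with 'Name'."""
    names = set()
    for line in lines:
        first = line.split("\t", 1)[0].strip()
        if first and first != "Name":
            names.add(first)
    return names


def overlap_report(signal_lines, pfb_lines):
    """Count markers in each file and how many names they share."""
    signal = marker_names(signal_lines)
    pfb = marker_names(pfb_lines)
    return {
        "signal": len(signal),
        "pfb": len(pfb),
        "shared": len(signal & pfb),
        "signal_only_examples": sorted(signal - pfb)[:5],
    }


# Hypothetical rows: the two files use different naming schemes, so nothing matches.
signal_example = [
    "Name\tsample.Log R Ratio\tsample.B Allele Freq\n",
    "AX-100\t0.01\t0.50\n",
    "AX-101\t-0.20\t0.00\n",
]
pfb_example = [
    "Name\tChr\tPosition\tPFB\n",
    "rs100\t1\t1000\t0.40\n",
    "rs101\t1\t2000\t0.10\n",
]

report = overlap_report(signal_example, pfb_example)
print(report)
```

A "shared" count of zero, as in this example, would point to mismatched marker naming (or a different delimiter or column order) rather than to missing signal values.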

CNV plot margins

Hello, I wonder if you can help me with this. I am trying to change the code of visualize_cnv.pl so that the CNV plots dedicate a bigger portion of the plot to the actual CNV rather than to the margins.

I tried changing the following chunk

my $length = $end-$start+1;
$length < $flankinglength and $length = $flankinglength;
if ($pos >= $start-$length and $pos <= $end+$length)
 {	push @{$signal{$chr, $start}}, [$pos, $lrr, $baf]; }

into this

my $length = $end-$start+1;
$temp = round($length / 3);
if ($pos >= $start-$temp and $pos <= $end+$temp) {
      push @{$signal{$chr, $start}}, [$pos, $lrr, $baf];}

So for all figures, I want the left and right margins to be equal to 1/3 of the cnv area.
While some of the figures turn out okay, others have unequal margins.

Does anyone have experience with this or maybe has some advice on what might be wrong? I would appreciate any help, thank you!
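One possible explanation, offered only as a guess, is that the modified filter selects which markers are kept, while the plotted range may follow the positions of the markers actually found. If that assumption about the plotting step holds, sparse markers on one side would give unequal visible margins even though the window itself is symmetric. The Python sketch below illustrates the arithmetic with made-up positions; it is not a translation of visualize_cnv.pl.

```python
def select_window(positions, start, end):
    """Keep markers within one third of the CNV length on either side."""
    length = end - start + 1
    margin = round(length / 3)
    lo, hi = start - margin, end + margin
    kept = [p for p in positions if lo <= p <= hi]
    return lo, hi, kept


def realised_margins(kept, start, end):
    """Distance from the CNV edges to the outermost kept markers."""
    return start - min(kept), max(kept) - end


# Made-up marker positions: dense on the right of the CNV, sparse on the left.
positions = [850, 990, 1000, 1100, 1299, 1390, 1500]
start, end = 1000, 1299

lo, hi, kept = select_window(positions, start, end)
left, right = realised_margins(kept, start, end)
print("window:", lo, hi)
print("kept markers:", kept)
print("margins covered by data:", left, right)
```

Here the requested window is symmetric (100 bp on each side), but the data only cover 10 bp on the left and 91 bp on the right. If this is what happens in the real plots, fixing the x-axis limits explicitly to the window bounds, instead of letting them follow the data, would be one thing to try.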

hmm file for custom designed array--illumina

Hi Dr. Wang,

I want to use PennCNV to call CNVs from a custom-designed pig Illumina array, which was designed in 2017.
My problem is that I do not have an HMM file for the CNV calling step.
Can I just use hhall.hmm for a custom-designed Illumina array for a non-human species?

Best,
Lichen

LOH detection

It's me again :) I'm curious why the LOH detection is not a feature anymore and if it has ever been for Affymetrix chips. I feel like there is a real lack of software for this kind of analysis.

PennCNV affy (Genome-wide 6.0 array) apt-probeset-genotype command

I used the following command,
$ bin/apt-probeset-genotype -c lib/GenomeWideSNP_6.cdf -a birdseed --read-models-birdseed lib/GenomeWideSNP_6.birdseed.models --special-snps lib/GenomeWideSNP_6.specialSNPs --out-dir apt --cel-files CEL/*.CEL

However, I got the following error message:
FATAL ERROR:TsvFile.cpp:2659: This file is a Calvin file. (filename='CEL/1_Aldo_PBL_Lymph.CEL')

Normal/Neutral copy numbers (cn=2)? -Axiom_array(PMRA)

RE: PennCNV-affy calling - axiom PMRA array data

I have just successfully run PennCNV-1.0.3 "detect_cnv.pl" to call CNVs. However, I noticed that the reported copy numbers are only cn=0, 1 or 3, with no cn=2 (neutral/normal). Which flag do I need to use in order to get all segments, including the neutral/normal ones?

Thank you

Version 1.0.4 gives the Segmentation fault for khmm.so

This is the same issue as #7. (submitting new as it is closed)

Version 1.0.4 gives the Segmentation fault for khmm.so, tried with Perl 5.8.9 and gcc4.9 on CentOS 6.8. No luck (tried few other combinations as well). I am installing this on our cluster to be used by the University researchers and this has wasted a lot of my time.
