Code Monkey home page Code Monkey logo

Comments (15)

lima1 avatar lima1 commented on August 20, 2024

If you read the intervals with read.delim, do you get a proper "Gene" column?

The current developer version (the new stable in 3 weeks) automatically adds proper gene symbols. I would recommend starting from scratch with this new version. You can upgrade with BiocInstaller::biocLite("lima1/PureCN").

Make sure to follow https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.pdf

There are a few things in your results that needs fixing:

  • You have a huge AT dropout in the tumor. If you ran the GC normalization, this usually happens when there are problems with the interval file. The new version will fix some of the issues automatically.
  • Without a Mutect VCF, you won't allele-specific copy numbers and the ploidy inference is unreliable.
  • Provide coverage on chrY to get sample sex.

from purecn.

rframal avatar rframal commented on August 20, 2024

from purecn.

rframal avatar rframal commented on August 20, 2024

About your previous question. Yes, the read.delim get the "Gene" column. Please check below.

head(read.delim("/home/rramalho/beds/3050941_Covered_V2.bed"))
chr1 X2488047 X2488227 TNFRSF14
1 chr1 2489098 2489338 TNFRSF14
2 chr1 2489724 2489964 TNFRSF14
3 chr1 2491219 2491459 TNFRSF14
4 chr1 2492017 2492197 TNFRSF14
5 chr1 2492908 2493302 TNFRSF14
6 chr1 2494181 2494361 TNFRSF14

head(bed[,4])
[1] TNFRSF14 TNFRSF14 TNFRSF14 TNFRSF14 TNFRSF14 TNFRSF14

I just run the devel-version (1.7.56) however the gene call still does not work (please check the log below).

INFO [2017-10-10 11:50:29] ------------------------------------------------------------
INFO [2017-10-10 11:50:29] PureCN 1.7.56
INFO [2017-10-10 11:50:29] ------------------------------------------------------------
INFO [2017-10-10 11:50:29] Arguments: -normal.coverage.file EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt -tumor.coverage.file /home/rramalho/projects/CNV/PURECN/dev_version/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt -seg.file -vcf.file ../PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal._chrALL_sorted.vcf -normalDB -genome hg19 -sex ? -args.filterVcf NULL,c(0.03, 0.97),NULL -args.setMappingBiasVcf NULL -args.segmentation NULL,0.005 -sampleid HORIZON -min.ploidy 1 -max.ploidy 6 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -gc.gene.file 3050941_Covered_V2_gcgene.bed -max.segments 300 -plot.cnv TRUE -model beta -post.optimize FALSE -log.file ./HORIZON.log -fun.segmentation -test.purity
INFO [2017-10-10 11:50:29] Loading coverage files...
INFO [2017-10-10 11:50:32] Mean target coverages: 1141X (tumor) 78X (normal).
WARN [2017-10-10 11:50:32] Allosome coverage missing, cannot determine sex.
WARN [2017-10-10 11:50:32] Allosome coverage missing, cannot determine sex.
INFO [2017-10-10 11:50:33] Removing 23 low/high GC targets.
INFO [2017-10-10 11:50:33] Removing 231 targets with low total coverage in normal (< 150.00 reads).
WARN [2017-10-10 11:50:33] No normalDB provided. Provide one for better results.
INFO [2017-10-10 11:50:33] Removing 18085 low coverage (< 15.0000X) targets.
INFO [2017-10-10 11:50:33] Using 6634 intervals (6633 on-target, 1 off-target).
INFO [2017-10-10 11:50:33] AT/GC dropout: 1.07 (tumor), 0.68 (normal).
WARN [2017-10-10 11:50:33] High GC-bias in normal or tumor. Is data GC-normalized?
INFO [2017-10-10 11:50:33] Loading VCF...
INFO [2017-10-10 11:50:55] ------------------------------------------------------------
INFO [2017-10-10 11:50:55] PureCN 1.7.56
INFO [2017-10-10 11:50:55] ------------------------------------------------------------
INFO [2017-10-10 11:50:55] Arguments: -normal.coverage.file EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt -tumor.coverage.file /home/rramalho/projects/CNV/PURECN/dev_version/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt -seg.file -vcf.file -normalDB -genome hg19 -sex ? -args.filterVcf NULL,c(0.03, 0.97),NULL -args.setMappingBiasVcf NULL -args.segmentation NULL,0.005 -sampleid HORIZON -min.ploidy 1 -max.ploidy 6 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -gc.gene.file 3050941_Covered_V2_gcgene.bed -max.segments 300 -plot.cnv TRUE -model beta -post.optimize FALSE -log.file ./HORIZON.log -fun.segmentation -test.purity
INFO [2017-10-10 11:50:55] Loading coverage files...
INFO [2017-10-10 11:50:57] Mean target coverages: 1141X (tumor) 78X (normal).
WARN [2017-10-10 11:50:57] Allosome coverage missing, cannot determine sex.
WARN [2017-10-10 11:50:57] Allosome coverage missing, cannot determine sex.
INFO [2017-10-10 11:50:58] Removing 23 low/high GC targets.
INFO [2017-10-10 11:50:58] Removing 231 targets with low total coverage in normal (< 150.00 reads).
WARN [2017-10-10 11:50:58] No normalDB provided. Provide one for better results.
INFO [2017-10-10 11:50:58] Removing 18085 low coverage (< 15.0000X) targets.
INFO [2017-10-10 11:50:58] Using 6634 intervals (6633 on-target, 1 off-target).
INFO [2017-10-10 11:50:58] AT/GC dropout: 1.07 (tumor), 0.68 (normal).
WARN [2017-10-10 11:50:58] High GC-bias in normal or tumor. Is data GC-normalized?
INFO [2017-10-10 11:50:58] Sample sex: ?
INFO [2017-10-10 11:50:58] Segmenting data...
INFO [2017-10-10 11:50:58] Setting undo.SD parameter to 1.250000.
INFO [2017-10-10 11:51:01] Mean standard deviation of log-ratios: 0.79
INFO [2017-10-10 11:51:01] 2D-grid search of purity and ploidy...
INFO [2017-10-10 11:51:10] Local optima: 0.48/6, 0.48/5.4, 0.52/5, 0.82/6, 0.55/4.6, 0.58/4.2,
0.65/3.6, 0.68/3.2, 0.72/2.8, 0.78/2.2, 0.82/1.9, 0.88/1.2, 0.65/6,
0.27/2, 0.52/2
INFO [2017-10-10 11:51:10] Testing local optimum 1/15 at purity 0.48 and total ploidy 6.00...
INFO [2017-10-10 11:51:10] Recalibrating log-ratios...
INFO [2017-10-10 11:51:10] Testing local optimum 1/15 at purity 0.48 and total ploidy 6.00...
INFO [2017-10-10 11:51:10] Recalibrating log-ratios...
INFO [2017-10-10 11:51:10] Testing local optimum 1/15 at purity 0.48 and total ploidy 6.00...
INFO [2017-10-10 11:51:10] Recalibrating log-ratios...
INFO [2017-10-10 11:51:10] Testing local optimum 1/15 at purity 0.48 and total ploidy 6.00...
INFO [2017-10-10 11:51:11] Testing local optimum 2/15 at purity 0.48 and total ploidy 5.40...
INFO [2017-10-10 11:51:11] Recalibrating log-ratios...
INFO [2017-10-10 11:51:11] Testing local optimum 2/15 at purity 0.48 and total ploidy 5.40...
INFO [2017-10-10 11:51:11] Recalibrating log-ratios...
INFO [2017-10-10 11:51:11] Testing local optimum 2/15 at purity 0.48 and total ploidy 5.40...
INFO [2017-10-10 11:51:11] Recalibrating log-ratios...
INFO [2017-10-10 11:51:11] Testing local optimum 2/15 at purity 0.48 and total ploidy 5.40...
INFO [2017-10-10 11:51:12] Testing local optimum 3/15 at purity 0.52 and total ploidy 5.00...
INFO [2017-10-10 11:51:12] Recalibrating log-ratios...
INFO [2017-10-10 11:51:12] Testing local optimum 3/15 at purity 0.52 and total ploidy 5.00...
INFO [2017-10-10 11:51:12] Recalibrating log-ratios...
INFO [2017-10-10 11:51:12] Testing local optimum 3/15 at purity 0.52 and total ploidy 5.00...
INFO [2017-10-10 11:51:12] Recalibrating log-ratios...
INFO [2017-10-10 11:51:12] Testing local optimum 3/15 at purity 0.52 and total ploidy 5.00...
INFO [2017-10-10 11:51:12] Testing local optimum 4/15 at purity 0.82 and total ploidy 6.00...
INFO [2017-10-10 11:51:13] Recalibrating log-ratios...
INFO [2017-10-10 11:51:13] Testing local optimum 4/15 at purity 0.82 and total ploidy 6.00...
INFO [2017-10-10 11:51:13] Recalibrating log-ratios...
INFO [2017-10-10 11:51:13] Testing local optimum 4/15 at purity 0.82 and total ploidy 6.00...
INFO [2017-10-10 11:51:13] Recalibrating log-ratios...
INFO [2017-10-10 11:51:13] Testing local optimum 4/15 at purity 0.82 and total ploidy 6.00...
INFO [2017-10-10 11:51:13] Testing local optimum 5/15 at purity 0.55 and total ploidy 4.60...
INFO [2017-10-10 11:51:14] Recalibrating log-ratios...
INFO [2017-10-10 11:51:14] Testing local optimum 5/15 at purity 0.55 and total ploidy 4.60...
INFO [2017-10-10 11:51:14] Recalibrating log-ratios...
INFO [2017-10-10 11:51:14] Testing local optimum 5/15 at purity 0.55 and total ploidy 4.60...
INFO [2017-10-10 11:51:14] Recalibrating log-ratios...
INFO [2017-10-10 11:51:14] Testing local optimum 5/15 at purity 0.55 and total ploidy 4.60...
INFO [2017-10-10 11:51:15] Testing local optimum 6/15 at purity 0.58 and total ploidy 4.20...
INFO [2017-10-10 11:51:15] Recalibrating log-ratios...
INFO [2017-10-10 11:51:15] Testing local optimum 6/15 at purity 0.58 and total ploidy 4.20...
INFO [2017-10-10 11:51:15] Recalibrating log-ratios...
INFO [2017-10-10 11:51:15] Testing local optimum 6/15 at purity 0.58 and total ploidy 4.20...
INFO [2017-10-10 11:51:15] Recalibrating log-ratios...
INFO [2017-10-10 11:51:15] Testing local optimum 6/15 at purity 0.58 and total ploidy 4.20...
INFO [2017-10-10 11:51:16] Testing local optimum 7/15 at purity 0.65 and total ploidy 3.60...
INFO [2017-10-10 11:51:16] Recalibrating log-ratios...
INFO [2017-10-10 11:51:16] Testing local optimum 7/15 at purity 0.65 and total ploidy 3.60...
INFO [2017-10-10 11:51:16] Recalibrating log-ratios...
INFO [2017-10-10 11:51:16] Testing local optimum 7/15 at purity 0.65 and total ploidy 3.60...
INFO [2017-10-10 11:51:16] Recalibrating log-ratios...
INFO [2017-10-10 11:51:16] Testing local optimum 7/15 at purity 0.65 and total ploidy 3.60...
INFO [2017-10-10 11:51:17] Testing local optimum 8/15 at purity 0.68 and total ploidy 3.20...
INFO [2017-10-10 11:51:17] Recalibrating log-ratios...
INFO [2017-10-10 11:51:17] Testing local optimum 8/15 at purity 0.68 and total ploidy 3.20...
INFO [2017-10-10 11:51:17] Recalibrating log-ratios...
INFO [2017-10-10 11:51:17] Testing local optimum 8/15 at purity 0.68 and total ploidy 3.20...
INFO [2017-10-10 11:51:17] Recalibrating log-ratios...
INFO [2017-10-10 11:51:17] Testing local optimum 8/15 at purity 0.68 and total ploidy 3.20...
INFO [2017-10-10 11:51:18] Testing local optimum 9/15 at purity 0.72 and total ploidy 2.80...
INFO [2017-10-10 11:51:18] Recalibrating log-ratios...
INFO [2017-10-10 11:51:18] Testing local optimum 9/15 at purity 0.72 and total ploidy 2.80...
INFO [2017-10-10 11:51:18] Recalibrating log-ratios...
INFO [2017-10-10 11:51:18] Testing local optimum 9/15 at purity 0.72 and total ploidy 2.80...
INFO [2017-10-10 11:51:18] Recalibrating log-ratios...
INFO [2017-10-10 11:51:18] Testing local optimum 9/15 at purity 0.72 and total ploidy 2.80...
INFO [2017-10-10 11:51:18] Testing local optimum 10/15 at purity 0.78 and total ploidy 2.20...
INFO [2017-10-10 11:51:19] Recalibrating log-ratios...
INFO [2017-10-10 11:51:19] Testing local optimum 10/15 at purity 0.78 and total ploidy 2.20...
INFO [2017-10-10 11:51:19] Recalibrating log-ratios...
INFO [2017-10-10 11:51:19] Testing local optimum 10/15 at purity 0.78 and total ploidy 2.20...
INFO [2017-10-10 11:51:19] Recalibrating log-ratios...
INFO [2017-10-10 11:51:19] Testing local optimum 10/15 at purity 0.78 and total ploidy 2.20...
INFO [2017-10-10 11:51:19] Testing local optimum 11/15 at purity 0.82 and total ploidy 1.90...
INFO [2017-10-10 11:51:20] Recalibrating log-ratios...
INFO [2017-10-10 11:51:20] Testing local optimum 11/15 at purity 0.82 and total ploidy 1.90...
INFO [2017-10-10 11:51:20] Recalibrating log-ratios...
INFO [2017-10-10 11:51:20] Testing local optimum 11/15 at purity 0.82 and total ploidy 1.90...
INFO [2017-10-10 11:51:20] Recalibrating log-ratios...
INFO [2017-10-10 11:51:20] Testing local optimum 11/15 at purity 0.82 and total ploidy 1.90...
INFO [2017-10-10 11:51:20] Testing local optimum 12/15 at purity 0.88 and total ploidy 1.20...
INFO [2017-10-10 11:51:20] Recalibrating log-ratios...
INFO [2017-10-10 11:51:20] Testing local optimum 12/15 at purity 0.88 and total ploidy 1.20...
INFO [2017-10-10 11:51:21] Recalibrating log-ratios...
INFO [2017-10-10 11:51:21] Testing local optimum 12/15 at purity 0.88 and total ploidy 1.20...
INFO [2017-10-10 11:51:21] Recalibrating log-ratios...
INFO [2017-10-10 11:51:21] Testing local optimum 12/15 at purity 0.88 and total ploidy 1.20...
INFO [2017-10-10 11:51:21] Testing local optimum 13/15 at purity 0.65 and total ploidy 6.00...
INFO [2017-10-10 11:51:21] Recalibrating log-ratios...
INFO [2017-10-10 11:51:21] Testing local optimum 13/15 at purity 0.65 and total ploidy 6.00...
INFO [2017-10-10 11:51:22] Recalibrating log-ratios...
INFO [2017-10-10 11:51:22] Testing local optimum 13/15 at purity 0.65 and total ploidy 6.00...
INFO [2017-10-10 11:51:22] Recalibrating log-ratios...
INFO [2017-10-10 11:51:22] Testing local optimum 13/15 at purity 0.65 and total ploidy 6.00...
INFO [2017-10-10 11:51:22] Testing local optimum 14/15 at purity 0.27 and total ploidy 2.00...
INFO [2017-10-10 11:51:22] Recalibrating log-ratios...
INFO [2017-10-10 11:51:22] Testing local optimum 14/15 at purity 0.27 and total ploidy 2.00...
INFO [2017-10-10 11:51:22] Recalibrating log-ratios...
INFO [2017-10-10 11:51:22] Testing local optimum 14/15 at purity 0.27 and total ploidy 2.00...
INFO [2017-10-10 11:51:23] Recalibrating log-ratios...
INFO [2017-10-10 11:51:23] Testing local optimum 14/15 at purity 0.27 and total ploidy 2.00...
INFO [2017-10-10 11:51:23] Testing local optimum 15/15 at purity 0.52 and total ploidy 2.00...
INFO [2017-10-10 11:51:23] Recalibrating log-ratios...
INFO [2017-10-10 11:51:23] Testing local optimum 15/15 at purity 0.52 and total ploidy 2.00...
INFO [2017-10-10 11:51:23] Recalibrating log-ratios...
INFO [2017-10-10 11:51:23] Testing local optimum 15/15 at purity 0.52 and total ploidy 2.00...
INFO [2017-10-10 11:51:23] Recalibrating log-ratios...
INFO [2017-10-10 11:51:23] Testing local optimum 15/15 at purity 0.52 and total ploidy 2.00...
INFO [2017-10-10 11:51:23] Done.
INFO [2017-10-10 11:51:23] ------------------------------------------------------------

gene.calls <- callAlterations(HORIZON)
FATAL [2017-10-10 12:29:08] This function requires gene-level calls. Please add a column 'Gene'

FATAL [2017-10-10 12:29:08] containing gene symbols to the gc.gene.file.

FATAL [2017-10-10 12:29:08]

FATAL [2017-10-10 12:29:08] This is most likely a user error due to invalid input data or

FATAL [2017-10-10 12:29:08] parameters (PureCN 1.7.56).

Erro: This function requires gene-level calls. Please add a column 'Gene'
containing gene symbols to the gc.gene.file.

This is most likely a user error due to invalid input data or
parameters (PureCN 1.7.56).

from purecn.

lima1 avatar lima1 commented on August 20, 2024

The runAbsoluteCN function currently does not support BED files. The IntervalFile.R script will convert it for you in the correct format.

For 1.7.56, I would recommend using the IntervalFile.R script exactly as described in the Quick vignette to generate this file. This is especially recommended if your data is from a hybrid-capture assay (Amplicon data is currently not officially supported), because then you get coverage from off-target regions as well.

You will need to re-calculate all coverages if you generate a new interval file.

from purecn.

rframal avatar rframal commented on August 20, 2024

Actually I am following the Quick Guide. The interval file that I used as an input in the "PureCN.R" script was created by the "IntervalFile.R" script. The command lines that I used are below:

Rscript /opt/R-3.4.1/lib/R/library/PureCN/extdata/IntervalFile.R --infile /home/rramalho/beds/3050941_Covered_V2.bed --fasta /home/rramalho/reference/Homo_sapiens.GRCh37.75.genome.fa --outfile 3050941_Covered_V2_gcgene.bed --offtarget --genome hg19 --export 3050941_Covered_V2_optimized.bed

And here is an example of the output interval file "3050941_Covered_V2_gcgene.bed" used as input the "PureCN.R"

Target gc_bias mappability Gene on_target
chr1:2104925-2296235 0.591617836925216 1 . FALSE
chr1:2296236-2487547 0.612125742243038 1 . FALSE
chr1:2488048-2488227 0.65 1 TNFRSF14 TRUE
chr1:2489099-2489338 0.6625 1 TNFRSF14 TRUE
chr1:2489725-2489964 0.641666666666667 1 TNFRSF14 TRUE
chr1:2491220-2491459 0.691666666666667 1 TNFRSF14 TRUE
chr1:2492018-2492197 0.65 1 TNFRSF14 TRUE
chr1:2492909-2493302 0.619289340101523 1 TNFRSF14 TRUE

Below is the command line that I used to run PureCN

Rscript /opt/R-3.4.1/lib/R/library/PureCN/extdata/PureCN.R --out . --tumor PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt --normal EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt --sampleid "HORIZON" --genome hg19 --gcgene 3050941_Covered_V2_gcgene.bed

INFO [2017-10-10 14:26:06] Loading PureCN...
INFO [2017-10-10 14:26:06] ------------------------------------------------------------
INFO [2017-10-10 14:26:06] PureCN 1.7.56
INFO [2017-10-10 14:26:06] ------------------------------------------------------------
INFO [2017-10-10 14:26:06] Arguments: -normal.coverage.file EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt -tumor.coverage.file /home/rramalho/projects/CNV/PURECN/dev_version/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt -seg.file -vcf.file -normalDB -genome hg19 -sex ? -args.filterVcf NULL,c(0.03, 0.97),NULL -args.setMappingBiasVcf NULL -args.segmentation NULL,0.005 -sampleid HORIZON -min.ploidy 1 -max.ploidy 6 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -gc.gene.file 3050941_Covered_V2_gcgene.bed -max.segments 300 -plot.cnv TRUE -model beta -post.optimize FALSE -log.file ./HORIZON.log -fun.segmentation -test.purity
INFO [2017-10-10 14:26:06] Loading coverage files...
INFO [2017-10-10 14:26:09] Mean target coverages: 1141X (tumor) 78X (normal).
WARN [2017-10-10 14:26:09] Allosome coverage missing, cannot determine sex.
WARN [2017-10-10 14:26:09] Allosome coverage missing, cannot determine sex.
INFO [2017-10-10 14:26:10] Removing 23 low/high GC targets.
INFO [2017-10-10 14:26:10] Removing 231 targets with low total coverage in normal (< 150.00 reads).
WARN [2017-10-10 14:26:10] No normalDB provided. Provide one for better results.
INFO [2017-10-10 14:26:10] Removing 18085 low coverage (< 15.0000X) targets.
INFO [2017-10-10 14:26:10] Using 6634 intervals (6633 on-target, 1 off-target).
INFO [2017-10-10 14:26:10] AT/GC dropout: 1.07 (tumor), 0.68 (normal).
WARN [2017-10-10 14:26:10] High GC-bias in normal or tumor. Is data GC-normalized?
INFO [2017-10-10 14:26:10] Sample sex: ?
INFO [2017-10-10 14:26:10] Segmenting data...
INFO [2017-10-10 14:26:10] Setting undo.SD parameter to 1.250000.
Setting multi-figure configuration
INFO [2017-10-10 14:26:14] Mean standard deviation of log-ratios: 0.79
INFO [2017-10-10 14:26:14] 2D-grid search of purity and ploidy...
INFO [2017-10-10 14:26:22] Local optima: 0.48/6, 0.48/5.4, 0.52/5, 0.82/6, 0.55/4.6, 0.58/4.2
.
.
.
INFO [2017-10-10 14:26:35] Recalibrating log-ratios...
INFO [2017-10-10 14:26:35] Testing local optimum 15/15 at purity 0.52 and total ploidy 2.00...
INFO [2017-10-10 14:26:35] Done.
INFO [2017-10-10 14:26:35] ------------------------------------------------------------
Error in if (!is.na(rds$input$sex) && !is.na(rds$input$sex.vcf)) { :
valor ausente onde TRUE/FALSE necessário
Calls: createCurationFile -> data.frame -> .getSexFromRds
Além disso: Warning message:
In is.na(rds$input$sex.vcf) :
is.na() aplicado a um objeto diferente de lista ou vetor de tipo 'NULL'
Execução interrompida

And the results are still without gene calls so callAlterations() gives the following error:

HORIZON <- readRDS("HORIZON.rds")

gene.calls <- callAlterations(HORIZON)
FATAL [2017-10-10 14:30:56] This function requires gene-level calls. Please add a column 'Gene'

FATAL [2017-10-10 14:30:56] containing gene symbols to the gc.gene.file.

FATAL [2017-10-10 14:30:56]

FATAL [2017-10-10 14:30:56] This is most likely a user error due to invalid input data or

FATAL [2017-10-10 14:30:56] parameters (PureCN 1.7.56).

Erro: This function requires gene-level calls. Please add a column 'Gene'
containing gene symbols to the gc.gene.file.

This is most likely a user error due to invalid input data or
parameters (PureCN 1.7.56).

from purecn.

lima1 avatar lima1 commented on August 20, 2024

Thanks a lot, I can reproduce the crash in createCurationFile() and will submit a fix soon. It should disappear when you provide the MuTect VCF + stats file.

Are you sure you read in the correct "HORIZION.rds" file? Was it overwritten at around 14:26?

Also please make sure to create a normal database (at least 3-5 normals), provide it via normalDB. Otherwise PureCN won't use off-target or low-coverage regions in general.

I would also recommend downloading the mappability file from UCSC ( http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign100mer.bigWig) when you use off-target regions.

from purecn.

rframal avatar rframal commented on August 20, 2024

Okay, thanks!

Yes I am sure that the correct HORIZON.rds was loaded.

About the VCF: I have a Mutect2 VCF file (v4.2) for the tumor /normal samples and when I try to use it in PureCN I got the following issue:

Error in if (sum(colSums(geno(vcf)$DP) > 0) == 1 && args.filterVcf$use.somatic.status) { :

Rscript /opt/R-3.4.1/lib/R/library/PureCN/extdata/PureCN.R --out . --tumor PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt --normal EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt --vcf ../PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal._chrALL_sorted.vcf --sex F --sampleid "HORIZON" --genome hg19 --gcgene 3050941_Covered_V2_gcgene.bed

INFO [2017-10-10 15:49:45] Loading PureCN...
INFO [2017-10-10 15:49:45] ------------------------------------------------------------
INFO [2017-10-10 15:49:45] PureCN 1.7.56
INFO [2017-10-10 15:49:45] ------------------------------------------------------------
INFO [2017-10-10 15:49:45] Arguments: -normal.coverage.file EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt -tumor.coverage.file /home/rramalho/projects/CNV/PURECN/dev_version/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt -seg.file -vcf.file ../PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal._chrALL_sorted.vcf -normalDB -genome hg19 -sex F -args.filterVcf NULL,c(0.03, 0.97),NULL -args.setMappingBiasVcf NULL -args.segmentation NULL,0.005 -sampleid HORIZON -min.ploidy 1 -max.ploidy 6 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -gc.gene.file 3050941_Covered_V2_gcgene.bed -max.segments 300 -plot.cnv TRUE -model beta -post.optimize FALSE -log.file ./HORIZON.log -fun.segmentation -test.purity
INFO [2017-10-10 15:49:45] Loading coverage files...
INFO [2017-10-10 15:49:47] Mean target coverages: 1141X (tumor) 78X (normal).
INFO [2017-10-10 15:49:49] Removing 23 low/high GC targets.
INFO [2017-10-10 15:49:49] Removing 231 targets with low total coverage in normal (< 150.00 reads).
WARN [2017-10-10 15:49:49] No normalDB provided. Provide one for better results.
INFO [2017-10-10 15:49:49] Removing 17682 low coverage (< 15.0000X) targets.
INFO [2017-10-10 15:49:49] Using 7037 intervals (7036 on-target, 1 off-target).
INFO [2017-10-10 15:49:49] AT/GC dropout: 1.07 (tumor), 0.68 (normal).
WARN [2017-10-10 15:49:49] High GC-bias in normal or tumor. Is data GC-normalized?
INFO [2017-10-10 15:49:49] Loading VCF...
Error in if (sum(colSums(geno(vcf)$DP) > 0) == 1 && args.filterVcf$use.somatic.status) { :

About the mappability file: When I had the following issue:

Rscript /opt/R-3.4.1/lib/R/library/PureCN/extdata/IntervalFile.R --infile /home/rramalho/beds/3050941_Covered_V2.bed --fasta /home/rramalho/reference/Homo_sapiens.GRCh37.75.genome.fa --outfile 3050941_Covered_V2_gcgene.bed --offtarget --genome hg19 --export 3050941_Covered_V2_optimized.bed --mappability wgEncodeCrgMapabilityAlign100mer.bigWig
WARNING: ignoring environment value of R_HOME
INFO [2017-10-10 16:04:56] Loading /home/rramalho/projects/CNV/PURECN/dev_version/wgEncodeCrgMapabilityAlign100mer.bigWig...
INFO [2017-10-10 16:05:07] Loading PureCN...
INFO [2017-10-10 16:05:07] Processing /home/rramalho/beds/3050941_Covered_V2.bed...
INFO [2017-10-10 16:05:10] Splitting 1282 large targets to an average width of 400.
Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", :
sequence chrM has incompatible seqlengths:

  • in 'x': 16569
  • in 'y': 16571
    Calls: calculateGCContentByInterval ... .Seqinfo.merge -> .Seqinfo.mergexy -> mergeNamedAtomicVectors

from purecn.

lima1 avatar lima1 commented on August 20, 2024

Mutect2 is not supported for matched Tumor normal because it’s not emitting Germline sites. You’ll need Mutect 1.1.7 for that. I’ll have a look at the other issues.

from purecn.

lima1 avatar lima1 commented on August 20, 2024

Unless your baits BED file contains chrM (which it shouldn't), chrM should be now ignored. I also fixed the createCurationFile crash.

So I would try to re-run IntervalFile.R with the mappability file, re-calculate the coverages, build a normal database and then run Mutect 1.1.7. It's available from https://software.broadinstitute.org/gatk/download/mutect and requires Java 1.7 (it will not work with Java 1.8).

Thanks again for your patience.

from purecn.

rframal avatar rframal commented on August 20, 2024

Okay thanks. Should I run BiocInstaller::biocLite("lima1/PureCN") again to get the fixed version ?

from purecn.

lima1 avatar lima1 commented on August 20, 2024

Yep

from purecn.

rframal avatar rframal commented on August 20, 2024

I followed your latest instructions, added the mappability and Mutect1 vcf files to the PureCN.R but now there is another error:

"Error in if (fractionContaminated > 0) { : "

Below is the full log

Rscript /opt/R-3.4.1/lib/R/library/PureCN/extdata/PureCN.R --out . --tumor PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt --normal EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt --sampleid "HORIZON" --genome hg19 --gcgene 3050941_Covered_V2_gcgene.bed --vcf HORIZONvsNIST.mutect1.vcf
WARNING: ignoring environment value of R_HOME
[1] "/home/rramalho/projects/CNV/PURECN/dev_version"
INFO [2017-10-11 16:16:02] Loading PureCN...
INFO [2017-10-11 16:16:02] ------------------------------------------------------------
INFO [2017-10-11 16:16:02] PureCN 1.7.57
INFO [2017-10-11 16:16:02] ------------------------------------------------------------
INFO [2017-10-11 16:16:02] Arguments: -normal.coverage.file EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt -tumor.coverage.file /home/rramalho/projects/CNV/PURECN/dev_version/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt -seg.file -vcf.file HORIZONvsNIST.mutect1.vcf -normalDB -genome hg19 -sex ? -args.filterVcf NULL,c(0.03, 0.97),NULL -args.setMappingBiasVcf NULL -args.segmentation NULL,0.005 -sampleid HORIZON -min.ploidy 1 -max.ploidy 6 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -gc.gene.file 3050941_Covered_V2_gcgene.bed -max.segments 300 -plot.cnv TRUE -model beta -post.optimize FALSE -log.file ./HORIZON.log -fun.segmentation -test.purity
INFO [2017-10-11 16:16:02] Loading coverage files...
INFO [2017-10-11 16:16:05] Mean target coverages: 1138X (tumor) 79X (normal).
WARN [2017-10-11 16:16:05] Allosome coverage missing, cannot determine sex.
WARN [2017-10-11 16:16:05] Allosome coverage missing, cannot determine sex.
INFO [2017-10-11 16:16:06] Removing 21 low/high GC targets.
INFO [2017-10-11 16:16:06] Removing 223 targets with low total coverage in normal (< 150.00 reads).
WARN [2017-10-11 16:16:06] No normalDB provided. Provide one for better results.
INFO [2017-10-11 16:16:06] Removing 17795 low coverage (< 15.0000X) targets.
INFO [2017-10-11 16:16:06] Using 6613 intervals (6613 on-target, 0 off-target).
INFO [2017-10-11 16:16:06] No off-target intervals. If this is hybrid-capture data, consider adding them.
INFO [2017-10-11 16:16:06] AT/GC dropout: 1.07 (tumor), 0.67 (normal).
WARN [2017-10-11 16:16:06] High GC-bias in normal or tumor. Is data GC-normalized?
INFO [2017-10-11 16:16:06] Loading VCF...
INFO [2017-10-11 16:16:18] PT-SSV1-HORIZON is tumor in VCF file.
INFO [2017-10-11 16:16:22] 10747 homozygous and 5359 heterozygous variants on chrX.
INFO [2017-10-11 16:16:22] Sex from VCF: F (Fisher's p-value: < 0.0001, odds-ratio: 1.32).
INFO [2017-10-11 16:16:22] Found 633580 variants in VCF file.
INFO [2017-10-11 16:16:23] Removing 616563 non heterozygous (in matched normal) germline SNPs.
Error in if (fractionContaminated > 0) { :
valor ausente onde TRUE/FALSE necessário
Calls: runAbsoluteCN -> do.call -> -> filterVcfBasic
Execução interrompida

from purecn.

lima1 avatar lima1 commented on August 20, 2024

Man, I owe you a beer for finding all these crashes...

This error likely happens because all variants were filtered out. I added some checks for that. Can you, after another upgrade, confirm that you now get an error that none of the variants passed filtering?

Can you provide the command line call you used to run Mutect? The normal you provided is matched (from the same patient), correct? Mutect generates two files, the VCF and the stats file. Can you provide PureCN.R the latter one as well (with --statsfile)?

from purecn.

rframal avatar rframal commented on August 20, 2024

Actually, If PureCN works for me and get the gene annotations I owe you a 6-pack...

I did the update but the error still the same. Note that now I used the --statsfile parameter.

Rscript /opt/R-3.4.1/lib/R/library/PureCN/extdata/PureCN.R --out . --tumor PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt --normal EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt --sampleid "HORIZON" --genome hg19 --gcgene 3050941_Covered_V2_gcgene.bed --vcf HORIZONvsNIST.mutect1.vcf --statsfile HORIZONvsNIST.mutect1.stats.txt
WARNING: ignoring environment value of R_HOME
[1] "/home/rramalho/projects/CNV/PURECN/dev_version"
INFO [2017-10-13 09:27:13] Loading PureCN...
INFO [2017-10-13 09:27:13] ------------------------------------------------------------
INFO [2017-10-13 09:27:13] PureCN 1.7.58
INFO [2017-10-13 09:27:13] ------------------------------------------------------------
INFO [2017-10-13 09:27:13] Arguments: -normal.coverage.file EX-NIST-NA12878.sam.sorted.dedup.realigned.recal_coverage_loess.txt -tumor.coverage.file /home/rramalho/projects/CNV/PURECN/dev_version/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal_coverage_loess.txt -seg.file -vcf.file HORIZONvsNIST.mutect1.vcf -normalDB -genome hg19 -sex ? -args.filterVcf NULL,c(0.03, 0.97),HORIZONvsNIST.mutect1.stats.txt -args.setMappingBiasVcf NULL -args.segmentation NULL,0.005 -sampleid HORIZON -min.ploidy 1 -max.ploidy 6 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -gc.gene.file 3050941_Covered_V2_gcgene.bed -max.segments 300 -plot.cnv TRUE -model beta -post.optimize FALSE -log.file ./HORIZON.log -fun.segmentation -test.purity
INFO [2017-10-13 09:27:13] Loading coverage files...
INFO [2017-10-13 09:27:15] Mean target coverages: 1138X (tumor) 79X (normal).
WARN [2017-10-13 09:27:15] Large difference in coverage of tumor and normal.
WARN [2017-10-13 09:27:16] Allosome coverage missing, cannot determine sex.
WARN [2017-10-13 09:27:16] Allosome coverage missing, cannot determine sex.
INFO [2017-10-13 09:27:17] Removing 21 low/high GC targets.
INFO [2017-10-13 09:27:17] Removing 223 targets with low total coverage in normal (< 150.00 reads).
WARN [2017-10-13 09:27:17] No normalDB provided. Provide one for better results.
INFO [2017-10-13 09:27:17] Removing 17795 low coverage (< 15.0000X) targets.
INFO [2017-10-13 09:27:17] Using 6613 intervals (6613 on-target, 0 off-target).
INFO [2017-10-13 09:27:17] No off-target intervals. If this is hybrid-capture data, consider adding them.
INFO [2017-10-13 09:27:17] AT/GC dropout: 1.07 (tumor), 0.67 (normal).
WARN [2017-10-13 09:27:17] High GC-bias in normal or tumor. Is data GC-normalized?
INFO [2017-10-13 09:27:17] Loading VCF...
INFO [2017-10-13 09:27:28] PT-SSV1-HORIZON is tumor in VCF file.
INFO [2017-10-13 09:27:32] 10747 homozygous and 5359 heterozygous variants on chrX.
INFO [2017-10-13 09:27:32] Sex from VCF: F (Fisher's p-value: < 0.0001, odds-ratio: 1.32).
INFO [2017-10-13 09:27:32] Found 633580 variants in VCF file.
INFO [2017-10-13 09:27:51] Removing 403402 MuTect calls due to blacklisted failure reasons.
INFO [2017-10-13 09:27:51] Removing 220240 non heterozygous (in matched normal) germline SNPs.
Error in if (fractionContaminated > 0) { :
valor ausente onde TRUE/FALSE necessário
Calls: runAbsoluteCN -> do.call -> -> filterVcfBasic

Below is the Mutect1 command line that I used

/opt/jdk1.7.0_80/bin/java -jar /opt/mutect-1.1.7.jar --analysis_type MuTect -R /home/rramalho/reference/Homo_sapiens.GRCh37.75.genome.fa --dbsnp /home/rramalho/reference/dbSNP.All_20170710.vcf.gz -I:normal /home/rramalho/projects/CNV/PURECN/EX-NIST-NA12878.sam.sorted.dedup.realigned.recal.bam -I:tumor /home/rramalho/projects/CNV/PURECN/PT-SSV1-HORIZON.sam.sorted.dedup.realigned.recal.bam -o HORIZONvsNIST.mutect1.stats.txt -vcf HORIZONvsNIST.mutect1.vcf

No, the samples are not from the same patient.

from purecn.

lima1 avatar lima1 commented on August 20, 2024

MuTect uses the matched normal to distinguish somatic from germline. So the normal needs to be from the same patient. Just run in without the unmatched normal and you should be fine.

For coverage normalization, I would still recommend to get at least 3 normals sequenced pretty much exactly like tumors. Then build a normal databse with them. If you just use cell lines sequenced to low coverage, you'll get pretty noisy results.

from purecn.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.