Hi ,
I am going to try to describe the problem as i am not sure if and/or how to attach the input files required.
What i am trying to do is to see how PureCN works. To that end i have taken 3 normal samples, mapped them to hg38 (bwa) extracted the reads mapping to chr20 and chrX, sub-selected roughly 800 exons (oh yea this is are simulated WES example i am using to test various things i am playing with ) the crated PON out of those 3 samples (bcbio-nextgen pipeline : followed the tutorial). Next i simulated one tumor sample (intorduces cnvs snps, etc.) mapped it using the same set of parameters and for the normals (bwa, bet, everything the same). In case of normals i called mutations using FreeBayes and for somatic ones in tumor i used Mutect2 (followed all best practices). Once all data was collected i tried to utilize PureCN and reached the final stage with no objections from any of the stages:
Rscript ${PC}/PureCN.R --out purecn --tumor tumor00-ST_coverage_loess.txt.gz --sampleid tumor00-ST --vcf tumor00-ST-batch-effects.vcf.gz --normaldb normalDB_hg38.rds --mappingbiasfile mapping_bias_hg38.rds --intervals PON.multicov.txt --snpblacklist simple_repeat.bed --genome hg38 --force --postoptimize --seed 67 --bootstrapn 10 --cores 8
output:
INFO [2021-12-15 23:56:42] Loading PureCN 2.1.5...
WARN [2021-12-15 23:56:44] Recommended to provide --funsegmentation PSCBS.
INFO [2021-12-15 23:56:44] Using BiocParallel MulticoreParam backend with 8 cores.
INFO [2021-12-15 23:56:44] ------------------------------------------------------------
INFO [2021-12-15 23:56:44] PureCN 2.1.5
INFO [2021-12-15 23:56:44] ------------------------------------------------------------
INFO [2021-12-15 23:56:44] Arguments: -tumor.coverage.file tumor00-ST_coverage_loess.txt.gz -log.ratio -seg.file -vcf.file tumor00-ST-batch-effects.vcf.gz -genome hg38 -sex ? -args.setPriorVcf 6 -args.setMappingBiasVcf mapping_bias_hg38.rds -args.segmentation 0.005,NULL -sampleid tumor00-ST -min.ploidy 1.4 -max.ploidy 6 -max.non.clonal 0.2 -max.homozygous.loss 0.05,1e+07 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -interval.file PON.multicov.txt -max.segments 300 -plot.cnv TRUE -vcf.field.prefix PureCN. -DB.info.flag DB -POPAF.info.field POP_AF -model beta -post.optimize TRUE -log.file tumor00-ST.log -normal.coverage.file <data> -normalDB <data> -args.filterVcf <data> -fun.segmentation <data> -test.num.copy <data> -test.purity <data> -speedup.heuristics <data> -BPPARAM <data>
INFO [2021-12-15 23:56:44] Using BiocParallel for parallel optimization.
INFO [2021-12-15 23:56:44] Loading coverage files...
INFO [2021-12-15 23:56:44] Mean target coverages: 49X (tumor) 49X (normal).
WARN [2021-12-15 23:56:44] Allosome coverage missing, cannot determine sex.
WARN [2021-12-15 23:56:44] Allosome coverage missing, cannot determine sex.
INFO [2021-12-15 23:56:45] Removing 77 intervals with missing log.ratio.
INFO [2021-12-15 23:56:45] Removing 1 low/high GC targets.
INFO [2021-12-15 23:56:45] Removing 1684 intervals excluded in normalDB.
INFO [2021-12-15 23:56:45] Removing 2 intervals with low total coverage in normal (< 150.00 reads).
INFO [2021-12-15 23:56:45] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2021-12-15 23:56:45] Removing 9 low count (< 100 total reads) intervals.
INFO [2021-12-15 23:56:45] Removing 615 low coverage (< 0.0015X) intervals.
INFO [2021-12-15 23:56:45] Using 395 intervals (188 on-target, 207 off-target).
INFO [2021-12-15 23:56:45] Ratio of mean on-target vs. off-target read counts: 0.06
INFO [2021-12-15 23:56:45] Mean off-target bin size: 118201
INFO [2021-12-15 23:56:45] AT/GC dropout: 0.97 (tumor), 0.96 (normal), 1.02 (coverage log-ratio).
INFO [2021-12-15 23:56:45] Loading VCF...
INFO [2021-12-15 23:56:46] Found 225 variants in VCF file.
INFO [2021-12-15 23:56:46] Removing 2 triallelic sites.
INFO [2021-12-15 23:56:46] Maximum of POPAP INFO is > 1, assuming -log10 scaled values
WARN [2021-12-15 23:56:46] vcf.file has no DB info field for membership in germline databases. Found and used valid population allele frequency > 0.001000 instead.
INFO [2021-12-15 23:56:46] 171 (76.7%) variants annotated as likely germline (DB INFO flag).
WARN [2021-12-15 23:56:46] Did not find base quality scores, will use global error rate of 0.0010 instead.
INFO [2021-12-15 23:56:46] tumor00-ST is tumor in VCF file.
INFO [2021-12-15 23:56:46] 33 homozygous and 63 heterozygous variants on chrX.
INFO [2021-12-15 23:56:46] Sex from VCF: F (Fisher's p-value: 0.879, odds-ratio: 0.92).
INFO [2021-12-15 23:56:46] Detected MuTect2 VCF.
INFO [2021-12-15 23:56:46] Removing 44 Mutect2 calls due to blacklisted failure reasons.
INFO [2021-12-15 23:56:46] Global base quality score of 29
INFO [2021-12-15 23:56:46] Minimum number of supporting reads ranges from 3 to 4, depending on coverage and BQS.
INFO [2021-12-15 23:56:47] Removing 64 variants with AF < 0.030 or AF >= 0.970 or insufficient supporting reads or depth < 15.
INFO [2021-12-15 23:56:51] Removing 6 blacklisted variants.
INFO [2021-12-15 23:56:51] Removing 0 low quality variants with non-offset BQ < 25.
INFO [2021-12-15 23:56:51] Total size of targeted genomic region: 0.05Mb (0.06Mb with 50bp padding).
INFO [2021-12-15 23:56:52] 6.9% of targets contain variants.
INFO [2021-12-15 23:56:52] Removing 94 variants outside intervals.
INFO [2021-12-15 23:56:52] Setting somatic prior probabilities for dbSNP hits to 0.000500 or to 0.500000 otherwise.
INFO [2021-12-15 23:56:52] Loading mapping bias file mapping_bias_hg38.rds...
INFO [2021-12-15 23:56:52] Found 97 variants in mapping bias file.
INFO [2021-12-15 23:56:52] Imputing mapping bias for 14 variants...
INFO [2021-12-15 23:56:52] Excluding 0 novel or poor quality variants from segmentation.
INFO [2021-12-15 23:56:52] Sample sex: ?
INFO [2021-12-15 23:56:52] Segmenting data...
INFO [2021-12-15 23:56:52] Interval weights found, will use weighted CBS.
INFO [2021-12-15 23:56:52] Loading pre-computed boundaries for DNAcopy...
INFO [2021-12-15 23:56:52] Setting undo.SD parameter to 0.500000.
Setting multi-figure configuration
[1] "seg-pval:NA: seg$pval < max.pval:1e-05"
[1] "seg-pval:: seg$pval < max.pval:1e-05"
[1] "seg-pval:NA: seg$pval < max.pval:1e-05"
[1] "seg-pval:: seg$pval < max.pval:1e-05"
[1] "seg-pval:NA: seg$pval < max.pval:1e-05"
[1] "seg-pval:: seg$pval < max.pval:1e-05"
INFO [2021-12-15 23:56:53] Setting prune.hclust.h parameter to 0.100000.
Error in hclust(dist(dx), method = method) :
must have n >= 2 objects to cluster
Calls: runAbsoluteCN -> do.call -> <Anonymous> -> .pruneByHclust -> hclust
In addition: Warning message:
In .bcfHeaderAsSimpleList(header) :
duplicate keys in header will be forced to unique rownames
Execution halted
`
I tried to fiddle with the code to see what i am missing (it seams seg-pval are either NA or missing) but at this point i am done messing around and hoping someone could direct me to a small (really small) working example that i could just plug into the above cmd and see that is works . I assume the problem is associated to my inputs but i would need days to figure it out.
Any readily available small example?
Sincerely Thank you for any help u might provide on this issue!