How to use CNVkit results to estimate purity with PureCN
6.2 years ago

Hello, how can I use CNVkit's .cnn and .cnr files as input for PureCN to estimate purity? Could you give an example? Thank you.

6.2 years ago

Have a look at the vignette available at https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.pdf

There is a CNVkit example at the end of the vignette.

Make sure to use version 1.10.0, or the current GitHub or Bioconductor devel version.

It currently works best with Mutect 1.1.7. It's easy and fast to run (see the main vignette for examples).
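As a rough R-level sketch of that idea (not the vignette's exact commands): the CNVkit tumor .cnr file is passed as the tumor coverage, and a segmentation exported with `cnvkit.py export seg` is passed as `seg.file`. The wrapper name `run_purecn_cnvkit` and its parameter names are hypothetical, and the use of `segmentationHclust` for externally provided segments is an assumption to check against the vignette for your PureCN version.

```r
# Hypothetical wrapper; only the PureCN:: calls come from the package.
# Assumes the segmentation was exported beforehand, e.g. with
#   cnvkit.py export seg sample.cns --enumerate-chroms -o sample.seg
run_purecn_cnvkit <- function(cnr_file, seg_file, sample_id,
                              vcf_file = NULL, genome = "hg19") {
  PureCN::runAbsoluteCN(
    tumor.coverage.file = cnr_file,   # CNVkit .cnr for the tumor sample
    seg.file = seg_file,              # CNVkit segmentation in DNAcopy .seg format
    vcf.file = vcf_file,              # optional VCF (e.g. from Mutect 1.1.7)
    sampleid = sample_id,
    genome = genome,
    # Assumption: keep the provided segments instead of re-segmenting
    fun.segmentation = PureCN::segmentationHclust
  )
}
```

The function is only defined here, not called; it would be called with your own file paths once PureCN is installed.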


Hello, I want to use results from the GATK pipeline. Do you have any idea how to do this in PureCN? I see no example in the PureCN overview or quick start. Thanks.


It currently does not take all GATK4 output files out-of-the-box, but you can provide the segmentation and copy number log-ratios (see Section 10.1). PureCN can read GATK4 coverage files (in hdf5 format). Simply provide the tumor coverage and PureCN will be able to map provided log-ratios to the genomic coordinates (no need to generate and provide an interval.file).

Btw, PureCN implements the GATK4 coverage normalization with added support for sex chromosomes and off-target regions. There are differences in GC-normalization and segmentation though.


Thanks @markus.riester. I tried to run a small test, but it raised an error.
My code:

library(PureCN)
normal_hdf5 <- "/MntWorkdir/GATK_CNV/P2_PBMC.hdf5"
tumor_hdf5 <- "/MntWorkdir/GATK_CNV/P2_hdf5/P2-3.recal.hdf5"
interval_file <- "/MntWorkdir/BGI_ex_region_hg19_preprocessed_intervals.interval_list"

ratio <- calculateLogRatio(readCoverageFile(normal_hdf5), readCoverageFile(tumor_hdf5))
retLogRatio <- runAbsoluteCN(log.ratio=ratio, genome="hg19", plot.cnv=FALSE, interval.file=interval_file)
pdf("TestPlot.pdf")
plotAbs(retLogRatio, 1, type="hist")

Error message:

INFO [2019-06-12 06:45:30] Loading coverage files...
Error in utils::read.table(file, header = TRUE) : 
  more columns than column names
Calls: runAbsoluteCN ... readCoverageFile -> .readCoverageGatk3 -> <Anonymous>
Execution halted

My coverage file *.hdf5 comes from gatk CollectReadCounts. The error message mentions .readCoverageGatk3; does this function support the GATK4 HDF5 format?
My interval file is Picard-style, from gatk PreprocessIntervals, not from PureCN.


When you calculate the log-ratio like that, you are not using GATK4's denoising steps. GATK4 should generate a file with log2-ratios; parse that file and provide the corresponding log2-ratios via log.ratio, as you did.
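A minimal sketch of that parsing step, assuming the file is the `*.denoisedCR.tsv` output of GATK4 `DenoiseReadCounts` (SAM-style header lines starting with `@`, then the columns CONTIG, START, END, LOG2_COPY_RATIO). The helper name `read_denoised_cr` and the demo data are made up for illustration.

```r
# Hypothetical helper: read a GATK4 denoised copy-ratio table.
read_denoised_cr <- function(path) {
  # Lines starting with "@" are SAM-style header lines and are skipped
  cr <- read.delim(path, comment.char = "@", stringsAsFactors = FALSE)
  expected <- c("CONTIG", "START", "END", "LOG2_COPY_RATIO")
  stopifnot(all(expected %in% colnames(cr)))
  cr
}

# Tiny made-up file so the sketch runs on its own
demo_file <- tempfile(fileext = ".denoisedCR.tsv")
writeLines(c(
  "@HD\tVN:1.6",
  "@SQ\tSN:chr1\tLN:249250621",
  "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO",
  "chr1\t10001\t10500\t0.12",
  "chr1\t20001\t20500\t-0.85"
), demo_file)

cr <- read_denoised_cr(demo_file)
log_ratio <- cr$LOG2_COPY_RATIO  # numeric vector, one value per interval

# With PureCN installed, this vector would then be passed on, e.g.:
# ret <- PureCN::runAbsoluteCN(tumor.coverage.file = tumor_hdf5,
#                              log.ratio = log_ratio, genome = "hg19")
```

The log-ratios need to line up, in number and order, with the intervals in the tumor coverage file, so it is worth checking that before the `runAbsoluteCN` call.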

The issue is likely the wrong format of the interval file. Just run it with the tumor coverage file instead of interval.file:

retLogRatio <- runAbsoluteCN(tumor.coverage.file = tumor_hdf5, log.ratio=ratio, genome="hg19",  ....)

If you want to use GATK4's segmentation, provide it via seg.file (you might need to change the format for now, but I will add support for that soon, see the main vignette).
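One way the format change could look, as a sketch: it assumes the GATK4 `ModelSegments` `*.cr.seg` output (columns CONTIG, START, END, NUM_POINTS_COPY_RATIO, MEAN_LOG2_COPY_RATIO) and rewrites it with DNAcopy-style column names (ID, chrom, loc.start, loc.end, num.mark, seg.mean). The helper name `gatk_seg_to_dnacopy` and the demo data are hypothetical.

```r
# Hypothetical helper: convert a GATK4 *.cr.seg table to DNAcopy-style columns.
gatk_seg_to_dnacopy <- function(path, sample_id) {
  # Skip the SAM-style "@" header lines at the top of the file
  seg <- read.delim(path, comment.char = "@", stringsAsFactors = FALSE)
  data.frame(
    ID = sample_id,
    chrom = seg$CONTIG,
    loc.start = seg$START,
    loc.end = seg$END,
    num.mark = seg$NUM_POINTS_COPY_RATIO,
    seg.mean = seg$MEAN_LOG2_COPY_RATIO,
    stringsAsFactors = FALSE
  )
}

# Tiny made-up input so the sketch runs on its own
demo_seg <- tempfile(fileext = ".cr.seg")
writeLines(c(
  "@HD\tVN:1.6",
  "CONTIG\tSTART\tEND\tNUM_POINTS_COPY_RATIO\tMEAN_LOG2_COPY_RATIO",
  "chr1\t10001\t500000\t120\t0.05",
  "chr1\t500001\t900000\t80\t-0.70"
), demo_seg)

dnacopy_seg <- gatk_seg_to_dnacopy(demo_seg, sample_id = "P2-3")

# Write a tab-delimited file that could be passed via seg.file
out_seg <- tempfile(fileext = ".seg")
write.table(dnacopy_seg, out_seg, sep = "\t", quote = FALSE, row.names = FALSE)
```

Newer PureCN versions may read the GATK4 segmentation directly, in which case this conversion is unnecessary.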


PureCN version 1.15.4 adds support for the GATK4 copy number workflow.
