Hello all,
I have a question about how to pass CNVkit's output to PureCN to account for tumor cellularity.
Following page 6 of https://bioconductor.org/packages/release/bioc/vignettes/PureCN/inst/doc/Quick.pdf,
I first run:
Rscript $PURECN/NormalDB.R --outdir $OUT_REF --normal_panel $NORMAL_PANEL \
--assay agilent_v6 --genome hg19 --force
When I run this, I am asked for --coveragefiles. Can I provide a file from CNVkit for this?
Next, I export the CNVkit segments:
> cnvkit.py export seg $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.cns \
> --enumerate-chroms \
> -o $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg
And finally:
> Rscript $PURECN/PureCN.R --out $OUT/$SAMPLEID \
> --sampleid $SAMPLEID \
> --tumor $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.cnr \
> --segfile $OUT/$SAMPLEID/${SAMPLEID}_cnvkit.seg \
> --normal_panel $OUT_REF/mapping_bias_agilent_v6_hg19.rds \
> --vcf ${SAMPLEID}_mutect.vcf \
> --statsfile ${SAMPLEID}_mutect_stats.txt \
> --snpblacklist hg19_simpleRepeats.bed \
> --genome hg19 \
> --funsegmentation none \
> --force --postoptimize --seed 123
Could someone please explain what --snpblacklist is for and what should be passed as --normal_panel?
I tried reading the manuals but I am still confused.
Any suggestions would be appreciated, thanks!
Hi Markus,
Thank you for your detailed response. I would like to clarify a few things:
1) For the first command, you mentioned that PureCN accepts CNVkit coverage files. Does this mean a .cnr or a .cns file from CNVkit? (Sorry for the basic question, but I wasn't sure what a coverage file is here.) What output does this command produce, and where is it used next?
2) For the last command, my understanding is that for --tumor we need to provide a tumor VCF from Mutect, and that if the tumor was matched with a normal, we do not have to specify --normal_panel. For the tumor VCF, do I run the tumor against a panel of normals that I have already created, or just against its own matched normal sample? Is --statsfile optional? Is --segfile optional?
Thank you once again for your time!
1) The .cnr files.
2) You run Mutect as you normally would, i.e. provide the normal BAM file; you can also provide --normal_panel for artifact flagging. --statsfile is optional, but since Mutect generates it automatically, it's easy to add. When provided, PureCN can remove artifacts based on the flags in the stats file (unfortunately Mutect1 does not add those to the VCF). So if you do the artifact filtering yourself (keep the germline SNPs though!), you can skip --statsfile.
If you don't provide --segfile, PureCN should segment the coverage log-ratio (in the .cnr file). So if you want to use the CNVkit segmentation, provide --segfile.
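To make the optional arguments concrete, here is a minimal dry-run sketch that builds the PureCN.R argument list and only adds --segfile and --statsfile when they are supplied. The helper name build_purecn_args and the file layout are assumptions for illustration, not part of PureCN or CNVkit. The flags are copied from the command in the question, and pairing --funsegmentation none with --segfile simply mirrors that command. The remaining options (--normal_panel, --snpblacklist, --force, --postoptimize, --seed) are left out for brevity.

```shell
# Hypothetical helper: prints the PureCN.R arguments for one sample.
# Usage: build_purecn_args SAMPLEID OUTDIR [SEGFILE] [STATSFILE]
build_purecn_args() {
    sampleid=$1
    outdir=$2
    segfile=${3:-}     # optional: CNVkit segmentation exported with "export seg"
    statsfile=${4:-}   # optional: Mutect stats file

    # Arguments that are always passed (paths follow the layout in the question).
    args="--out $outdir/$sampleid --sampleid $sampleid"
    args="$args --tumor $outdir/$sampleid/${sampleid}_cnvkit.cnr"
    args="$args --vcf ${sampleid}_mutect.vcf --genome hg19"

    # Only hand over the CNVkit segmentation when a seg file is given;
    # otherwise PureCN is left to segment the log-ratios itself.
    if [ -n "$segfile" ]; then
        args="$args --segfile $segfile --funsegmentation none"
    fi

    # Only pass the stats file when artifact filtering is left to PureCN.
    if [ -n "$statsfile" ]; then
        args="$args --statsfile $statsfile"
    fi

    printf '%s\n' "$args"
}

# Dry run: print the commands instead of executing them.
echo "Rscript \$PURECN/PureCN.R $(build_purecn_args S1 out)"
echo "Rscript \$PURECN/PureCN.R $(build_purecn_args S1 out out/S1/S1_cnvkit.seg S1_mutect_stats.txt)"
```

The first printed command leaves segmentation to PureCN; the second reuses the CNVkit segments and the Mutect stats file. Paths containing spaces are not handled in this sketch.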