Question

Right way to call purecn with cnvkit?

0

Entering edit mode

2.6 years ago

ww22runner ▴ 60

Hi all,

I would like to get some input on whether I am running PureCN with CNVkit's outputs correctly. The reason I ask is because when I use 2 different sets of 10 process matched germlines, I am getting wildly different results for paired tumors' cellularity and ploidy results.

Here is what I am doing:

CNVKIT

python $CNVKIT/cnvkit.py batch $CNV_BAMS/_T.bam --normal $CNV_BAMS/_G.bam --targets $BEDFILE_0B --fasta $REF_GENOME_b37 --access $CNVKIT/data/access-5k-mappable.grch37.bed --output-reference $CNV_BAMS/my_reference.cnn --output-dir $CNV_T_RESULTS --diagram --scatter -p 8

python $CNVKIT/cnvkit.py batch $CNV_BAMS/*_G.bam -r $CNV_BAMS/my_reference.cnn -p 8 --scatter --diagram -d $CNV_G_RESULTS

PURECN WORKFLOW

1) Run Mutect2 for first 10 process matched germline sample

$GATK_PATH Mutect2 \ -R $REF_GENOME_b37 \ -I $CNV_BAMS/${GERMLINE_PFX}_G.bam \ --max-mnp-distance 0 \ -O $CNV_G_RESULTS_PURECN/${GERMLINE_PFX}.vcf.gz

2) Create a Genomics DB from the normal Mutect2 calls

$GATK_PATH GenomicsDBImport \ -R $REF_GENOME_b37 \ -L $BEDFILE_0B \ --genomicsdb-workspace-path $CNV_G_RESULTS_PURECN/pon_db \ --sample-name-map $CNV_G_RESULTS_PURECN/G_sample_map

3) Combine the normal calls using CreateSomaticPanelOfNormals

$GATK_PATH CreateSomaticPanelOfNormals \ -R $REF_GENOME_b37 \ --germline-resource $GERMLINE_RESOURCE \ -V gendb://$CNV_G_RESULTS_PURECN/pon_db \ -O $CNV_G_RESULTS_PURECN/pon.vcf.gz

4) For each tumor sample in run:

Create .seg file

python $CNVKIT/cnvkit.py export seg $CNV_T_RESULTS/${TUMOR_PFX}_T.cns --enumerate-chroms -o $CNV_T_RESULTS_PURECN/${TUMOR_PFX}.seg

Run Mutect2

$GATK_PATH Mutect2 \ -R $REF_GENOME_b37 \ -I $CNV_BAMS/${TUMOR_PFX}_T.bam \ -pon $CNV_G_RESULTS_PURECN/pon.vcf.gz \ --germline-resource $GERMLINE_RESOURCE\ --af-of-alleles-not-in-resource 0.0000025 \ --genotype-germline-sites true \ -L $BEDFILE_0B \ -O $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_unmatched.vcf.g

Filter Mutect2 vcf:

$GATK_PATH FilterMutectCalls \ -R $REF_GENOME_b37 \ -V $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_unmatched.vcf.gz \ -O $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_filtered.vcf.gz

Run PureCN by providing the .cnr and .seg files:

$RSCRIPT_PATH $PURECN/PureCN.R \ --out $CNV_T_RESULTS_PURECN/${TUMOR_PFX} \ --sampleid ${TUMOR_PFX}\ --tumor $CNV_T_RESULTS/${TUMOR_PFX}_T.cnr \ --segfile $CNV_T_RESULTS_PURECN/${TUMOR_PFX}.seg \ --vcf $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_filtered.vcf.gz \ --genome hg19 \ --force --postoptimize --seed 123 \ --funsegmentation Hclust

Versions I am using- cnvkit 0.9.5, pureCN 1.21.0, Python 2.7, R3.5.1, Mutect2 from GATK 4.1.9.0 When I use different sets of 10 germline samples from my run to test the tumor cellularity and ploidy, the cellularities I get vary more than 20%. Is there anything incorrect about the workflow?

Thank you

PureCN • 1.3k views

ADD COMMENT • link updated 2.6 years ago by markus.riester ▴ 550 • written 2.6 years ago by ww22runner ▴ 60

0

Entering edit mode

Can you add the PureCN log file? The paired and unpaired should return the same purity and ploidy in vast majority of cases.

ADD REPLY • link 2.6 years ago by markus.riester ▴ 550

0

Entering edit mode

Hi Dr.Riester, What I meant was that I had 12 pairs of matched tumor-germline samples in a run and for the first try, I used the first 10 germline samples to form a pooled reference while in the second time, I used the last 10 germline samples to form a pooled reference. The cellularities and ploidies for the tumors in this run were not consistent for both runs and I was wondering if my workflow/commands were incorrect.

ADD REPLY • link 2.6 years ago by ww22runner ▴ 60

0

Entering edit mode

So only a difference of 2 normals? Either way, there shouldn’t be a huge variance in purity. I would recommend to generate a mapping bias database using the GenomicsDBImport output. The Docker image has the dependencies installed, see the best practices vignette. But there are likely more issues, I would need the log files for that.

ADD REPLY • link 2.6 years ago by markus.riester ▴ 550