Right way to call purecn with cnvkit?
0
0
Entering edit mode
2.6 years ago
ww22runner ▴ 60

Hi all,

I would like to get some input on whether I am running PureCN with CNVkit's outputs correctly. The reason I ask is because when I use 2 different sets of 10 process matched germlines, I am getting wildly different results for paired tumors' cellularity and ploidy results.

Here is what I am doing:

CNVKIT

python $CNVKIT/cnvkit.py batch $CNV_BAMS/_T.bam --normal $CNV_BAMS/_G.bam --targets $BEDFILE_0B --fasta $REF_GENOME_b37 --access $CNVKIT/data/access-5k-mappable.grch37.bed --output-reference $CNV_BAMS/my_reference.cnn --output-dir $CNV_T_RESULTS --diagram --scatter -p 8

python $CNVKIT/cnvkit.py batch $CNV_BAMS/*_G.bam -r $CNV_BAMS/my_reference.cnn -p 8 --scatter --diagram -d $CNV_G_RESULTS

PURECN WORKFLOW

1) Run Mutect2 for first 10 process matched germline sample

$GATK_PATH Mutect2 \ -R $REF_GENOME_b37 \ -I $CNV_BAMS/${GERMLINE_PFX}_G.bam \ --max-mnp-distance 0 \ -O $CNV_G_RESULTS_PURECN/${GERMLINE_PFX}.vcf.gz

2) Create a Genomics DB from the normal Mutect2 calls

$GATK_PATH GenomicsDBImport \ -R $REF_GENOME_b37 \ -L $BEDFILE_0B \ --genomicsdb-workspace-path $CNV_G_RESULTS_PURECN/pon_db \ --sample-name-map $CNV_G_RESULTS_PURECN/G_sample_map

3) Combine the normal calls using CreateSomaticPanelOfNormals

$GATK_PATH CreateSomaticPanelOfNormals \ -R $REF_GENOME_b37 \ --germline-resource $GERMLINE_RESOURCE \ -V gendb://$CNV_G_RESULTS_PURECN/pon_db \ -O $CNV_G_RESULTS_PURECN/pon.vcf.gz

4) For each tumor sample in run:

Create .seg file

python $CNVKIT/cnvkit.py export seg $CNV_T_RESULTS/${TUMOR_PFX}_T.cns --enumerate-chroms -o $CNV_T_RESULTS_PURECN/${TUMOR_PFX}.seg

Run Mutect2

$GATK_PATH Mutect2 \ -R $REF_GENOME_b37 \ -I $CNV_BAMS/${TUMOR_PFX}_T.bam \ -pon $CNV_G_RESULTS_PURECN/pon.vcf.gz \ --germline-resource $GERMLINE_RESOURCE\ --af-of-alleles-not-in-resource 0.0000025 \ --genotype-germline-sites true \ -L $BEDFILE_0B \ -O $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_unmatched.vcf.g

Filter Mutect2 vcf:

$GATK_PATH FilterMutectCalls \ -R $REF_GENOME_b37 \ -V $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_unmatched.vcf.gz \ -O $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_filtered.vcf.gz

Run PureCN by providing the .cnr and .seg files:

$RSCRIPT_PATH $PURECN/PureCN.R \ --out $CNV_T_RESULTS_PURECN/${TUMOR_PFX} \ --sampleid ${TUMOR_PFX}\ --tumor $CNV_T_RESULTS/${TUMOR_PFX}_T.cnr \ --segfile $CNV_T_RESULTS_PURECN/${TUMOR_PFX}.seg \ --vcf $CNV_T_RESULTS_PURECN/${TUMOR_PFX}_filtered.vcf.gz \ --genome hg19 \ --force --postoptimize --seed 123 \ --funsegmentation Hclust

Versions I am using- cnvkit 0.9.5, pureCN 1.21.0, Python 2.7, R3.5.1, Mutect2 from GATK 4.1.9.0 When I use different sets of 10 germline samples from my run to test the tumor cellularity and ploidy, the cellularities I get vary more than 20%. Is there anything incorrect about the workflow?

Thank you

PureCN • 1.3k views
ADD COMMENT
0
Entering edit mode

Can you add the PureCN log file? The paired and unpaired should return the same purity and ploidy in vast majority of cases.

ADD REPLY
0
Entering edit mode

Hi Dr.Riester, What I meant was that I had 12 pairs of matched tumor-germline samples in a run and for the first try, I used the first 10 germline samples to form a pooled reference while in the second time, I used the last 10 germline samples to form a pooled reference. The cellularities and ploidies for the tumors in this run were not consistent for both runs and I was wondering if my workflow/commands were incorrect.

ADD REPLY
0
Entering edit mode

So only a difference of 2 normals? Either way, there shouldn’t be a huge variance in purity. I would recommend to generate a mapping bias database using the GenomicsDBImport output. The Docker image has the dependencies installed, see the best practices vignette. But there are likely more issues, I would need the log files for that.

ADD REPLY

Login before adding your answer.

Traffic: 2977 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6