Question

How to generate the contig ploidy priors table (yeast) for the GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?


4.4 years ago

mikizu • 0

Hi !

I was asked to determine the ploidy level and to do CNV calling on a yeast sample (Reference sequence : S. cerevisiae S288C).

To call CNVs with the GATK pipeline "(How to) Call common and rare germline copy number variants", the third step uses the tool "DetermineGermlineContigPloidy [BETA]", which requires a "contig ploidy priors" table file for the option --contig-ploidy-priors. However, after searching for an answer (it may be obvious; this is the first time I am doing this), I still do not know how to create or generate this file.

Here is the kind of table that I should use :

CONTIG_NAME  PLOIDY_PRIOR_0  PLOIDY_PRIOR_1  PLOIDY_PRIOR_2  PLOIDY_PRIOR_3
1            0.01            0.01            0.97            0.01
2            0.01            0.01            0.97            0.01
X            0.01            0.49            0.49            0.01
Y            0.50            0.50            0.00            0.00

Here are some GATK topics about the --contig-ploidy-priors I have already consulted :

How do you generate the file required for the --contig-ploidy-priors parameter : It was partially answered, but only for human studies.
Germline CNV, ploidy and best practices : Same here. It is said that this can be done "easily" by running CollectFragmentCounts beforehand (which now seems to be CollectReadCounts), but since the data are human, the suggestion is to use the default file provided by GATK with some minor changes. I still don't know how to do it myself for my yeast sample.

Does anyone know or have an idea about how to generate/create this contig ploidy priors table? Do I have to build the table by hand and fill in the values I consider reasonable, based on a ploidy estimate that I would obtain beforehand? Or do you think I should just use CNVnator, CNVkit or another tool designed for "tumor"/single-sample CNV calling?

Thank you in advance, any help would be appreciated.

variant-calling cnv gatk gCNV • 3.8k views

ADD COMMENT • link updated 3.0 years ago by kanika.151 ▴ 160 • written 4.4 years ago by mikizu • 0


Did you get any solution? How to make this table?

ADD REPLY • link 4.1 years ago by xoaib • 0

score 0 · Answer 1 · 2021-11-17


3.0 years ago

Dr N Ch • 0

Does anyone know or have an idea about how to generate/create this contig ploidy priors table ?


score 0 · Answer 2 · 2021-11-17

The probabilities in this file should reflect your prior belief for the copy-number state of each contig, given the prevalence of aneuploidies and sex genotypes in the population. For example, the table used in the tutorial indicates that we believe there is a small chance for the copy-number of chr20 to be either 1 or 3, but it is most likely 2.
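For a non-human organism, one way to produce such a table is simply to write it yourself, one row per contig. Below is a minimal Python sketch, not an official GATK recipe. The contig names (chrI to chrXVI), the prior values, and the output file name are all placeholders chosen for illustration: the names have to match those in your own reference and interval list, and the probabilities should encode what you actually believe about the strain. The column headers follow the table shown in the question, and the output is written tab-separated on the assumption that this is what the tool reads.

```python
def format_ploidy_priors(contig_priors):
    """Return the priors table as tab-separated text.

    contig_priors maps contig name -> list of prior probabilities,
    where position k in the list is the prior for ploidy k.
    """
    n_states = len(next(iter(contig_priors.values())))
    header = ["CONTIG_NAME"] + [f"PLOIDY_PRIOR_{k}" for k in range(n_states)]
    lines = ["\t".join(header)]
    for contig, priors in contig_priors.items():
        if len(priors) != n_states:
            raise ValueError(f"{contig}: expected {n_states} priors, got {len(priors)}")
        if abs(sum(priors) - 1.0) > 1e-6:
            raise ValueError(f"{contig}: priors sum to {sum(priors)}, not 1")
        lines.append("\t".join([contig] + [f"{p:g}" for p in priors]))
    return "\n".join(lines) + "\n"


# ASSUMPTION: hypothetical contig names; replace them with the names used
# in your reference FASTA / sequence dictionary.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# ASSUMPTION: placeholder belief that each nuclear chromosome is equally
# likely to be present in 1 or 2 copies, with a small chance of 0 or 3.
PLACEHOLDER_PRIOR = [0.01, 0.49, 0.49, 0.01]

priors = {f"chr{r}": PLACEHOLDER_PRIOR for r in ROMAN}
table_text = format_ploidy_priors(priors)

if __name__ == "__main__":
    # Hypothetical output file name.
    with open("yeast_contig_ploidy_priors.tsv", "w") as handle:
        handle.write(table_text)
```

The mitochondrial contig is left out here on purpose, since its copy number does not behave like that of a nuclear chromosome; whether to include it is a judgment call for your own data.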

We use these priors in conjunction with the likelihood of our observed data (i.e., the total read count per contig) to determine the posterior probability of the per-contig copy number in the usual Bayesian manner. As always, high quality data (which is well explained by the likelihood model) will weaken the influence of the prior on the final result. However, if your data quality is low, you may want to impose stronger priors to regularize away the possibility of getting spurious results (e.g., unrealistic sex genotypes).
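To make the interplay between prior and data concrete, here is a toy Bayesian update in Python. It is only an illustration: the Poisson likelihood, the `depth_per_copy` parameter, and the read counts are assumptions made up for this sketch, and the model that DetermineGermlineContigPloidy actually fits is more elaborate.

```python
import math


def ploidy_posterior(prior, read_count, depth_per_copy):
    """Posterior over ploidy states 0..len(prior)-1 for one contig.

    Toy model (ASSUMPTION): read_count ~ Poisson(ploidy * depth_per_copy).
    """
    log_post = []
    for ploidy, p in enumerate(prior):
        if p == 0.0:
            # A zero prior rules the state out regardless of the data.
            log_post.append(float("-inf"))
            continue
        # Small floor so that ploidy 0 still has a valid Poisson rate.
        rate = max(ploidy, 1e-3) * depth_per_copy
        log_lik = read_count * math.log(rate) - rate - math.lgamma(read_count + 1)
        log_post.append(math.log(p) + log_lik)
    # Normalise in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


prior = [0.01, 0.01, 0.97, 0.01]  # same shape as the autosome rows in the question

# Plenty of data: 300 reads where about 100 are expected per copy.
strong = ploidy_posterior(prior, read_count=300, depth_per_copy=100)

# Very little data: 3 reads where about 1 is expected per copy.
weak = ploidy_posterior(prior, read_count=3, depth_per_copy=1)

print("high depth, most probable ploidy:", strong.index(max(strong)))
print("low depth,  most probable ploidy:", weak.index(max(weak)))
```

In both cases the observed count is three times the per-copy depth. At high depth the data override the prior and ploidy 3 comes out on top; at low depth the same ratio is too noisy to overcome a 0.97 prior on ploidy 2, which is the regularising effect described above.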

Ideally, you would run the tool on a “training” set of samples where the truth is known, tuning the priors or other parameters to recover the correct result if necessary. Once this tuning procedure is complete, you can proceed to use the same priors and parameters on subsequent samples. However, if PARs and other problematic regions are appropriately masked (as mentioned in the tutorial), usually the results of this tool are reasonable without any tuning required.

Source: an answer by slee.