I have whole-genome sequencing data from durum wheat and I'm using GATK to call variants.
I downloaded the reference genome from Ensembl Plants (http://plants.ensembl.org/Triticum_turgidum/Info/Index). The chromosome lengths in the reference are:
1A 585266722
1B 681112512
2A 775448786
2B 790338525
3A 746673839
3B 836514780
4A 736872137
4B 676292951
5A 669155517
5B 701372996
6A 615672275
6B 698614761
7A 728031845
7B 722970987
Un 498719471
What value should I give for -ploidy when running 'gatk HaplotypeCaller'?
I think I should use 2. Durum wheat (BBAA) is tetraploid, but the reference has separate chromosomes for each subgenome (A and B). So at each position in the reference, we expect 2 alleles per sample for SNPs.
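For concreteness, here is a rough sketch of the kind of call I mean, written as a small Python helper that only assembles the command line and does not run GATK. The file names (durum_ref.fa, sample.bam, sample.vcf.gz) and the helper name build_haplotypecaller_cmd are placeholders for illustration, not my actual files. --sample-ploidy is the long form of -ploidy, and -L restricts calling to one chromosome.

```python
import shlex


def build_haplotypecaller_cmd(reference, bam, output, ploidy=2, interval=None):
    """Assemble (but do not run) a 'gatk HaplotypeCaller' command line.

    All paths are hypothetical placeholders supplied by the caller.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,                 # reference FASTA with 1A..7B and Un
        "-I", bam,                       # aligned reads for one sample
        "-O", output,                    # output VCF
        "--sample-ploidy", str(ploidy),  # same option as -ploidy
    ]
    if interval is not None:
        cmd += ["-L", interval]          # e.g. a single chromosome such as "1A"
    return cmd


# Example: diploid setting, restricted to chromosome 1A (placeholder file names).
example = build_haplotypecaller_cmd(
    "durum_ref.fa", "sample.bam", "sample.vcf.gz", ploidy=2, interval="1A"
)
print(shlex.join(example))
```

The printed line can be pasted into a shell once the placeholder names are replaced. Changing ploidy=2 to ploidy=4 would give the tetraploid alternative for comparison.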
If you have any comments, please let me know. Thanks in advance.
A similar question was posted by another researcher here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360074701212-How-do-I-choose-GenotypeGVCFs-parameters-to-best-fit-data-from-a-selfing-species-with-a-highly-duplicated-genome-