Hi everyone - I am working with Y chromosome data, and we have called our sequencing data with GATK HaplotypeCaller. I was not sure whether to use ploidy=1 or 2 in this case (we are working with male subjects only), so I did it both ways to compare the difference. In difficult regions, I noticed that with ploidy=2 there are many more heterozygous calls. But the main thing that I am concerned about is low depth.
In our diploid results, the median depth is around 14, but in the haploid results (ploidy=1), the median depth per site is only around 5. This seems quite low to me. Is this to be expected with haploid calls? I've been trying to find any documentation about chrY haploid calling with GATK, but so far, no luck. Any advice/comments would be appreciated!
Thanks Istvan! It seems to me that the depth should be the same between diploid and haploid: reads mapped to Y are mapped to Y BEFORE we call the variants, whether we end up calling them as diploid or haploid. Indeed, when I run chrMT, the depth is identical between the two methods (diploid and haploid), as we expect. But, I will keep thinking about it. :)
I want to mention that the ploidy parameter of the variant caller instructs the tools to use a different statistical model. It has no relevance when it comes to coverage, depth, or read mapping.
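To make the "different statistical model" point concrete, here is a toy sketch in Python. It is not GATK's actual model: it assumes a single biallelic site, independent reads, and a flat per-read error rate, and the function name and the `error` parameter are made up for illustration. The same pile of reads is scored against the genotypes allowed under each ploidy, and the read count (the depth) is identical in both cases.

```python
import math

def genotype_log_likelihoods(n_ref, n_alt, ploidy, error=0.01):
    """Toy model (not GATK's): log-likelihood of each genotype, keyed by
    the number of alt allele copies (0..ploidy), given the read counts."""
    result = {}
    for alt_copies in range(ploidy + 1):
        alt_fraction = alt_copies / ploidy
        # Chance that a single read shows the alt base under this genotype,
        # allowing for sequencing error in either direction.
        p_alt = alt_fraction * (1 - error) + (1 - alt_fraction) * error
        p_ref = 1 - p_alt
        result[alt_copies] = n_ref * math.log(p_ref) + n_alt * math.log(p_alt)
    return result

# The same 14 reads (10 ref, 4 alt) scored under both ploidy settings.
haploid = genotype_log_likelihoods(10, 4, ploidy=1)
diploid = genotype_log_likelihoods(10, 4, ploidy=2)
best_haploid = max(haploid, key=haploid.get)
best_diploid = max(diploid, key=diploid.get)
```

In this toy setup the haploid model has to choose between 0 and 1 alt copies and picks 0, while the diploid model is also allowed a heterozygous genotype and picks it. Only the set of candidate genotypes changed; the reads going in did not.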
A haploid chromosome produces half as many reads, so how could the depth be the same? The following picture may illuminate it; this is the material you begin with:
The sequencing process will evenly sample from each sequence. It should be evident that chrY ought to have half the coverage depth, since it produces half as many reads per length. When we are mapping the reads, the reference genome has only one copy for chr1 and chr2, but in reality there were two. Another way to say this is that the coverage of chr1 will look as if doubled.
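The arithmetic can be sketched in a few lines of Python. This assumes perfectly even sampling and made-up contig lengths and read totals; the function name is hypothetical and real data will be noisier.

```python
def expected_depth(total_read_bases, contigs):
    """contigs maps name -> (length, copies present in the sample).
    Returns the expected depth per contig after mapping to a reference
    that holds exactly one copy of each contig."""
    # Total amount of DNA the sequencer samples from, counting every copy.
    sampled_bases = sum(length * copies for length, copies in contigs.values())
    depth_per_copy = total_read_bases / sampled_bases
    # Reads from all copies pile onto the single reference copy.
    return {name: depth_per_copy * copies
            for name, (length, copies) in contigs.items()}

# Illustrative numbers only: two diploid autosomes and a haploid chrY.
contigs = {"chr1": (1000, 2), "chr2": (800, 2), "chrY": (200, 1)}
depth = expected_depth(total_read_bases=38000, contigs=contigs)
```

With these numbers chr1 and chr2 come out at twice the depth of chrY, and the ploidy setting of the variant caller does not enter the calculation at all.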