Question

low sequencing read depth on Y chromosome

3.0 years ago

karen2 • 0

Hi everyone - I am working with Y chromosome data, and we have called variants in our sequencing data with GATK HaplotypeCaller. I was not sure whether to use ploidy=1 or 2 in this case (we are working with male subjects only), so I did it both ways to compare the difference. I noticed that with ploidy=2 there are many more heterozygous calls in difficult regions. But the main thing that I am concerned about is low depth.

In our diploid results, the median depth per site is around 14, but in the haploid results (ploidy=1), the median depth per site is only around 5. This seems quite low to me. Is this to be expected with haploid calls? I've been trying to find any documentation about chrY haploid calling with GATK, but so far, no luck. Any advice/comments would be appreciated!

chromosome-Y GATK HaplotypeCaller • 904 views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 3.0 years ago by karen2 • 0

score 1 · Answer 1 · 2022-04-13

3.0 years ago

Istvan Albert 102k

Since the diploid chromosomes are present in two copies but the Y chromosome is present in only one, getting half the coverage is the expected behavior.

This is the "normal" result of sequencing: we sample from a pool of short sequences that are evenly distributed over all of the DNA.

Thanks Istvan! It seems to me that the depth should be the same between diploid and haploid: reads mapped to Y are mapped to Y BEFORE we call the variants, whether we end up calling them as diploid or haploid. Indeed, when I run chrMT, the depth is identical between the two methods (diploid and haploid), as we expect. But, I will keep thinking about it. :)

ADD REPLY • link 3.0 years ago by karen2 • 0

I want to mention that the ploidy parameter of the variant caller instructs the tool to use a different statistical model. It has no bearing on coverage, depth, or read mapping.
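One way to check this is to measure depth straight from the alignments, without involving the variant caller at all. Below is a minimal sketch, assuming you have the three-column, tab-separated text that `samtools depth` writes (contig, position, depth). The function name and the example numbers are made up for illustration.

```python
import io
import statistics
from collections import defaultdict


def median_depth_per_contig(lines):
    """Median depth per contig from `samtools depth`-style lines.

    Each line is expected to be tab-separated: contig, 1-based position, depth.
    """
    depths = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, _pos, depth = line.split("\t")[:3]
        depths[contig].append(int(depth))
    return {contig: statistics.median(values) for contig, values in depths.items()}


# Made-up example data, for illustration only.
example = io.StringIO(
    "chr1\t100\t14\nchr1\t101\t16\nchr1\t102\t15\n"
    "chrY\t100\t7\nchrY\t101\t8\nchrY\t102\t6\n"
)
print(median_depth_per_contig(example))  # {'chr1': 15, 'chrY': 7}
```

Keep in mind that `samtools depth` leaves out positions with zero coverage unless you pass `-a`, which will shift the median. The numbers from this kind of check depend only on the BAM, so they should be the same no matter which ploidy you later give the caller.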

A haploid chromosome produces half as many reads, so how could the depth be the same? The following picture may help. This is the material you start with:

   chr1 -----------------
   chr1 -----------------
   chr2 ------------
   chr2 ------------
   chrY -------

The sequencing process samples evenly from every one of these sequences. It should be evident that chrY ought to have half the coverage depth, since it produces half as many reads per unit of length.

When we map the reads, the reference genome has only one copy each of chr1 and chr2, but in reality there were two. Another way to say this is that the coverage of chr1 will look as if it were doubled.
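The same argument can be put into numbers. This is only a toy sketch under the assumption of perfectly uniform sampling. The lengths are arbitrary units loosely matching the picture above, not real chromosome sizes, and `expected_depth` is a made-up helper name.

```python
# Toy model of the DNA physically present in a male sample.
# Lengths are arbitrary illustrative units, not real chromosome sizes.
GENOME = {
    "chr1": {"length": 17, "copies": 2},
    "chr2": {"length": 12, "copies": 2},
    "chrY": {"length": 7, "copies": 1},
}


def expected_depth(genome, total_bases_sequenced):
    """Expected mapped depth per reference contig under uniform sampling."""
    # Total amount of DNA in the sample, counting every copy.
    total_dna = sum(c["length"] * c["copies"] for c in genome.values())
    depth = {}
    for name, c in genome.items():
        # Sequenced bases that originate from this chromosome (all copies).
        bases = total_bases_sequenced * c["length"] * c["copies"] / total_dna
        # The reference holds one copy, so all copies pile onto one contig.
        depth[name] = bases / c["length"]
    return depth


depths = expected_depth(GENOME, total_bases_sequenced=6500)
print(depths)  # {'chr1': 200.0, 'chr2': 200.0, 'chrY': 100.0}
```

In this model the lengths cancel out and the expected depth depends only on the copy number, which is why chrY comes out at half the autosomal depth.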

ADD REPLY • link 3.0 years ago by Istvan Albert 102k