Hi,
Exome sequencing output typically contains some fraction (often 10–50%) of off-target sequence. How can I check whether the off-target reads in a given exome-sequencing dataset (BAM format) contain telomeric sequence, for instance by looking for repeats of the TTAGGG hexamer?
Have you tried making a longish sequence of the hexamers and aligning your reads to that? Presumably that'd work. Alternatively, I wouldn't be surprised if the Kmer Content module in FastQC would show this.
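In case it helps, here is a minimal sketch of building such a synthetic "telomere" reference as a FASTA record. The function name, the record name, and the default of 100 repeats are all my own assumptions for illustration; pick a length comfortably longer than your reads.

```python
def telomere_fasta(n_repeats=100, unit="TTAGGG", name="telomere_repeat", width=60):
    """Return a FASTA record made of `unit` repeated `n_repeats` times.

    All defaults are illustrative assumptions, not recommended values.
    """
    seq = unit * n_repeats
    # Wrap the sequence at `width` characters per line, as is conventional for FASTA.
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file (say `telomere.fa`, a hypothetical name) gives you something you can index and align against with whichever aligner you normally use. Reads sequenced from the opposite strand will look like CCCTAA repeats, which aligners handle by trying the reverse complement.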
I would like someone to confirm this, but I think telomeres are masked in reference genome files, and as such no reads will map to them in your BAM file. However, there might be unmapped reads in your BAM file corresponding to telomeric sequences.
Yup, they're almost always hardmasked. This is true of most regions of constitutive heterochromatin.
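Following up on the unmapped-reads idea, a rough sketch of counting reads that contain a run of telomeric hexamers could look like the following. The threshold of 4 consecutive repeats and the function names are assumptions of mine, not an established cutoff; the pattern checks both TTAGGG and its reverse complement CCCTAA, since a read can come from either strand.

```python
import re


def telomere_pattern(min_repeats=4):
    """Regex matching at least `min_repeats` consecutive TTAGGG or CCCTAA units.

    min_repeats=4 is an arbitrary illustrative threshold.
    """
    n = int(min_repeats)
    return re.compile(r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (n, n))


def count_telomeric(seqs, min_repeats=4):
    """Return (telomeric_reads, total_reads) for an iterable of read sequences."""
    pattern = telomere_pattern(min_repeats)
    hits = 0
    total = 0
    for seq in seqs:
        total += 1
        # Upper-case first so soft-masked (lower-case) bases still match.
        if pattern.search(seq.upper()):
            hits += 1
    return hits, total
```

One way to feed it, assuming samtools is installed, is to pipe `samtools view -f 4 sample.bam | cut -f 10` (unmapped reads, SEQ column) into a small script that calls `count_telomeric(line.strip() for line in sys.stdin)`. Here `sample.bam` is a placeholder file name.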
Thank you WouterDeCoster and Devon Ryan.