I have observed this many times and can now confirm that ChIP fragments never map to the first 3 Mb of a chromosome, and sometimes not to the last few hundred thousand bases at the chromosome end either.
Is it because of the centromeres and telomeres? Are these regions not transcribed, or are they repeats?
Cheers
Heterochromatin, I would say.
Yes, but "heterochromatin" is also used loosely for non- or poorly expressed DNA, which occurs within the body of the chromosome too. Is that really the explanation? The protein might bind and regulate such a poorly expressed locus, in which case it should still show binding.
These kinds of regions are represented as "NNNNNNNNNNNNNNNNNN" in the reference FASTA file.
But N would mean that there is no DNA present or that it couldn't be sequenced. Do you know if they deliberately added the N's so that nothing could get mapped (like a notation)?
Sukhdeep, there is DNA present, but the core of the centromere is composed of arrays of simple repeats, which are difficult to sequence and virtually impossible to assemble. The length of these regions has been determined by cytogenetics because the core repeats are known for many species. The pericentromere is composed of more complex repeats, like nested retrotransposons (along with some coding genes as well), but this is still an incredibly problematic area of the genome to reconstruct given the similarity of the repeat regions. This is especially true in mouse, since retrotransposons have been much more active in this region of the genome than in humans (though both pale in comparison to the situation in plants). So what you typically have for most species are assemblies in which these regions are not represented at all.
Thanks for the information :)
"Couldn't be sequenced" or couldn't be mapped/assembled?
Yes, in the context of genome mapping it should be "mapped/assembled"; I was referring to naive base calling during sequencing.