Question

No Reads Ever Map To The First 3 Million Bases Of Chromosomes In The Mouse Genome! Why?

2

12.5 years ago

Sukhi Singh 11k

I have observed this many times and can now confirm that ChIP fragments never map to the first 3M bases at the start of a chromosome, and sometimes also fail to map to the last few hundred thousand bases at the chromosome end.

Is it because of the centromere and telomere? Are these regions not transcribed, or are they repeats?

Cheers

chipseq ngs mapping rna-seq • 7.9k views

ADD COMMENT • link updated 12.0 years ago by Biostar 20 • written 12.5 years ago by Sukhi Singh 11k

4

Heterochromatin, I would say.

ADD REPLY • link 12.5 years ago by fo3c ▴ 450

0

Yeah, great, but one also loosely refers to "non/poorly expressed" DNA as heterochromatin, and that also occurs within the chromosome. Is that true? A protein might be binding and regulating such a poorly expressed locus, so it should still show binding.

ADD REPLY • link 12.5 years ago by Sukhi Singh 11k

2

These kinds of regions are represented as "NNNNNNNNNNNNNNNNNN" in the reference FASTA file.
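One way to check this yourself is to scan the reference FASTA for long runs of N. The following is only a minimal sketch: the function name `n_runs`, the `min_len` threshold, and the toy sequence are all made up for illustration, and it holds one whole sequence in memory at a time, which may be heavy for a full mammalian chromosome.

```python
import io
import re


def n_runs(fasta_handle, min_len=1000):
    """Yield (name, start, end) for each run of N/n of at least min_len bases.

    Coordinates are 0-based, half-open (BED style).
    """
    def scan(name, seq):
        for m in re.finditer(r"[Nn]+", seq):
            if m.end() - m.start() >= min_len:
                yield name, m.start(), m.end()

    name, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: report the runs of the previous one.
            if name is not None:
                yield from scan(name, "".join(chunks))
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield from scan(name, "".join(chunks))


# Toy record (invented): 8 leading Ns, then sequence, then a short 2-base gap.
toy = io.StringIO(">chrToy\nNNNNNN\nNNACGT\nACNNGT\n")
print(list(n_runs(toy, min_len=5)))  # [('chrToy', 0, 8)]
```

On a real reference you would pass an open file handle instead of the `io.StringIO` toy, for example `n_runs(open("genome.fa"), min_len=100000)`, where `genome.fa` is a placeholder path.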

ADD REPLY • link 12.5 years ago by Ashutosh Pandey 12k

0

But N would mean that there is no DNA present or that it couldn't be sequenced. Do you know whether they deliberately added the Ns so that nothing could get mapped there (like a notation)?

ADD REPLY • link 12.5 years ago by Sukhi Singh 11k

5

Sukhdeep, there is DNA present but the core of the centromere is composed of arrays of simple repeats which are difficult to sequence and virtually impossible to assemble. The length of these regions has been determined by cytogenetics because the core repeats are known for many species. The pericentromere is composed of more complex repeats, like nested retrotransposons (along with some coding genes as well), but this is still an incredibly problematic area of the genome to reconstruct given the similarity of the repeat regions. This is especially true in mouse since retrotransposons have been much more active in this region of the genome than in humans (though they both pale in comparison to the situation in plants). So, what you typically have for most species are assemblies where these regions are not represented at all.

ADD REPLY • link 12.5 years ago by SES 8.6k

0

Thanks for the information :)

ADD REPLY • link 12.5 years ago by Sukhi Singh 11k

2

"Couldn't be sequenced" or couldn't be mapped/assembled?

ADD REPLY • link 12.5 years ago by PoGibas 5.1k

0

Yeah, in the context of genome mapping it should be "mapped/assembled"; I was referring to naive base calling during sequencing.

ADD REPLY • link 12.5 years ago by Sukhi Singh 11k

score 9 · Answer 1 · 2013-03-01

The Genome Reference Consortium (http://genomereference.org) attempts to model biological gaps in the assemblies that we produce. Unfortunately, in the current assemblies, the models for both centromeres and telomeres are rather poor so they just consist of a run of Ns. We don't have good estimates of mouse telomere/centromere size, so we use a default of 3M Ns for these regions. This information is marked up in the AGP files that define the assembly:

mouse: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Mus_musculus/GRCm38.p1/
human: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p11/
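As a rough illustration of how the gap markup in an AGP file could be read, here is a minimal sketch. It assumes the standard tab-delimited AGP layout, where gap lines have component type `N` or `U` in column 5, the gap length in column 6, and the gap type in column 7. The function name `agp_gaps` and the toy lines are invented for the example and are not taken from the real GRCm38 files.

```python
import io


def agp_gaps(agp_handle, gap_types=("centromere", "telomere")):
    """Yield (object, begin, end, gap_length, gap_type) for selected gap lines.

    Coordinates are kept as in AGP: 1-based, inclusive.
    """
    for line in agp_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) < 7:
            continue  # not a well-formed AGP line
        obj, beg, end, _part, comp_type, col6, col7 = cols[:7]
        # Only gap lines (N = known size, U = unknown size) carry a gap type.
        if comp_type in ("N", "U") and col7 in gap_types:
            yield obj, int(beg), int(end), int(col6), col7


# Toy AGP content, invented for illustration only.
toy_agp = io.StringIO(
    "# toy AGP\n"
    "chrToy\t1\t3000000\t1\tN\t3000000\tcentromere\tno\tna\n"
    "chrToy\t3000001\t3000500\t2\tF\tTOY_CLONE.1\t1\t500\t+\n"
    "chrToy\t3000501\t3000600\t3\tN\t100\tcontig\tno\tna\n"
)
print(list(agp_gaps(toy_agp)))
# [('chrToy', 1, 3000000, 3000000, 'centromere')]
```

Passing a different `gap_types` tuple (for example adding `"contig"`) would also report the gaps inside the euchromatic sequence.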

Note: even within the euchromatic regions there can be long runs of Ns representing gaps that we can't fill yet. In many cases we do have a good size estimate for the gap, typically based on experimental evidence like comparison to an optical map. For human, the problem is that some of the euchromatic gaps are polymorphic, so the size of the gap really depends on the individual you are assessing.

Hope that helps.

Casey Bergman · Answer 2 · 2013-02-27

6

12.5 years ago

Sukhi Singh 11k

I got my answer. Below are the graphic ideograms of the mouse karyotype from Ensembl. The start of each chromosome in UCSC is the centromere, which can span the first ~3M bases. There are no genes in that region, as the second screenshot (Chr2 in mouse) shows. I've checked a couple of other chromosomes as well.

So anything binding there might be noise. Centromeres and telomeres also consist largely of repetitive regions, which I generally remove, hence no mapping is observed. Can someone comment on how to pull this information from the databases (UCSC): how much of each chromosome is spanned by the centromere/telomere and contains no genes? One useful case would be modifying the chromosome coordinate file so that the start 0 is replaced with the position where the centromere ends. This file has a use case with the BEDOPS-based binning script for calculating coverage, so this would save a little time and resources.
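The coordinate-file adjustment described above could look roughly like this. It is only a sketch under some assumptions: chromosome extents and gap intervals are given as BED-style (0-based, half-open) tuples, the gap intervals come from whichever source you trust (for example the AGP files mentioned in another answer), and the name `trim_leading_gaps` and all numbers are made up for illustration.

```python
def trim_leading_gaps(extents, gaps):
    """Move each chromosome's start past any gaps that sit at its beginning.

    extents: list of (chrom, start, end), BED style.
    gaps:    list of (chrom, start, end), BED style.
    Gaps further inside the chromosome are left alone.
    """
    by_chrom = {}
    for chrom, g_start, g_end in gaps:
        by_chrom.setdefault(chrom, []).append((g_start, g_end))

    trimmed = []
    for chrom, start, end in extents:
        new_start = start
        for g_start, g_end in sorted(by_chrom.get(chrom, [])):
            if g_start > new_start:
                break  # first gap that does not touch the current start
            new_start = max(new_start, g_end)
        # Never move the start past the chromosome end.
        trimmed.append((chrom, min(new_start, end), end))
    return trimmed


# Toy numbers, invented: two adjacent leading gaps and one internal gap.
extents = [("chrToy", 0, 10000), ("chrOther", 0, 5000)]
gaps = [("chrToy", 0, 100), ("chrToy", 100, 3000), ("chrToy", 6000, 6500)]
for chrom, start, end in trim_leading_gaps(extents, gaps):
    print(chrom, start, end, sep="\t")
# chrToy    3000    10000
# chrOther  0       5000
```

The tab-separated output has the same three-column shape as a chromosome extents file, so it could be written out and used in place of the original for binning.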

[Image: mouse karyotype ideograms from Ensembl; screenshot of mouse Chr2]

ADD COMMENT • link updated 12.5 years ago by Casey Bergman 18k • written 12.5 years ago by Sukhi Singh 11k

0

I guess this is applicable not only to the mouse genome. Would like to hear comments on the other genomes as well.

ADD REPLY • link 12.5 years ago by PoGibas 5.1k

0

As SES said in the previous comment, these regions are present in all genomes, with differences between plants and animals, so I presume it would be the same scenario, though the length and position might differ.

ADD REPLY • link 12.5 years ago by Sukhi Singh 11k

score 3 · Answer 3 · 2013-02-27

3

12.5 years ago

Jeremy Leipzig 23k

Mice have telocentric chromosomes. I'm not sure why.

ADD COMMENT • link 12.5 years ago by Jeremy Leipzig 23k

0

I think the Y is acrocentric but the X and autosomes are telocentric. As to why, I don't think there is any explanation for why all the chromosomes show the same pattern despite the fact that people have been studying this for about one hundred years (though I'd like to find out I was wrong on that).

ADD REPLY • link 12.5 years ago by SES 8.6k