mouse chrY centromere location
igor, 4.4 years ago

Mouse centromeres have been discussed here before (for example: "UCSC mm10 Mouse Gap Table Has The Same Centromere Coordinates"). The centromeres sit at one end of each chromosome, and the UCSC gap table lists them at 110000-3000000 for every chromosome. There is one exception: the same table has no centromere entry for chrY. It lists only the short arm at 100000-110000, the same as on the other chromosomes.
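A minimal Python sketch of pulling centromere coordinates out of the UCSC gap table. It assumes the standard column layout of the table dump (`bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge`, with 0-based starts). The rows below are synthetic and only mirror the coordinates described above; they are not copied from mm10. The point is simply that a lookup by gap type comes up empty for chrY.

```python
import csv
import io

# Assumed column order of the UCSC gap table dump; chromStart is 0-based.
GAP_COLUMNS = ["bin", "chrom", "chromStart", "chromEnd", "ix", "n",
               "size", "type", "bridge"]


def centromeres_from_gap_table(handle):
    """Return {chrom: (start, end)} for rows whose gap type is 'centromere'."""
    result = {}
    reader = csv.DictReader(handle, fieldnames=GAP_COLUMNS, delimiter="\t")
    for row in reader:
        if row["type"] == "centromere":
            result[row["chrom"]] = (int(row["chromStart"]), int(row["chromEnd"]))
    return result


# Synthetic rows mirroring the coordinates in the question; the bin and ix
# values are placeholders, not taken from the real mm10 table.
example = "\n".join([
    "0\tchr1\t100000\t110000\t2\tN\t10000\tshort_arm\tno",
    "0\tchr1\t110000\t3000000\t3\tN\t2890000\tcentromere\tno",
    "0\tchrY\t100000\t110000\t2\tN\t10000\tshort_arm\tno",
])

cens = centromeres_from_gap_table(io.StringIO(example))
print(cens)            # {'chr1': (110000, 3000000)}
print("chrY" in cens)  # False: chrY has no centromere row
```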

chrY does appear to be structurally different. From Soh et al.:

We obtained the complete sequence of the mouse Y centromere. Consisting of 90 kb of satellite repeats ... It is located between 3.5 Mb of short-arm and 86.0 Mb of long-arm sequence, confirming that the mouse Y is the only acrocentric chromosome among all the other telocentric mouse chromosomes.

David Adler (University of Washington) generated some idiograms (in 1994?) where you can see chrY is a bit unusual in this regard:

[Image: mouse chromosome idiograms by David Adler]

To retrieve the chrY centromere automatically, is there a reference file with the proper coordinates?

igor, 4.4 years ago

I contacted UCSC and was referred to the Genome Reference Consortium (GRC) at NCBI, since the UCSC gap tables are derived from the GRC's AGP files. This is the official response, in case anyone is curious:

Dr. Page and his lab have long served as collaborators of the GRC. As a consequence, we were made aware of the location of the mouse centromere way back in 2009, and did include it in GRCm38. As described in the Soh et al publication, the centromere is a ~90 kb region found in the BAC clone AC175459.4 (bp 57238-147035). That clone is included in the tiling path for the GRCm38 assembly, and the corresponding position of the centromere in the chromosome is CM001014.2/NC_000087.7: 4072168-4161965.

If you examine the assembly AGPs (the files which describe the FASTA; see https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/; https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/635/GCA_000001635.8_GRCm38.p6/GCA_000001635.8_GRCm38.p6_assembly_structure/Primary_Assembly/assembled_chromosomes/AGP/), you'll see that the 3 Mb gaps at the start of each chromosome have distinct specifications. On all chromosomes except Y, the first 100 kb represent the telomere, the next 10 kb represent the short arm, and the next 2.89 Mb, the centromere. On chr. Y, the gap is only 101000 bp, and represents only the telomere and short arm.

The chromosome Y centromere is not listed in the AGP files because this centromere is not a gap; it is sequenced. The AGP format only has mark-up available for biological gaps (telomere, centromere, short arm, heterochromatin), and does not support a similar mark-up if those regions have been sequenced. For GRCm39, we will be using a separate file to explicitly define the chr Y centromere.

The first sequenced base (non-N) of chr Y does not include telomeric repeats, so we know that we’re still missing some sequence at this end. Thus, we still include a gap to account for this “missing” sequence, and we have used the default short-arm length for this for consistency with the other chromosomes.
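The gap layout described in that response can be read straight from an AGP file. Below is a minimal sketch, assuming AGP 2.x column order: gap lines have component type `N` or `U` in column 5 and the gap type in column 7, and coordinates are 1-based and inclusive. The example lines are synthetic and follow the telomere / short arm / centromere layout quoted above. Real NCBI AGP files use GenBank accessions as object names and also contain component (sequence) lines, which this function skips.

```python
def agp_gaps(lines):
    """Yield (object, beg, end, gap_type) for each gap line in an AGP file.

    AGP coordinates are 1-based and inclusive. Gap lines have component type
    'N' (gap of specified size) or 'U' (gap of unknown size); on those lines
    column 7 holds the gap type (e.g. telomere, short_arm, centromere).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP header comments
        fields = line.rstrip("\n").split("\t")
        if fields[4] in ("N", "U"):
            yield fields[0], int(fields[1]), int(fields[2]), fields[6]


# Synthetic gap lines following the layout described for every chromosome
# except Y; "chr1" is a placeholder object name.
example = [
    "chr1\t1\t100000\t1\tN\t100000\ttelomere\tno\tna",
    "chr1\t100001\t110000\t2\tN\t10000\tshort_arm\tno\tna",
    "chr1\t110001\t3000000\t3\tN\t2890000\tcentromere\tno\tna",
]

for obj, beg, end, gap_type in agp_gaps(example):
    print(obj, gap_type, end - beg + 1)
# chr1 telomere 100000
# chr1 short_arm 10000
# chr1 centromere 2890000
```

On chrY the same scan would find no `centromere` gap, because that region is sequence rather than a gap.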

Although it does not fully answer my question, it substantially clarifies the issue.
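One way to put the GRC numbers to use is sketched below. It checks that the clone interval and the chromosome interval both span the same ~90 kb. It then converts the 1-based inclusive chromosome coordinates to 0-based half-open BED coordinates. Finally, it uses a hypothetical `overrides` table so that chrY takes its centromere from the GRC response while other chromosomes fall back to the gap table. This assumes UCSC mm10 chrY shares GRCm38 coordinates, since mm10 is built from GRCm38.

```python
# Coordinates quoted in the GRC response (1-based, inclusive).
CLONE_CEN = (57238, 147035)    # positions within BAC clone AC175459.4
CHR_CEN = (4072168, 4161965)   # positions on GRCm38 chrY (CM001014.2 / NC_000087.7)


def interval_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def to_bed(start, end):
    """Convert a 1-based, inclusive interval to 0-based, half-open BED coordinates."""
    return start - 1, end


def centromere(chrom, gap_centromeres, overrides):
    """Prefer an explicitly supplied centromere; fall back to the gap-table one.

    Both dicts map chrom -> (start, end) in 0-based, half-open coordinates.
    Returns None if neither source has an entry.
    """
    return overrides.get(chrom, gap_centromeres.get(chrom))


# Both intervals span the same ~90 kb, as the GRC response states.
print(interval_length(*CLONE_CEN), interval_length(*CHR_CEN))  # 89798 89798

# Hypothetical override table: the chrY entry comes from the GRC response,
# because the mm10 gap table has no chrY centromere row.
overrides = {"chrY": to_bed(*CHR_CEN)}
gap_centromeres = {"chr1": (110000, 3000000)}  # as listed in the UCSC gap table

print(centromere("chrY", gap_centromeres, overrides))  # (4072167, 4161965)
print(centromere("chr1", gap_centromeres, overrides))  # (110000, 3000000)
```

Keeping the override separate from the gap-derived values mirrors the GRC's plan for GRCm39: the sequenced chrY centromere is defined in its own file instead of in the gap records.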
