I downloaded two mouse reference genomes, mm10 (GCF_000001635.26_GRCm38.p6_genomic.fna.gz) from NCBI and GRCm38.p6 (Mus_musculus.GRCm38.dna.primary_assembly.fa.gz) from Ensembl, and looked at the sequences inside. Chr. 1 through 19 appeared to be all Ns. Only MT has valid nucleotide sequence. I'd appreciate it if anybody could explain what is going on.
The following is the index of GRCm38.p6:
1 195471971 56 60 61
10 130694993 198729952 60 61
11 122082543 331603253 60 61
12 120129022 455720564 60 61
13 120421639 577851795 60 61
14 124902244 700280520 60 61
15 104043685 827264527 60 61
16 98207768 933042331 60 61
17 94987271 1032886953 60 61
18 90702639 1129457403 60 61
19 61431566 1221671810 60 61
2 182113224 1284127292 60 61
3 160039680 1469275793 60 61
4 156508116 1631982857 60 61
5 151834684 1791099498 60 61
6 149736546 1945464817 60 61
7 145441459 2097697029 60 61
8 129401213 2245562569 60 61
9 124595110 2377120525 60 61
MT 16299 2503792275 60 61
X 171031299 2503808902 60 61
Y 91744698 2677690778 60 61
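In case it helps, the columns above look like the usual samtools faidx layout (name, length, byte offset of the first base, bases per line, bytes per line); that is an assumption on my part, but if it holds, the numbers are enough to jump straight into the middle of a chromosome instead of paging through it. A minimal Python sketch: `base_offset` and `fetch` are made-up helper names, not from any library, and the offsets only apply to the uncompressed FASTA.

```python
def base_offset(seq_offset, line_bases, line_width, pos):
    """Byte offset in the uncompressed FASTA of 0-based position `pos`."""
    # Each full line of bases costs line_width bytes (bases plus newline).
    full_lines, within_line = divmod(pos, line_bases)
    return seq_offset + full_lines * line_width + within_line


def fetch(fasta_path, seq_offset, line_bases, line_width, start, length):
    """Read `length` bases starting at 0-based `start` (no bounds checking)."""
    first = base_offset(seq_offset, line_bases, line_width, start)
    last = base_offset(seq_offset, line_bases, line_width, start + length)
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first)
    # Drop the newlines that fall inside the requested window.
    return raw.replace(b"\n", b"").decode("ascii")
```

With the chromosome 1 line (`1 195471971 56 60 61`), `base_offset(56, 60, 61, 3_000_000)` works out to byte 3050056, so `fetch(path, 56, 60, 61, 3_000_000, 60)` would show what sits 3 Mb into the chromosome, where `path` is wherever the unzipped file lives.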
You are sure it's all Ns, and not just the first few million bases of each chromosome that are all Ns?
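One way to check this without scrolling is to stream the file and count. The following is only a rough sketch under my own assumptions (plain or gzipped FASTA, sequence name taken as the first word of the header); `n_stats` is a made-up function name, not part of any package.

```python
import gzip


def n_stats(fasta_path):
    """Yield (name, leading_Ns, total_Ns, length) for each FASTA record."""
    opener = gzip.open if fasta_path.endswith(".gz") else open
    name, leading, total_n, length, in_lead = None, 0, 0, 0, True
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, leading, total_n, length
                parts = line[1:].split()
                name = parts[0] if parts else ""
                leading, total_n, length, in_lead = 0, 0, 0, True
                continue
            if name is None or not line:
                continue
            seq = line.upper()
            if in_lead:
                stripped = seq.lstrip("N")
                leading += len(seq) - len(stripped)
                # The leading run continues only if this line was all N.
                in_lead = not stripped
            total_n += seq.count("N")
            length += len(seq)
    if name is not None:
        yield name, leading, total_n, length
```

Looping over `n_stats("Mus_musculus.GRCm38.dna.primary_assembly.fa.gz")` and printing `100 * total_Ns / length` for each record gives the percentage of Ns per chromosome; if `leading_Ns` is smaller than `length`, the chromosome is not all Ns. Expect it to take a while on the gzipped file.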
If you are saying the sequences of chr1 to chr19 are entirely composed of Ns, then you are wrong. Here are the stats for this genome release:
The beginning of the chromosomes is represented by lots of Ns; one has to scroll / page down considerably before seeing non-N bases.
Edit: Number of Ns, total number of bases, and percentage of Ns per chromosome for the GRCm38 assembly: