5 months ago
Daniel
This might be a dumb question, but I just downloaded the T2T assembly file "GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz" from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/
Why do the headers in my FASTA file say:
NC_060925.1 Homo sapiens isolate CHM13 chromosome 1, alternate assembly T2T-CHM13v2.0
Is it supposed to be an "alternate assembly", or am I downloading the wrong file?
You are downloading the correct file if you are interested in the T2T genome.
My guess is that the assembly is labeled "alternate" because this is v.2 of the assembly being released. A history of T2T releases is available here: https://github.com/marbl/CHM13/blob/master/README.md
I'm not sure it's true that the label is there because this is "v.2 of the assembly being released"; the file that OP refers to is also CHM13v2.0. I would imagine it is "alternate" relative to GRCh38. You can see that GRCh38 has a sort of privileged position as a human assembly on NCBI (it is marked with a green checkbox in the listing at https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=9606).
Thanks for the input. Yeah, I think NCBI treats GRCh38 as the "primary" assembly. I did some digging too, and the headers for GRCh38 are labeled "primary".
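For anyone who wants to repeat that check, here is a minimal Python sketch that lists the header lines of a gzipped FASTA file and tags each one by the wording in its description. The function names `fasta_headers` and `assembly_label` are made up for illustration, and the matching rule is an assumption: it only looks for the phrases "alternate assembly" and "primary assembly" (case-insensitive), based on the headers quoted in this thread.

```python
import gzip


def fasta_headers(path):
    """Yield each header line (without the leading '>') from a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].rstrip("\n")


def assembly_label(header):
    """Return a coarse label based on wording in the header description.

    Assumption: the description contains the phrase "alternate assembly"
    or "primary assembly"; anything else is reported as "unlabeled".
    """
    text = header.lower()
    if "alternate assembly" in text:
        return "alternate"
    if "primary assembly" in text:
        return "primary"
    return "unlabeled"
```

For example, looping over `fasta_headers("GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz")` and printing `assembly_label(header)` next to each header should show how every sequence in the download is described. The label is only descriptive text in the header; the sequence itself is unaffected.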
This is not an answer, but Heng Li provided some insights into the various human reference genomes here (it's a bit dated now but may help):
https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use