Hi folks,
As I understand it, when a genome is re-annotated, the new annotation should contain the old information plus whatever is new, although some entries may be removed if they are later recognised to be something other than what was originally annotated.
The number of unique genes [RefSeq] in the mm9 assembly is ~22K, but in mm10 it is ~15.5K. Why is there such a large difference? I was planning to remap all my samples to the new assembly and use that in downstream processing. Would that be helpful, or should I wait for some time [in case there are any planned updates to mm10]?
Also, the coordinates for the same gene differ between the two builds [e.g. Adora1].
Thanks
P.S. From the NCBI release page:
Release notes: Major update made to the last MGSC release. All chromosome coordinates have changed. There is now some representation of the PAR regions on the X and Y chromosomes.
I think this coordinate change would have a big impact on downstream analysis, particularly in comparisons between mm9 and mm10.
When you say 22K now 15.5K, what is the source of that information? UCSC, EBI, NCBI?
Sean, the source is the RefSeq table from UCSC for these two builds. I sorted each file and counted the unique genes under the column named name2.
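In case it helps anyone reproduce the count, here is a rough Python sketch of what I did. It assumes the table was downloaded as a tab-separated refGene.txt dump with name2 as the 13th column (index 12), so please check that against your own file. The function names and the file paths passed on the command line are just placeholders of mine, not anything provided by UCSC.

```python
import sys

# Assumption: 0-based position of the name2 column in a UCSC refGene.txt dump
# (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount,
#  exonStarts, exonEnds, score, name2, ...). Verify this for your download.
NAME2_INDEX = 12


def unique_gene_names(lines, name2_index=NAME2_INDEX):
    """Return the set of distinct name2 values from tab-separated lines."""
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows that are too short to contain a name2 column.
        if len(fields) <= name2_index:
            continue
        genes.add(fields[name2_index])
    return genes


def load_gene_names(path, name2_index=NAME2_INDEX):
    """Read a table from disk and return its distinct name2 values."""
    with open(path) as handle:
        return unique_gene_names(handle, name2_index)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python count_genes.py mm9_refGene.txt mm10_refGene.txt
    old_genes = load_gene_names(sys.argv[1])
    new_genes = load_gene_names(sys.argv[2])
    print("unique genes in first table: ", len(old_genes))
    print("unique genes in second table:", len(new_genes))
    print("only in first table:         ", len(old_genes - new_genes))
    print("only in second table:        ", len(new_genes - old_genes))
```

Keeping the names in sets rather than just counting them makes it easy to list which genes are present in one build's table but missing from the other, which is probably more informative than the two totals on their own.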