I am going through the GENCODE mm10 annotation (Comprehensive gene annotation - PRI) and I'm seeing some genes that have the same name but different gene IDs. Specifically, these:
[1] Sept2 Ccl27a Ccl21b Fam205a2 Il11ra2 Ccl19 Ccl21a Jakmip1
[9] Ugt2a1 Gm3286 Btbd8 U2af1l4 Dlg2 Itgam Map2k7 Raver1
[17] Olfr912 Rnf26 Lilrb4a Sumo3 Gm2696 Adat3 Dohh Gm3055
[25] Gm12057 Spata22 St6galnac2 Srp54a Gm16381 Zfp935 Olfr190 Crybg3
[33] Pcdha11 Nudt8
Some of them come from different annotation sources (HAVANA, ENSEMBL), but others come from the same source. Many of them have loci close to each other, but others (like Ccl27a) have loci that are not related.
What gives? Also, how should I handle the read counts associated with them? Should I just sum genes with the same name even though they have different gene IDs?
Thanks!
You were lucky, I found the source:
"We recommend to use unique gene identifiers, such as NCBI Entrez gene identifiers, to cluster features into meta-features. Gene names are not recommended to use for this purpose because different genes may have the same names. Unique gene identifiers were often included in many publicly available GTF annotations which can be readily used for summarization."
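That is from the manual of Rsubread, a Bioconductor package. So for your counts, I would summarize by gene ID rather than summing genes that happen to share a name.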
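If you want to see which names are affected before deciding anything, here is a minimal Python sketch that lists gene names attached to more than one gene ID. It assumes a GENCODE-style GTF where `gene` records carry `gene_id` and `gene_name` attributes. The function name `names_with_multiple_ids` and the `TOY_ID_*` records are made up for illustration and are not real annotation entries.

```python
import re
from collections import defaultdict

# Matches quoted GTF attribute pairs such as: gene_id "SOME_ID";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def names_with_multiple_ids(gtf_lines):
    """Return {gene_name: sorted gene_ids} for names that map to more than one ID."""
    ids_by_name = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            ids_by_name[attrs["gene_name"]].add(attrs["gene_id"])
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Toy records (hypothetical IDs and coordinates, NOT taken from the real annotation).
toy_gtf = [
    "##description: toy example",
    'chr4\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "TOY_ID_1"; gene_type "protein_coding"; gene_name "Ccl27a"; level 2;',
    'chr4\tHAVANA\texon\t100\t150\t.\t+\t.\tgene_id "TOY_ID_1"; gene_type "protein_coding"; gene_name "Ccl27a"; level 2;',
    'chr4\tENSEMBL\tgene\t5000\t5100\t.\t-\t.\tgene_id "TOY_ID_2"; gene_type "protein_coding"; gene_name "Ccl27a"; level 3;',
    'chr1\tHAVANA\tgene\t10\t90\t.\t+\t.\tgene_id "TOY_ID_3"; gene_type "protein_coding"; gene_name "UniqueGene"; level 2;',
]

dupes = names_with_multiple_ids(toy_gtf)
print(dupes)
```

On a real file you would pass the open file handle instead of `toy_gtf`. Keeping the count table keyed by `gene_id` and carrying `gene_name` along as an annotation column avoids merging distinct genes by accident.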