Note that this applies to genes that are _annotated_ more than once on the same assembly. Duplicated genes are typically just annotated twice, with distinct GeneIDs; the gene symbol may be the same for these, but the GeneID is not.
In RefSeq annotation files, if GeneA is annotated twice, the `gene_id` attribute in the GTF file is `GeneA` for the first instance and `GeneA_1` for the second. In both cases, the `gene` attribute has the value `GeneA`, which is the official symbol for that gene. For example, look at the annotation of the mouse gene Erdr1x. The GFF3 file contains the following gene rows:
```
NC_000086.8 BestRefSeq pseudogene 168793522 168801793 . + . ID=gene-Erdr1x;Dbxref=GeneID:170942,MGI:MGI:2384747;Name=Erdr1x;description=erythroid differentiation regulator 1 x;end_range=168801793,.;gbkey=Gene;gene=Erdr1x;gene_biotype=transcribed_pseudogene;gene_synonym=edr,Erdr1,Gm21887,Gm55594;partial=true;pseudo=true
NC_000087.8 BestRefSeq pseudogene 90796711 90827734 . + . ID=gene-Erdr1x-2;Dbxref=GeneID:170942,MGI:MGI:2384747;Name=Erdr1x;description=erythroid differentiation regulator 1 x;gbkey=Gene;gene=Erdr1x;gene_biotype=transcribed_pseudogene;gene_synonym=edr,Erdr1,Gm21887,Gm55594;pseudo=true
```
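To check this programmatically, here is a minimal sketch that parses the attribute column (column 9) of the two GFF3 rows above, trimmed to the attributes it uses. The helper names `parse_gff3_attributes` and `geneid_from_dbxref` are hypothetical, not part of any library, and the sketch skips GFF3 percent-decoding for brevity. It shows that the `ID` differs between the two rows while the GeneID in `Dbxref` and the `gene` symbol are identical.

```python
from __future__ import annotations


def parse_gff3_attributes(field: str) -> dict[str, str]:
    """Split a GFF3 column-9 string ('key=value;key=value;...') into a dict.

    Percent-decoding of escaped characters is omitted for brevity.
    """
    attrs: dict[str, str] = {}
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def geneid_from_dbxref(dbxref: str) -> str | None:
    """Return the NCBI GeneID from a comma-separated Dbxref value, if present."""
    for xref in dbxref.split(","):
        db, _, accession = xref.partition(":")
        if db == "GeneID":
            return accession
    return None


# Column 9 of the two Erdr1x gene rows, trimmed to the attributes used here.
rows = [
    "ID=gene-Erdr1x;Dbxref=GeneID:170942,MGI:MGI:2384747;Name=Erdr1x;gene=Erdr1x",
    "ID=gene-Erdr1x-2;Dbxref=GeneID:170942,MGI:MGI:2384747;Name=Erdr1x;gene=Erdr1x",
]

for row in rows:
    attrs = parse_gff3_attributes(row)
    # Prints the ID, the gene symbol and the NCBI GeneID of each row.
    print(attrs["ID"], attrs["gene"], geneid_from_dbxref(attrs["Dbxref"]))
# gene-Erdr1x Erdr1x 170942
# gene-Erdr1x-2 Erdr1x 170942
```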
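Both GFF3 rows carry the same GeneID (170942); only the `ID` differs (`gene-Erdr1x` vs `gene-Erdr1x-2`). In the GTF file, the same gene has the following two rows: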
```
NC_000086.8 BestRefSeq gene 168793522 168801793 . + . gene_id "Erdr1x"; transcript_id ""; db_xref "GeneID:170942"; db_xref "MGI:MGI:2384747"; description "erythroid differentiation regulator 1 x"; gbkey "Gene"; gene "Erdr1x"; gene_biotype "transcribed_pseudogene"; gene_synonym "edr"; gene_synonym "Erdr1"; gene_synonym "Gm21887"; gene_synonym "Gm55594"; partial "true"; pseudo "true";
NC_000087.8 BestRefSeq gene 90796711 90827734 . + . gene_id "Erdr1x_1"; transcript_id ""; db_xref "GeneID:170942"; db_xref "MGI:MGI:2384747"; description "erythroid differentiation regulator 1 x"; gbkey "Gene"; gene "Erdr1x"; gene_biotype "transcribed_pseudogene"; gene_synonym "edr"; gene_synonym "Erdr1"; gene_synonym "Gm21887"; gene_synonym "Gm55594"; pseudo "true";
```
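A minimal sketch of how the two GTF `gene_id` values can be tied back together: parse the attribute column (column 9) of each row and group `gene_id` by the GeneID found in `db_xref`. The rows are trimmed to the attributes used, and the helpers `parse_gtf_attributes` and `ncbi_geneid` are hypothetical. Values are kept as lists because GTF keys such as `db_xref` and `gene_synonym` repeat within a row; the regex assumes values contain no escaped double quotes.

```python
from __future__ import annotations

import re
from collections import defaultdict

# One GTF attribute: a key, whitespace, a double-quoted value, then ';'.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)";')


def parse_gtf_attributes(field: str) -> dict[str, list[str]]:
    """Parse a GTF column-9 string into key -> list of values.

    Lists are used because keys such as db_xref and gene_synonym repeat.
    """
    attrs: dict[str, list[str]] = defaultdict(list)
    for key, value in ATTR_RE.findall(field):
        attrs[key].append(value)
    return dict(attrs)


def ncbi_geneid(attrs: dict[str, list[str]]) -> str | None:
    """Return the NCBI GeneID from the db_xref values, if present."""
    for xref in attrs.get("db_xref", []):
        if xref.startswith("GeneID:"):
            return xref.split(":", 1)[1]
    return None


# Column 9 of the two Erdr1x gene rows, trimmed to the attributes used here.
rows = [
    'gene_id "Erdr1x"; transcript_id ""; db_xref "GeneID:170942"; db_xref "MGI:MGI:2384747"; gene "Erdr1x";',
    'gene_id "Erdr1x_1"; transcript_id ""; db_xref "GeneID:170942"; db_xref "MGI:MGI:2384747"; gene "Erdr1x";',
]

# Group the GTF gene_id values by NCBI GeneID.
gene_ids_by_geneid: dict[str | None, list[str]] = defaultdict(list)
for row in rows:
    attrs = parse_gtf_attributes(row)
    gene_ids_by_geneid[ncbi_geneid(attrs)].append(attrs["gene_id"][0])

print(dict(gene_ids_by_geneid))  # {'170942': ['Erdr1x', 'Erdr1x_1']}
```

Grouping by the `gene` attribute instead of the GeneID would give the same result here, since both rows have `gene "Erdr1x"`.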
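The two copies share the same NCBI GeneID (170942) and the same `gene` attribute (`Erdr1x`), but their GTF `gene_id` values differ (`Erdr1x` and `Erdr1x_1`). Depending on the tool you use for quantification and on your downstream analyses, they can be quantified separately, but most people quantify them at the gene level. Keep in mind that a tool summarizing by `gene_id` will report `Erdr1x` and `Erdr1x_1` as two separate genes, so getting one value per gene means combining them, e.g. by GeneID or by the `gene` attribute.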
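If your quantifier has already produced one count per GTF `gene_id`, a minimal sketch of collapsing the copies is to sum counts over a mapping from `gene_id` to the `gene` symbol (or GeneID). The function `collapse_counts`, the gene `GeneB` and the count values are all hypothetical, made up purely for illustration; in practice the mapping would be built from the GTF attributes.

```python
from __future__ import annotations

from collections import Counter


def collapse_counts(counts: dict[str, int], gene_id_to_key: dict[str, str]) -> dict[str, int]:
    """Sum per-gene_id counts into one total per grouping key.

    gene_id values missing from the mapping are kept under their own name.
    """
    collapsed = Counter()
    for gene_id, n in counts.items():
        collapsed[gene_id_to_key.get(gene_id, gene_id)] += n
    return dict(collapsed)


# Hypothetical per-gene_id counts, made up for illustration only.
counts = {"Erdr1x": 10, "Erdr1x_1": 5, "GeneB": 7}
# GTF gene_id -> `gene` attribute for the duplicated annotation.
gene_id_to_symbol = {"Erdr1x": "Erdr1x", "Erdr1x_1": "Erdr1x"}

print(collapse_counts(counts, gene_id_to_symbol))  # {'Erdr1x': 15, 'GeneB': 7}
```

Whether simply summing is appropriate depends on how your quantifier handles reads that map equally well to both copies, so check its multi-mapping behaviour before collapsing.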