Not Understanding Strange Gene Symbols Or What To Do With Them
11.4 years ago
jobinv ★ 1.1k

We've conducted an RNA-Seq study where we're trying to find a gene signature for a particular condition. We found a ten-gene signature, where two of the genes that were very specific to the condition are listed with the gene symbols "AC092165.4" and "AP001610.5" in the GTF file (from Ensembl gene IDs ENSG00000237412 and ENSG00000228318, respectively). I now wish to learn more about these and have tried a number of strategies, but none of them seem to be panning out.

  1. I googled the names, which gives me the Ensembl genome browser among the first hits. There I see that these gene symbols seem to overlap with other genes, but not exactly. AP001610.5 overlaps with MX1, whereas the page for AC092165.4 gives a 404 error in the genome browser.
  2. I searched for these symbols in the Integrative Genomics Viewer (IGV), which again seems to show some overlap with other genes, such as PRSS56 and CHRND. An additional point of confusion: if I search for the original Ensembl gene IDs, entirely different regions seem to pop up.
  3. I searched NCBI Gene, where a number of results come up for AC092165.4 but not for AP001610.5. Among them are some of the overlapping genes mentioned above. However, the relationship with these genes is not described in any detail, and the symbols don't even seem to be listed among their aliases.

At this point, I'm stuck. Can anyone explain to me what these symbols mean, since they don't appear to be "standard" gene symbols? Also, if you have any tips for how to interpret these better so that we can go ahead with validating them, that would be great.

Thanks!

11.4 years ago
Emily 24k

AC092165.4 is an ENA record of one of the transcripts of PRSS56.

http://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000237412;r=2:233385173-233390422;t=ENST00000449534

AP001610.5 appears to be the current name of this gene.

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000228318;r=21:42813321-42814669;t=ENST00000411427

You can find out more about both of them by searching Ensembl with their gene IDs.
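Besides the browser, the GTF file used for the analysis already records the name and biotype Ensembl assigned to each ID. Below is a minimal Python sketch, assuming an Ensembl-style GTF whose attribute column carries `gene_id`, `gene_name` and `gene_biotype` (GENCODE files use `gene_type` instead). The function names and the sample lines are hypothetical: the IDs and names come from this thread, but the coordinates, strands and biotypes are made up for illustration.

```python
import re

# Matches the  key "value";  pairs in GTF column 9 (the attribute column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_attributes(gtf_lines, gene_ids):
    """Return {gene_id: {attribute: value}} for the requested Ensembl gene IDs.

    Gene-level attributes (gene_name, gene_biotype, ...) are repeated on every
    record of a gene, so the first record seen for each ID is kept.
    """
    wanted = set(gene_ids)
    found = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id in wanted and gene_id not in found:
            found[gene_id] = attrs
    return found


def gtf_line(chrom, feature, attributes):
    """Build a toy GTF record; coordinates and strand are placeholders."""
    return "\t".join([chrom, "toy", feature, "1", "100", ".", "+", ".", attributes])


# Hypothetical sample: IDs and names are from this thread, biotypes are illustrative.
sample = [
    "#!genome-build toy",
    gtf_line("2", "exon", 'gene_id "ENSG00000237412"; transcript_id "ENST00000449534"; '
                          'gene_name "AC092165.4"; gene_biotype "protein_coding";'),
    gtf_line("21", "gene", 'gene_id "ENSG00000228318"; gene_name "AP001610.5"; '
                           'gene_biotype "antisense";'),
]

info = gene_attributes(sample, ["ENSG00000237412", "ENSG00000228318"])
for gene_id, attrs in info.items():
    print(gene_id, attrs.get("gene_name"), attrs.get("gene_biotype"))
```

In a real run you would pass an open file handle for your own GTF instead of `sample`. A name that looks like a clone accession combined with a non-coding biotype is a hint that the gene has no approved symbol.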


Quick note: ENA = European Nucleotide Archive, http://www.ebi.ac.uk/ena/


Thanks for clarifying that.


Thank you for your answer. If AP001610.5 is a gene, I still don't understand why it overlaps so much with another gene, MX1, as the link you posted also shows.


Sometimes genes just overlap each other. Quite often a gene will sit on the opposite strand within an intron of another gene, as AP001610.5 does within MX1.
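To make the "opposite strand, inside an intron" case concrete, here is a minimal Python sketch that classifies one feature relative to a host gene. The `Feature` class, the function names and all coordinates are hypothetical toy values, not the real MX1 or AP001610.5 annotation. Introns are derived from a single transcript's exons, so a region that is intronic in one transcript may still be exonic in another isoform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def introns_from_exons(exons):
    """Gaps between consecutive (start, end) exons of ONE transcript."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
            if next_start - prev_end > 1]


def relationship(host, other, host_exons):
    """Describe where `other` lies relative to `host`, given one host transcript's exons."""
    if not overlaps(host, other):
        return "no overlap"
    orientation = "same strand" if host.strand == other.strand else "opposite strand"
    # Fully contained in a single intron of the host transcript?
    in_intron = any(start <= other.start and other.end <= end
                    for start, end in introns_from_exons(host_exons))
    return f"{orientation}, {'inside an intron' if in_intron else 'overlapping'}"


# Toy example: a three-exon host gene on "+" and a small feature on "-".
host = Feature("HOST", "21", 1000, 5000, "+")
host_exons = [(1000, 1200), (3000, 3200), (4800, 5000)]
nested = Feature("NESTED", "21", 2000, 2500, "-")
print(relationship(host, nested, host_exons))
```

Checking containment per transcript rather than against the whole gene span is a deliberate choice: it keeps "nested in an intron" separate from "overlaps an exon", which matters when designing primers that should hit only one of the two features.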


I see. So for further validation, we _can_ use PRSS56, but not MX1.



Checking the ENA EMBL-Bank entries shows that these names ("AC092165.4" and "AP001610.5") are not really gene symbols; instead, they refer to the source clones for these regions.

These entries, and more information about them, are easy to find with the EMBL-EBI search: http://www.ebi.ac.uk/

In this case ENSG00000237412 has been annotated with the gene symbol "PRSS56" by Havana (see OTTHUMG00000153326) and occurs on the sequence provided by the AC092165.4 clone (sequence version 4 of accession AC092165).

"AP001610.5" is a little more complicated, since the clone is actually AP001610.1 (sequence version 1 of accession AP001610). The name "AP001610.5" was assigned by Havana (see OTTHUMT00000195160): this is a putative non-coding gene and therefore has no gene symbol, so a name is generated from the source clone and the order of the gene features found on it.
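The two naming cases above can be sketched in Python. The regex is a simplification (assumed here: one or two uppercase letters followed by five or six digits, the classic GenBank/EMBL nucleotide accession shapes; newer accession formats are not covered), and `split_clone_name` is a hypothetical helper. It can only split the string: whether the suffix is a sequence version (AC092165.4) or a Havana feature number (AP001610.5, on clone AP001610.1) still has to be checked against ENA or the Havana record.

```python
import re

# Simplified accession pattern: 1-2 uppercase letters + 5-6 digits, then ".<number>".
CLONE_NAME_RE = re.compile(r"^([A-Z]{1,2}\d{5,6})\.(\d+)$")


def split_clone_name(name):
    """Return (accession, suffix) for a clone-based name, or None otherwise.

    The suffix is returned as-is: it may be a sequence version or a
    Havana-assigned feature number, and the string alone cannot tell which.
    """
    match = CLONE_NAME_RE.match(name)
    if match is None:
        return None  # e.g. an HGNC symbol such as PRSS56 or MX1
    return match.group(1), int(match.group(2))


for name in ["AC092165.4", "AP001610.5", "PRSS56", "MX1"]:
    print(name, split_clone_name(name))
```

Both clone-based names split the same way, which is why the ENA lookup is needed to see that AP001610 only exists at sequence version 1.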

11.4 years ago
cdsouthan ★ 1.9k

The problem here is that you are down in the grass of Havana/Vega annotation, where the concept of "gene" is stretched (too far, IMCO) to cover just about any piece of spliced transcript mapping anywhere. Whatever the "gene" count is for mapped transcripts of all kinds within your boundary coordinates, you actually only have two proteins, PRSS56 and MX1, that HGNC has duly stamped with approved gene symbols and names. The others just get various types of Havana codes as suffixes, and anything mapping to the opposite strand inside a defined locus on the sense strand is simply classified as an antisense "gene". Obviously you can try PCR expression validation for some of these, but you'll be hard put to discriminate between all the intron variants.

