We've conducted an RNA-Seq study in which we're trying to find a gene signature for a particular condition. We found a ten-gene signature, and two of the genes that were very specific to the condition are listed in the GTF file with the gene symbols "AC092165.4" and "AP001610.5" (from Ensembl gene IDs ENSG00000237412 and ENSG00000228318, respectively). I now wish to learn more about these and have tried a number of strategies, but none of them seems to be panning out.
- I googled for the names, which gives me the Ensembl genome browser among the first hits. Here I see that these gene symbols seem to overlap with other genes, but not exactly. AP001610.5 overlaps with MX1, whereas the other one gives a 404 error message on the genome browser.
- I searched for these symbols in the Integrative Genomics Viewer (IGV), which again seems to show a certain amount of overlap with other genes, such as PRSS56 and CHRND. An additional point of confusion here: if I search for the original Ensembl gene IDs, entirely different regions seem to pop up.
- I searched on PubMed Gene, where a number of results come up for AC092165.4 but not for AP001610.5. Among this list are some of the aforementioned overlapping genes. However, the relationship with these genes is not described in any detail. They don't even seem to be listed among the aliases.
At this point, I'm stuck. Can anyone explain to me what these symbols mean, since they don't appear to be "standard" gene symbols? Also, if you have any tips for how to interpret these better so that we can go ahead with validating them, that would be great.
Thanks!
Quick note: ENA = European Nucleotide Archive, http://www.ebi.ac.uk/ena/
Thanks for clarifying that.
Thank you for your answer. If AP001610.5 is a gene, then I still have trouble understanding why it overlaps so much with another gene, MX1, as is also clear from the link you posted.
Sometimes genes just overlap each other. Quite often a gene will appear on the opposite strand in the introns of another gene, as this AP001610.5 is to MX1.
I see. This means that for further validation, we _can_ use PRSS56, but not MX1 then.
According to the ENA EMBL-Bank entries, these names ("AC092165.4" and "AP001610.5") are not really gene symbols; they refer to the source clones for this region.
Finding these entries, and more information about them, is relatively easy using the EMBL-EBI search: http://www.ebi.ac.uk/
In this case, ENSG00000237412 has been annotated with the gene symbol "PRSS56" by Havana (see OTTHUMG00000153326) and occurs on the sequence provided by the AC092165.4 clone (sequence version 4 of accession AC092165).
"AP001610.5" is a little more complicated, since the clone is actually AP001610.1 (sequence version 1 of accession AP001610). The name "AP001610.5" was assigned by Havana (see OTTHUMT00000195160): this is a putative non-coding gene and thus has no gene symbol, so a name is generated from the source clone and the order of the gene features found on it.