5.0 years ago
Sib
I have GB-ACC numbers of differentially expressed genes from GEO2R, but I need gene symbols to enter into the Enrichr database for further analyses. I used BioMart to convert RefSeq mRNA IDs to HGNC symbols, but I am not sure whether a RefSeq mRNA ID is the same as a GB-ACC. And is an HGNC symbol the same as a gene symbol? (BioMart does not have Gene symbol or GB-ACC options.)
Thank you. And what about GB-ACC? Is it the same as a RefSeq mRNA ID?
That is a great question. GenBank accession (GB-ACC) is not the same as RefSeq.
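RefSeq transcript IDs start with a two-letter prefix followed by an underscore: NM_ for mRNAs (such as NM_004985) and NR_ for non-coding RNAs, with XM_ and XR_ used for predicted transcripts.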
GenBank accession numbers follow a different format (described here: https://www.ncbi.nlm.nih.gov/Sequin/acc.html): letters followed by digits, with no underscore. For example, AF493917 is a GenBank accession ID.
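Because the two formats differ mainly in the underscore, a mixed GB-ACC column can be split into RefSeq IDs and GenBank IDs before any conversion. This is a minimal Python sketch, assuming the common nucleotide formats (two letters + underscore + digits for RefSeq; one letter + five digits, or two letters + six or eight digits, for GenBank, each optionally followed by a `.version` suffix). `classify_accession` and `split_ids` are hypothetical helper names, and anything outside these assumed patterns (e.g. WGS accessions) ends up as "unknown".

```python
import re

# RefSeq: two capital letters, an underscore, digits, optional version (NM_004985.5).
REFSEQ_RE = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")

# GenBank nucleotide (assumed common formats): 1 letter + 5 digits,
# or 2 letters + 6 or 8 digits, optional version. No underscore.
GENBANK_RE = re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?$")


def classify_accession(acc):
    """Return 'refseq', 'genbank' or 'unknown' for one nucleotide accession."""
    acc = acc.strip().upper()
    if REFSEQ_RE.match(acc):
        return "refseq"
    if GENBANK_RE.match(acc):
        return "genbank"
    return "unknown"


def split_ids(ids):
    """Group accessions by type, preserving their original order."""
    groups = {"refseq": [], "genbank": [], "unknown": []}
    for acc in ids:
        groups[classify_accession(acc)].append(acc)
    return groups


if __name__ == "__main__":
    ids = ["NM_201591", "BX100997", "BC043554", "NR_038236"]
    print(split_ids(ids))
    # {'refseq': ['NM_201591', 'NR_038236'], 'genbank': ['BX100997', 'BC043554'], 'unknown': []}
```

The `refseq` list can go straight into BioMart; the `genbank` list needs a separate lookup.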
A lot of publications confuse the two, but GenBank and RefSeq are separate databases: GenBank contains sequences submitted by individual labs, whereas RefSeq data is curated and maintained by NCBI.
I prefer RefSeq because GenBank is an archive of raw sequences deposited as submitted, so there's a hodgepodge of redundant data and you have to do a fair amount of filtering to get what you want (in fact, RefSeq is largely built by NCBI manually curating GenBank data). See the RefSeq paper for more information: https://www.ncbi.nlm.nih.gov/pubmed/15608248
Thanks a lot for your answer. As you said, I think GEO2R has confused the two. As you can see in the GB-ACC column, different formats such as NM_201591, BX100997, BC043554 and NR_038236 are used. I'd be grateful if you could show me a way to obtain the gene symbols of these genes.
Unfortunately, I can't think of an easy way to do it. Personally, I'd use BioMart to convert all the RefSeq IDs first, and then, for the remaining IDs that can't be converted (i.e. the GenBank accession numbers), use the following NCBI file, which maps GenBank accession numbers to gene symbols: ftp://ftp.ncbi.nih.gov/gene/DATA/gene2accession.gz
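For the GenBank IDs left over after BioMart, the lookup against that file could be sketched in Python as below. It assumes the tab-separated layout given in the file's `#` header line (column 1 `tax_id`, column 4 `RNA_nucleotide_accession.version`, column 16 `Symbol`, with `-` marking empty fields) and human genes (`tax_id` 9606); please check those positions against the header of your own copy. `build_symbol_map`, `load_symbol_map` and `lookup_symbols` are hypothetical helper names. Version suffixes are stripped because the IDs in the GB-ACC column have none.

```python
import csv
import gzip

# Assumed 0-based column positions in gene2accession; verify against the header.
TAX_ID_COL = 0    # tax_id
RNA_ACC_COL = 3   # RNA_nucleotide_accession.version
SYMBOL_COL = 15   # Symbol


def strip_version(acc):
    """Drop a trailing '.N' version, e.g. 'BC043554.1' -> 'BC043554'."""
    return acc.split(".", 1)[0]


def build_symbol_map(lines, tax_id="9606"):
    """Map version-less RNA accessions to gene symbols for one species."""
    mapping = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank lines, the '#' header and rows too short to hold a symbol.
        if not row or row[0].startswith("#") or len(row) <= SYMBOL_COL:
            continue
        if row[TAX_ID_COL] != tax_id:
            continue
        acc = row[RNA_ACC_COL]
        if acc == "-":  # '-' marks an empty field (assumed)
            continue
        # Keep the first symbol seen if an accession maps to several genes.
        mapping.setdefault(strip_version(acc), row[SYMBOL_COL])
    return mapping


def load_symbol_map(path, tax_id="9606"):
    """Stream the gzipped file line by line instead of unpacking it to disk."""
    with gzip.open(path, "rt") as fh:
        return build_symbol_map(fh, tax_id)


def lookup_symbols(accessions, mapping):
    """Return {accession: symbol or None} for the requested IDs."""
    return {acc: mapping.get(strip_version(acc)) for acc in accessions}
```

Usage would be `symbols = load_symbol_map("gene2accession.gz")` followed by `lookup_symbols(["BX100997", "BC043554"], symbols)`. The file covers all species and is large, so a single pass can take a while; IDs that come back as `None` have no human RNA entry in it.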
Thank you.