Hi,
I'm working with RNA-Seq data from non-model organisms. After differential gene expression analysis, I found that there are a lot of genes with the prefix "LOC". Searching the web, I learned that these are genes for which no ortholog (and hence no official gene symbol) has been determined. When I moved on to downstream analysis, I was unable to convert the LOC IDs into ENTREZID/ENSEMBL IDs using clusterProfiler (the bitr function). How do I proceed with downstream analysis such as GO/KEGG enrichment? Should I ignore these genes completely? I had a total of 7 samples, and after differential gene expression analysis there were 4501 of these genes for each sample.
If I search these IDs in NCBI, I do get the gene information.
Any suggestions would be appreciated!
Can you provide an example or two? Sometimes LOC genes have informative aliases that you can use. If you have the Entrez Gene IDs, you can fetch a list of all aliases for each of them.
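A minimal Python sketch of fetching those aliases in one batch through NCBI's E-utilities `esummary` endpoint (`db=gene`, JSON output). It assumes the gene document summaries expose `name`, `description` and a comma-separated `otheraliases` field; check these field names against a real response before relying on it. `esummary_url`, `fetch_summary` and `aliases_from_summary` are hypothetical helper names, not part of any library.

```python
import json
import urllib.parse
import urllib.request

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(gene_ids):
    """Build one esummary request for a batch of Entrez Gene IDs."""
    params = {
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "retmode": "json",
    }
    return ESUMMARY + "?" + urllib.parse.urlencode(params)


def fetch_summary(gene_ids):
    """Download the JSON document summaries (requires network access)."""
    with urllib.request.urlopen(esummary_url(gene_ids)) as resp:
        return json.load(resp)


def aliases_from_summary(summary):
    """Map each UID to its name, description and list of aliases.

    Assumes the field names 'name', 'description' and 'otheraliases'.
    """
    result = summary.get("result", {})
    out = {}
    for uid in result.get("uids", []):
        doc = result.get(uid, {})
        raw = doc.get("otheraliases", "") or ""
        aliases = [a.strip() for a in raw.split(",") if a.strip()]
        out[uid] = {
            "name": doc.get("name", ""),
            "description": doc.get("description", ""),
            "aliases": aliases,
        }
    return out


if __name__ == "__main__":
    # Gene IDs taken from the example LOC symbols in this thread.
    info = aliases_from_summary(fetch_summary([117740983, 117746502]))
    for uid, entry in info.items():
        print(uid, entry["name"], entry["description"], entry["aliases"])
```

Batching IDs into a single request keeps you well within NCBI's request-rate limits; for thousands of genes, split the list into chunks of a few hundred IDs.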
https://www.ncbi.nlm.nih.gov/search/all/?term=LOC117740983
https://www.ncbi.nlm.nih.gov/gene/117726460
https://www.ncbi.nlm.nih.gov/search/all/?term=LOC117746502
The number after the LOC prefix is the Entrez Gene ID. You can access these entries at the URL https://www.ncbi.nlm.nih.gov/gene/{number}
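Because the Entrez Gene ID is simply the digits after "LOC", the conversion needs no database lookup. A minimal Python sketch, assuming symbols of the exact form LOC followed by digits (anything else, such as an official symbol, maps to None); `loc_to_entrez` and `gene_url` are hypothetical helper names.

```python
import re

# "LOC" followed only by digits, e.g. LOC117740983
LOC_PATTERN = re.compile(r"LOC(\d+)")


def loc_to_entrez(symbol):
    """Return the Entrez Gene ID embedded in a LOC symbol, or None."""
    m = LOC_PATTERN.fullmatch(symbol.strip())
    return m.group(1) if m else None


def gene_url(entrez_id):
    """NCBI Gene page for a given Entrez Gene ID."""
    return f"https://www.ncbi.nlm.nih.gov/gene/{entrez_id}"


# Example symbols from this thread plus a non-LOC symbol for contrast.
symbols = ["LOC117740983", "LOC117746502", "TP53"]
mapping = {s: loc_to_entrez(s) for s in symbols}
urls = {s: gene_url(i) for s, i in mapping.items() if i is not None}
```

Non-LOC symbols are left as None so they can still be converted with bitr as usual; the numeric IDs can be used wherever Entrez Gene IDs are accepted.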
For your examples,
I thought the same, thanks for the suggestions.
Hi, this is very useful, thanks. How would I go about running GO enrichment analysis with this list?
Since LOC genes are often uncharacterized, there is likely no way to do GO enrichment analysis on those.
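For whatever subset of genes does carry GO annotations (for example, GeneIDs that appear in an annotation file keyed by GeneID, such as NCBI's gene2go), over-representation analysis comes down to a one-sided hypergeometric test per GO term. A minimal stdlib-only sketch, assuming you have already built a hypothetical `gene2terms` dict mapping each gene ID to its GO term IDs. It applies no multiple-testing correction and does not propagate annotations up the GO hierarchy, both of which dedicated enrichment tools typically add.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(study, population, gene2terms):
    """Return {term: (k, K, p)} for every term seen in the population.

    Only genes with at least one annotation count toward N and n,
    a common convention for the background set.
    """
    population = {g for g in population if gene2terms.get(g)}
    study = set(study) & population
    pop_counts, study_counts = {}, {}
    for gene in population:
        for term in set(gene2terms[gene]):
            pop_counts[term] = pop_counts.get(term, 0) + 1
            if gene in study:
                study_counts[term] = study_counts.get(term, 0) + 1
    N, n = len(population), len(study)
    return {
        term: (study_counts.get(term, 0), K, hypergeom_sf(study_counts.get(term, 0), N, K, n))
        for term, K in pop_counts.items()
    }
```

Here `study` would be the differentially expressed gene IDs and `population` all genes tested; LOC genes without any annotation simply drop out of both sets rather than distorting the background.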