Missing gene symbol
3.4 years ago
fahim ▴ 20

I need some advice on microarray data analysis. I analyzed a GEO dataset with GEO2R, but when I downloaded the table of significant genes (with logFC, p-value, adjusted p-value, reference ID, Ensembl ID, etc.), I found that some entries in the gene symbol column are missing. How should I deal with this? Should I search for each of these genes by its Ensembl or reference ID (e.g. on Google), or should I remove the rows without a gene symbol and continue the analysis with the rest of the dataset?

If you have Ensembl IDs, I would suggest using BioMart to annotate them.

This thread might help.
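If you prefer to query BioMart programmatically instead of through the web interface, a minimal Python sketch could look like the following. It assumes the Ensembl BioMart REST endpoint (martservice), the human dataset hsapiens_gene_ensembl, and Ensembl gene IDs (ENSG...) as input; if your table holds transcript IDs, the filter would need to be ensembl_transcript_id instead. For human, hgnc_symbol is an alternative to the external_gene_name attribute used here. The function names are hypothetical, and the network call itself is only shown in the usage note.

```python
import urllib.parse

# Assumed endpoint: current Ensembl release. Archived releases live on other hosts,
# so match the release to the genome version your annotation was built on.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(ensembl_ids, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query mapping Ensembl gene IDs to gene names (TSV output)."""
    ids = ",".join(ensembl_ids)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        f'<Dataset name="{dataset}" interface="default">'
        f'<Filter name="ensembl_gene_id" value="{ids}"/>'
        '<Attribute name="ensembl_gene_id"/>'
        '<Attribute name="external_gene_name"/>'
        "</Dataset>"
        "</Query>"
    )


def build_biomart_url(ensembl_ids, dataset="hsapiens_gene_ensembl"):
    """URL-encode the XML query as the 'query' parameter of a GET request."""
    query = build_biomart_query(ensembl_ids, dataset)
    return MART_URL + "?" + urllib.parse.urlencode({"query": query})


def parse_biomart_tsv(text):
    """Parse the two-column TSV reply into {ensembl_id: symbol}, skipping empty symbols."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        gene_id = parts[0]
        symbol = parts[1] if len(parts) > 1 else ""
        if symbol:  # IDs without a name come back with an empty second column
            mapping[gene_id] = symbol
    return mapping
```

Usage (requires network access): `text = urllib.request.urlopen(build_biomart_url(ids)).read().decode()` followed by `parse_biomart_tsv(text)`. IDs that BioMart cannot name are simply absent from the result, which is consistent with the point below that some genes have no symbol at all.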

Most of the gene symbols are present, but some rows in the gene symbol column are empty.

3.4 years ago

This isn't really an error. Some genes just don't have symbols. RefSeq IDs come from NCBI, Ensembl IDs from Ensembl, and gene symbols from HGNC (in the case of the human genome). These groups do not necessarily agree on which bits of sequence are genes or what genes they are. There are quite a few Ensembl IDs, for example, that have not been assigned gene symbols by HGNC. Or, if the array is old, probes may have been designed against sequences that have since been judged not to be genes. In human (but not necessarily in other organisms), the majority of transcripts without gene symbols are either non-coding RNA genes or unvalidated gene predictions, so if you are only interested in well-supported, well-understood protein-coding genes, it's generally safe to ignore them.
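If you do decide to set the unannotated rows aside, it can be worth keeping them in a separate file rather than deleting them, so you can revisit them later. A minimal sketch using only the Python standard library, assuming the GEO2R export is tab-separated with a header row and that the symbol column is named Gene.symbol (check the header of your own file; both are assumptions):

```python
import csv
import io


def read_geo2r_table(text):
    """Read a tab-separated GEO2R export (header row assumed) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def split_by_symbol(rows, symbol_col="Gene.symbol"):
    """Split rows into (annotated, unannotated) by whether the symbol column is blank."""
    annotated, unannotated = [], []
    for row in rows:
        # A missing or whitespace-only value counts as "no symbol"
        if (row.get(symbol_col) or "").strip():
            annotated.append(row)
        else:
            unannotated.append(row)
    return annotated, unannotated
```

The unannotated list can then be written out with csv.DictWriter and annotated separately (for example with BioMart) before deciding whether to drop it.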

3.4 years ago
biomon ▴ 60

Yes, I agree with Nitin: you can use BioMart on the Ensembl website. Alternatively, you can use the biomartr R package. Be sure to check which version of the Ensembl genome was used. You can also download the corresponding GTF file and wrangle it in R yourself to get the Ensembl IDs and gene names, then use something like dplyr::left_join(your_data, wrangled_gtf, by = "ensembl_id") to end up with a full list of symbols.
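The GTF route can also be sketched outside R. The following is a minimal Python sketch, assuming an Ensembl- or GENCODE-style GTF in which "gene" records carry a gene_id attribute and, when one exists, a gene_name attribute, and a results table whose ID column is hypothetically named ensembl_id. It builds an ID-to-name lookup from the gene lines and then does the equivalent of a left join, so rows without a match are kept with an empty name instead of being dropped. Version suffixes such as ".5" are stripped on both sides because GENCODE puts them in gene_id while Ensembl GTFs store them separately; that normalization is an assumption about your data.

```python
import re

# GTF attribute pairs look like: gene_id "ENSG..."; gene_name "DDX11L1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def strip_version(gene_id):
    """Drop a trailing version suffix, e.g. ENSG00000223972.5 -> ENSG00000223972."""
    return gene_id.split(".")[0]


def gene_names_from_gtf(lines):
    """Map unversioned gene_id -> gene_name using the 'gene' records of a GTF file."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed here
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id:
            # Genes without a gene_name attribute are stored as "" so they stay visible
            mapping[strip_version(gene_id)] = attrs.get("gene_name", "")
    return mapping


def left_join_symbols(results, mapping, id_col="ensembl_id", out_col="gene_name"):
    """Keep every result row and add the gene name where one is known (a left join)."""
    joined = []
    for row in results:
        new_row = dict(row)
        key = strip_version(row.get(id_col) or "")
        new_row[out_col] = mapping.get(key, "")
        joined.append(new_row)
    return joined
```

Usage: open the GTF (for a gzipped file, gzip.open(path, "rt")), pass the file handle to gene_names_from_gtf, then call left_join_symbols on your result rows. Rows whose gene_name is still empty are the genes that genuinely lack a symbol in that annotation release.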
