Problem converting gene symbols to Entrez IDs
2.9 years ago
bookorg ▴ 20

Hi, I ran into a problem converting the gene symbols in my .csv table to Entrez IDs. My .csv file consists of the columns SYMBOL, baseMean, log2FoldChange, lfcSE, stat, pvalue, and padj. I analyzed a GEO data set and found 99 DEGs. Now I want to run a functional enrichment analysis on those DEGs. To do that, I first have to convert my gene symbols to Entrez IDs, so I wrote the code below:

df$EntrezID <- mapIds(x = org.Hs.eg.db,
                                  keys=row.names(df),
                                  column="ENTREZID",
                                  keytype="SYMBOL",
                                  multiVals="first")

But when I run it, I get the following error:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.

Kindly help me with this. Thanks in advance.

R • 1.7k views

When posting questions about ID/Symbol please provide examples.


I am not clear about what kind of example you want me to give. I have already described all the details and the error that occurred.


Give us examples of the gene IDs that are generating the error, e.g. BRCA.


These are the 99 genes (SYMBOL column) that give the error:

ORF1a ORF1b N S ORF6 ORF3a ORF7a M E ORF8 RNU1-28P ZFAND5 IGHV4-59 FNIP1 IGLV3-25 RAMP3 GLUL SNORD116-2 CLEC3B ANKRD36BP2 IGHG4 IGHM IGLV1-47 GADD45A ZNF638-IT1 HSPA1L IGHG3 IGLV3-21 IGLV4-69 RRM2B RNVU1-7 IGHV4-34 NUDT16 ITM2C MIR205HG SNORA38B IGLL5 SNORD17 LAG3 IGLV1-40 IFI27L2 RAMP2 IGHV1-18 GPCPD1 H2AFY ICAM2 IGKV4-1 SNORA13 CLK4 IGHV1-46 MMRN2 HIST1H2BF APLNR KIFAP3 CTA-796E4.5 HSP90AA2P IGHA1 RTKN2 SP140 HELLPAR RPA3 APOL4 BCL2L2 IGHV5-51 LMAN2 TMEM19 IGHV3-74 RP1-309I22.2 AL355075.1 AC068580.6 PDZD8 SELK IGKV1-5 IGLC3 AKAP2 CRIP1 IGHD IGLC2 HSH2D IGJ SCARNA6 C1orf226 MZB1 AL139099.2 BPIFB1 HIST1H2BN ST6GAL1 IGLV1-44 POU2AF1 RP4-671O14.7 UBE2G2 RLIM BANK1 EHBP1L1 GZMA IGLV1-51 PABPC4 PLIN2 PSMB10


I am going to give you a solution using EntrezDirect. I assume you are referring to gi numbers when you say EntrezID. If that is not the case then let us know. BTW, gi numbers are deprecated for end-users by NCBI.

Put one gene symbol per line in a file. Example below.

$ head -5 gene
IGHA1
RTKN2
SP140
HELLPAR
RPA3

$ for i in `cat gene`; do esearch -db gene -query "${i} [GENE] AND human [ORGN]" | esummary | xtract -pattern DocumentSummary -element Id,Name; done
3493    IGHA1
283650  IGHA1
27331   IGHA1
219790  RTKN2
254060  RTKN2
11262   SP140
101101692   HELLPAR
6119    RPA3
80832   APOL4
599 BCL2L2
28388   IGHV5-51
10960   LMAN2
55266   TMEM19
28408   IGHV3-74
118987  PDZD8
58515   SELENOK
28299   IGKV1-5
28944   IGKV1-5
3539    IGLC3
28838   IGLC3
445815  PALM2AKAP2
114299  PALM2
11217   AKAP2
1396    CRIP1
25927   CNRIP1
3495    IGHD
28224   IGHD
3538    IGLC2
28839   IGLC2
84941   HSH2D
3512    JCHAIN
677772  SCARNA6

It seems to me that there is something wrong with row.names(df). Could you show the top 10 items from that vector? something like rownames(df)[1:10].
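To illustrate why the row names are worth checking, here is a minimal base R sketch. It assumes the table was loaded with a plain `read.csv()` call; the temporary file and the numeric values are made up for illustration, and only the `SYMBOL` column name comes from the question.

```r
# Write a tiny CSV shaped like the one described in the question.
# The numeric values are invented; only the layout matters here.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("SYMBOL,log2FoldChange,padj",
             "IGHA1,2.1,0.001",
             "RPA3,-1.4,0.020",
             "GZMA,1.8,0.004"), csv_path)

# Default read.csv(): SYMBOL is an ordinary column and the row names
# are just the automatic row numbers, stored as character strings.
df <- read.csv(csv_path, stringsAsFactors = FALSE)
print(rownames(df))
print(df$SYMBOL)

# With row.names = 1 the first column is used as the row names instead,
# and it no longer appears as a regular column.
df_rn <- read.csv(csv_path, row.names = 1, stringsAsFactors = FALSE)
print(rownames(df_rn))
```

If `rownames(df)` prints row numbers rather than gene symbols, then `row.names(df)` is handing `mapIds()` keys that cannot match any `SYMBOL`.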


My genes are in a .csv file, and I named the column SYMBOL. There are also six other columns: log2FC, p-value, adjusted p-value, base mean, lfcSE, and stat.


The function is complaining about the keys you provided. You told it that the row names of the data frame contain the gene symbols, which I guess is not true. If you pass df$SYMBOL to the keys argument instead, you should be fine.
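A minimal sketch of that suggestion, assuming the `AnnotationDbi` and `org.Hs.eg.db` packages are installed. The small data frame is a hypothetical stand-in for the poster's table, using a few symbols from the list posted in this thread.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Hypothetical stand-in for the poster's table; the symbols are taken
# from the gene list posted earlier in this thread.
df <- data.frame(SYMBOL = c("IGHA1", "RPA3", "SELK", "ORF1a"),
                 stringsAsFactors = FALSE)

# Check which symbols the annotation package knows as SYMBOL keys,
# as the error message itself suggests.
is_valid <- df$SYMBOL %in% keys(org.Hs.eg.db, keytype = "SYMBOL")
print(df$SYMBOL[!is_valid])

# Map using the SYMBOL column rather than row.names(df).
# As long as at least one key is valid, unmatched keys come back as NA.
df$EntrezID <- unname(mapIds(x = org.Hs.eg.db,
                             keys = df$SYMBOL,
                             column = "ENTREZID",
                             keytype = "SYMBOL",
                             multiVals = "first"))
print(df)
```

Symbols that still come back as `NA` may be outdated or non-human names. For outdated names, retrying with `keytype = "ALIAS"` is one option worth trying, though whether it resolves a given symbol depends on the annotation version.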
