Question

Problem converting gene symbols to Entrez IDs

3.7 years ago

bookorg ▴ 20

Hi, I have a problem converting the gene symbols in my .csv table to Entrez IDs. My .csv file consists of the columns SYMBOL, baseMean, log2FoldChange, lfcSE, stat, pvalue and padj. I analyzed a GEO data set and found 99 DEGs. Now I want to run a functional enrichment analysis on those DEGs. For that, I first have to convert my gene symbols to Entrez IDs, so I wrote the code below:

df$EntrezID <- mapIds(x = org.Hs.eg.db,
                                  keys=row.names(df),
                                  column="ENTREZID",
                                  keytype="SYMBOL",
                                  multiVals="first")

But when I run it, I get this output:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.

Kindly help me with this. Thanks in advance.

R • 2.2k views

ADD COMMENT • link updated 3.7 years ago by Hamid Ghaedi 3.3k • written 3.7 years ago by bookorg ▴ 20

When posting questions about ID/Symbol please provide examples.

ADD REPLY • link 3.7 years ago by GenoMax 153k

I am not clear about the example. What type of example do you want me to give? I gave all the details and the error that occurred.

ADD REPLY • link 3.7 years ago by bookorg ▴ 20

Give us examples of the gene IDs that are generating the error, e.g. BRCA.

ADD REPLY • link 3.7 years ago by GenoMax 153k

SYMBOL: ORF1a ORF1b N S ORF6 ORF3a ORF7a M E ORF8 RNU1-28P ZFAND5 IGHV4-59 FNIP1 IGLV3-25 RAMP3 GLUL SNORD116-2 CLEC3B ANKRD36BP2 IGHG4 IGHM IGLV1-47 GADD45A ZNF638-IT1 HSPA1L IGHG3 IGLV3-21 IGLV4-69 RRM2B RNVU1-7 IGHV4-34 NUDT16 ITM2C MIR205HG SNORA38B IGLL5 SNORD17 LAG3 IGLV1-40 IFI27L2 RAMP2 IGHV1-18 GPCPD1 H2AFY ICAM2 IGKV4-1 SNORA13 CLK4 IGHV1-46 MMRN2 HIST1H2BF APLNR KIFAP3 CTA-796E4.5 HSP90AA2P IGHA1 RTKN2 SP140 HELLPAR RPA3 APOL4 BCL2L2 IGHV5-51 LMAN2 TMEM19 IGHV3-74 RP1-309I22.2 AL355075.1 AC068580.6 PDZD8 SELK IGKV1-5 IGLC3 AKAP2 CRIP1 IGHD IGLC2 HSH2D IGJ SCARNA6 C1orf226 MZB1 AL139099.2 BPIFB1 HIST1H2BN ST6GAL1 IGLV1-44 POU2AF1 RP4-671O14.7 UBE2G2 RLIM BANK1 EHBP1L1 GZMA IGLV1-51 PABPC4 PLIN2 PSMB10

These are the 99 genes that give the error.

ADD REPLY • link 3.7 years ago by bookorg ▴ 20

I am going to give you a solution using EntrezDirect. I assume you are referring to gi numbers when you say EntrezID. If that is not the case then let us know. BTW, gi numbers are deprecated for end-users by NCBI.

Put one gene symbol per line in a file. Example below.

$ head -5 gene
IGHA1
RTKN2
SP140
HELLPAR
RPA3

$ for i in `cat gene`; do esearch -db gene -query "${i} [GENE] AND human [ORGN]" | esummary | xtract -pattern DocumentSummary -element Id,Name; done
3493    IGHA1
283650  IGHA1
27331   IGHA1
219790  RTKN2
254060  RTKN2
11262   SP140
101101692   HELLPAR
6119    RPA3
80832   APOL4
599 BCL2L2
28388   IGHV5-51
10960   LMAN2
55266   TMEM19
28408   IGHV3-74
118987  PDZD8
58515   SELENOK
28299   IGKV1-5
28944   IGKV1-5
3539    IGLC3
28838   IGLC3
445815  PALM2AKAP2
114299  PALM2
11217   AKAP2
1396    CRIP1
25927   CNRIP1
3495    IGHD
28224   IGHD
3538    IGLC2
28839   IGLC2
84941   HSH2D
3512    JCHAIN
677772  SCARNA6

ADD REPLY • link 3.7 years ago by GenoMax 153k

It seems to me that there is something wrong with row.names(df). Could you show the first 10 items of that vector? Something like rownames(df)[1:10].
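
As a rough illustration of that check, here is a minimal sketch using a small made-up table in place of the real results (the inline CSV text and its values are assumptions, not the poster's data). If the file was read with read.csv() without a row.names argument, the row names are just "1", "2", ... and the symbols sit in the SYMBOL column:

```r
# Toy stand-in for the DEG table; the values are invented for illustration.
csv_text <- "SYMBOL,baseMean,log2FoldChange
ZFAND5,100.5,1.2
FNIP1,80.1,-0.9
GLUL,230.7,2.1"

# read.csv() without row.names= assigns default row names "1", "2", ...
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

head(rownames(df), 10)  # default row names, not gene symbols
head(df$SYMBOL, 10)     # the gene symbols are stored in this column
```

If the first call prints numbers rather than gene symbols, row.names(df) is not a usable set of keys for mapIds().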

ADD REPLY • link 3.7 years ago by Hamid Ghaedi 3.3k

My genes are in a .csv file, and I named the column SYMBOL. There are also 6 other columns: log2fc, p.value, adj.p.val, base mean, lfcSE and stat.

ADD REPLY • link 3.7 years ago by bookorg ▴ 20

The function is complaining about the keys it was given. Your call tells it that the row names of the data frame contain the gene symbols, which I guess is not true. If you pass df$SYMBOL to the keys argument, you might be good.
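
A minimal sketch of that change, assuming the AnnotationDbi and org.Hs.eg.db packages are installed and that the symbols are in a column named SYMBOL. The small data frame here is a made-up stand-in that reuses a few symbols from the list posted above:

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Stand-in for the real table; in practice df would come from read.csv().
df <- data.frame(SYMBOL = c("ZFAND5", "FNIP1", "GLUL", "ORF1a"),
                 stringsAsFactors = FALSE)

# Pass the SYMBOL column as keys instead of row.names(df).
df$EntrezID <- mapIds(x = org.Hs.eg.db,
                      keys = df$SYMBOL,
                      column = "ENTREZID",
                      keytype = "SYMBOL",
                      multiVals = "first")

# Symbols that the annotation package does not know come back as NA.
df$SYMBOL[is.na(df$EntrezID)]
```

Entries that are not current human gene symbols (for example the viral ORF names, or older names such as SELK, which the Entrez Direct output above reports as SELENOK) will probably map to NA and need to be handled separately.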

ADD REPLY • link 3.7 years ago by Hamid Ghaedi 3.3k