After performing the differential analysis with limma and mapping the results to the feature data, I got the following data frame:
FDR    Probe_ID    Gene.Symbol            Gene.ID
0.009  1555272_at  RSPH10B2///RSPH10B     728194///222967
0.007  1557203_at  PABPC1L2B///PABPC1L2A  645974///340529
0.007  1557384_at  LOC100506639///ZNF131  100506639///7690
The R code to create the above df is as follows:
df <- data.frame(
  FDR = c(0.009, 0.007, 0.007),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A", "LOC100506639///ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529", "100506639///7690"))
I want to perform GSEA using the column df$Gene.Symbol. However, some Probe_IDs are mapped to more than one gene symbol, so I split the whole data frame with:
library(tidyr)  # provides separate_rows() and re-exports the %>% pipe
df_split <- as.data.frame(df %>% separate_rows(Gene.Symbol, Gene.ID, sep = "///"))
But the result contains repeated gene symbols. What is the correct way to resolve this so that df$Gene.Symbol is annotated with unique, non-repeated gene symbols? I don't want to use any online tool, as I am hard-coding the microarray pipeline as part of my project.
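To make the goal concrete, here is a minimal sketch of one possible way to collapse the split data frame to one row per gene symbol. It rests on assumptions that are not part of my pipeline. The tie-breaking rule (keep the probe with the smallest FDR for each symbol) is just one choice, and for GSEA a ranking statistic such as the moderated t or logFC may be a better criterion. The probe `9999999_at` is a hypothetical row added only to force a duplicate symbol, and `df_demo` and `df_unique` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Example data; the last row is hypothetical and only there to create
# a second probe that maps to ZNF131.
df_demo <- data.frame(
  FDR = c(0.009, 0.007, 0.007, 0.020),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at", "9999999_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A",
                  "LOC100506639///ZNF131", "ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529",
              "100506639///7690", "7690")
)

df_unique <- df_demo %>%
  # One row per (probe, gene) pair; symbols and IDs are split in parallel.
  separate_rows(Gene.Symbol, Gene.ID, sep = "///") %>%
  # Assumed rule: for each symbol keep the single row with the lowest FDR.
  group_by(Gene.Symbol) %>%
  slice_min(FDR, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  as.data.frame()

print(df_unique)
```

With this rule every gene symbol appears exactly once, and a symbol covered by several probes is represented by its most significant probe.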