Hi all,
I'm new to the community so apologise if this is a basic question or has been dealt with in another thread. I've looked through the fgsea documentation and am still having no luck.
I'm trying to run fgsea() on some nmf-decomposed gene expression data and keep getting an empty dataframe. This is confusing me because I conducted the analysis using MSigDB's GSEA software, and this seemed to work - I got rank-ordered gene lists and enrichment plots for each column of W_dash, so I'm sure it can be done in fgsea.
I ran nmf() on some gene expression data, then normalised the resulting W matrix. Now my task is to run GSEA on the normalised W matrix (W_dash), using the columns of W_dash as my ranked gene lists. W_dash looks like this:
W1 W2 W3 W4
AC000066_at 5.054266e-23 1.022037e-03 1.161830e-03 9.154324e-04
AC000099_at 2.620042e-03 1.369911e-03 1.495072e-02 5.842306e-03
AC000115_cds1_at 4.382477e-04 1.945546e-10 3.731523e-10 6.260872e-10
AC002045_xpt1_at 5.716331e-03 1.776428e-02 2.414694e-02 2.651467e-02
AC002073_cds1_at 3.328479e-03 4.652305e-04 4.209797e-03 4.125737e-03
AC002077_at 5.054266e-23 1.448357e-03 8.369223e-04 1.646869e-03
AC002086_at 1.109709e-03 1.874855e-03 3.724489e-04 2.646966e-03
AC002115_cds1_at 4.459879e-02 8.844502e-02 1.009927e-01 2.488297e-01
AC002115_cds3_at 2.441240e-02 1.509489e-02 2.094655e-02 1.398258e-02
AC002115_cds4_at 1.430019e-10 1.237761e-03 7.450935e-10 5.510702e-10
AC002115_rna2_at 8.419673e-03 1.567666e-02 7.554787e-03 3.770747e-02
I import a gene set from MSigDB:
>pathways = fgsea::gmtPathways("./c4.cgn.v7.4.symbols.gmt")
pathways looks like this:
>pathways[1]
$MORF_ATRX
[1] "ADCY3" "SEC31A" "BTD" "LTBP4" "UTRN" "FIG4" "AQP5" "TMEM11" "HNRNPL"
[10] "TAF5L" "INSIG2" "AGPS" "LSM1" "PIK3CB" "HTR4" "UBR2" "ZNF500" "RFC5"
[19] "ST13" "SSTR5" "PCF11" "CLP1" "MSX1" "CEP350" "COLQ" "HTR7" "MC2R"
[28] "NEK9" "PIAS2" "AMFR" "PDXDC1" "ZNF133" "MRE11" "MTX1" "CACNB2" "TMCC1"
[37] "PRELID3A" "JRK" "PPP1R3D" "NTPCR" "NKRF" "MLN" "KLHDC10" "OARD1" "FDXR"
[46] "PAXIP1" "WDR62" "RSRP1" "IPCEF1" "COX6A2" "POP4" "TCF12" "RERE" "LTN1"
[55] "ZNF292" "LEPROTL1" "KAZN" "TTLL5" "KRT33A" "GTSE1" "SEC62" "PARN" "MIA2"
[64] "DNAJC16" "SLC24A1" "SYNJ2" "SIK3" "MC5R" "FANCG" "GOLGA3" "MSL3" "ZBTB22"
Now I need to run fgsea() on the first column of W_dash as an ordered list:
>stats= sort(W_dash[,1], decreasing = T)
>stats[1:5]
M63379_at hum_alu_at S40719_s_at V00594_s_at Z70759_at
1.0000000 0.9656315 0.9164475 0.6934949 0.6814499
When I run fgsea() I get an empty dataframe:
>fgsea <- fgsea(pathways=pathways, stats=stats)
>fgsea
Empty data.table (0 rows and 8 cols): pathway,pval,padj,log2err,ES,NES...
Since the GSEA software produces the expected output, this leads me to believe I'm missing something really simple, but I can't work it out. Any help/guidance would be greatly appreciated!
Many thanks,
charlielonergan
Thanks! Is there a way to link the gene names and probe identifiers together so fgsea knows the connections?
The assays used were Affymetrix Hu6800 if that's relevant.
There are many posts, please use search function/google, e.g. Converting Affymetrix Probes To Gene Ids
Thank you so much! Sorry if this was too basic but you've really saved my bacon here :)