5.2 years ago
mxlsherry1992
Dear all,
I want to do a GO enrichment analysis for some of my genes. I used KOBAS to do that, and here is part of the output (file1):
GO:0022839 5.51E-55 1.69E-51 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0005254 1.59E-48 2.43E-45 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0015108 3.50E-47 3.58E-44 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:1902476 4.69E-47 3.60E-44 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0005253 6.27E-47 3.84E-44 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0006821 9.85E-46 5.03E-43 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0098661 7.54E-45 3.30E-42 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0004222 1.23E-44 4.71E-42 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0015103 5.80E-43 1.98E-40 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0015698 2.57E-41 7.89E-39 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0008237 1.13E-38 3.14E-36 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0098656 7.71E-38 1.97E-35 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0008509 1.18E-37 2.78E-35 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0022836 1.65E-34 3.61E-32 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0071456 8.21E-34 1.68E-31 TRINITY_DN108332_c0_g2_i1 TRINITY_DN130760_c2_g1_i1 TRINITY_DN30116_c0_g1_i1
GO:0036294 3.13E-33 6.01E-31 TRINITY_DN108332_c0_g2_i1 TRINITY_DN130760_c2_g1_i1 TRINITY_DN30116_c0_g1_i1
GO:0071453 1.12E-32 2.02E-30 TRINITY_DN108332_c0_g2_i1 TRINITY_DN130760_c2_g1_i1 TRINITY_DN30116_c0_g1_i1
GO:0006820 2.09E-31 3.56E-29 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0005216 3.49E-31 5.63E-29 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0022838 8.25E-31 1.26E-28 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0005902 1.83E-30 2.67E-28 TRINITY_DN304046_c0_g1_i1 TRINITY_DN62073_c0_g1_i1 TRINITY_DN102311_c6_g5_i1
GO:0015267 6.41E-30 8.93E-28 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
GO:0022803 6.85E-30 9.13E-28 TRINITY_DN54568_c0_g1_i1 TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1
If I am not misunderstanding, I should focus on rows with a corrected P-value < 0.05 (third column), right? My other question is about visualization. I found a website, "WEGO", but its input file should look like this: the first column is the gene name/ID, followed by its corresponding GO numbers. That format is different from our file1. So, do you know how I could convert file1 into a format like this input file?
demo2000051 GO:0006470 GO:0008138
demo2000063
demo2000213 GO:0016706 GO:0019538
demo2000262
demo2000391
demo2000401 GO:0008152 GO:0016787
demo2000411
demo2000672 GO:0005509
demo2000691 GO:0005179 GO:0005576
demo2001071 GO:0000166
demo2001091 GO:0005509
demo2001111
demo2001131
demo2001201
demo2001431
demo2001601 GO:0015031 GO:0015450 GO:0016020
demo2001612
The typical way of visualizing GSEA results is a volcano-like plot: enrichment score on the x axis versus -log10(FDR or adjusted p-value) on the y axis. I haven't used WEGO, but if it has calculated these two values you can make your own plot.
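As a rough sketch of the y-axis part of such a plot, the snippet below computes -log10 of the corrected p-value for each GO term and keeps only the terms below a cutoff. It assumes file1 is whitespace-separated with the GO ID, p-value, and corrected p-value in the first three columns, as in the excerpt posted above. The function name `neg_log10_fdr` and the `alpha` parameter are made up for illustration. The excerpt does not show an enrichment score column, so the x-axis value would have to come from elsewhere.

```python
import math

def neg_log10_fdr(lines, alpha=0.05):
    """Return (go_id, -log10(corrected p)) for rows below the cutoff.

    Assumes columns: GO id, p-value, corrected p-value, then gene ids.
    """
    points = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers, or anything that is not a GO row.
        if len(fields) < 3 or not fields[0].startswith("GO:"):
            continue
        corrected_p = float(fields[2])
        # log10 is undefined for zero, so such rows are skipped here.
        if 0 < corrected_p < alpha:
            points.append((fields[0], -math.log10(corrected_p)))
    return points

# Two rows shortened from the file1 excerpt.
example = [
    "GO:0022839 5.51E-55 1.69E-51 TRINITY_DN54568_c0_g1_i1",
    "GO:0005902 1.83E-30 2.67E-28 TRINITY_DN304046_c0_g1_i1",
]
for go_id, y in neg_log10_fdr(example):
    print(go_id, round(y, 2))
```

The resulting pairs can be passed to any plotting tool once an enrichment score per term is available.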
Hi, thanks for the reply. But do you know how I could convert file1 to the second format? There are too many genes, so it is impossible for me to do it one by one :(
It looks like each row is a gene, and the further columns are the GO IDs related to that gene, so you can try Python: build a dictionary where the first column gives the keys and the remaining columns are appended as a list of values. Finally, print each key with its values, delimited by spaces. I also recommend WebGestalt's online page for GSEA analysis; it is fast and very easy to use.
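A minimal sketch of the dictionary approach, under a few assumptions. In the file1 excerpt posted in the question, each row appears to start with a GO ID, then the p-value and corrected p-value, then the gene IDs, so the code below keys the dictionary on the gene IDs (column four onwards) and appends the GO ID of each row as a value. The function names `invert_go_table` and `format_gene_lines` are made up for illustration, and the space separator follows the suggestion above; check what WEGO actually expects before uploading.

```python
from collections import defaultdict

def invert_go_table(lines):
    """Map each gene id to the list of GO ids whose rows mention it.

    Assumes columns: GO id, p-value, corrected p-value, then gene ids.
    """
    gene_to_go = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers, or GO rows without any gene ids.
        if len(fields) < 4 or not fields[0].startswith("GO:"):
            continue
        go_id = fields[0]
        for gene in fields[3:]:
            if go_id not in gene_to_go[gene]:  # avoid duplicate GO ids
                gene_to_go[gene].append(go_id)
    return dict(gene_to_go)

def format_gene_lines(gene_to_go, sep=" "):
    """Return one 'gene GO:... GO:...' line per gene, sorted by gene id."""
    return [sep.join([gene] + go_ids)
            for gene, go_ids in sorted(gene_to_go.items())]

# Two rows taken from the file1 excerpt.
example = [
    "GO:0022839 5.51E-55 1.69E-51 TRINITY_DN54568_c0_g1_i1 "
    "TRINITY_DN130760_c2_g2_i8 TRINITY_DN130760_c2_g1_i1",
    "GO:0071456 8.21E-34 1.68E-31 TRINITY_DN108332_c0_g2_i1 "
    "TRINITY_DN130760_c2_g1_i1 TRINITY_DN30116_c0_g1_i1",
]
for row in format_gene_lines(invert_go_table(example)):
    print(row)
```

To run it on the real file, pass an open file handle instead of the list, e.g. `invert_go_table(open("file1.txt"))`, where the filename is a placeholder. Note that only genes listed in file1 will appear in the result, so genes without any enriched GO term are absent, unlike the bare IDs in the WEGO demo file.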