GO enrichment analysis
5.3 years ago

Dear all,

I want to run a GO enrichment analysis on some of my genes. I used KOBAS for this, and here is part of the output (file1):

GO:0022839  5.51E-55    1.69E-51    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005254  1.59E-48    2.43E-45    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0015108  3.50E-47    3.58E-44    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:1902476  4.69E-47    3.60E-44    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005253  6.27E-47    3.84E-44    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0006821  9.85E-46    5.03E-43    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0098661  7.54E-45    3.30E-42    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0004222  1.23E-44    4.71E-42    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0015103  5.80E-43    1.98E-40    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0015698  2.57E-41    7.89E-39    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0008237  1.13E-38    3.14E-36    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0098656  7.71E-38    1.97E-35    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0008509  1.18E-37    2.78E-35    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0022836  1.65E-34    3.61E-32    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0071456  8.21E-34    1.68E-31    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1
GO:0036294  3.13E-33    6.01E-31    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1
GO:0071453  1.12E-32    2.02E-30    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1
GO:0006820  2.09E-31    3.56E-29    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005216  3.49E-31    5.63E-29    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0022838  8.25E-31    1.26E-28    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005902  1.83E-30    2.67E-28    TRINITY_DN304046_c0_g1_i1   TRINITY_DN62073_c0_g1_i1    TRINITY_DN102311_c6_g5_i1
GO:0015267  6.41E-30    8.93E-28    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0022803  6.85E-30    9.13E-28    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1

If I understand correctly, I should focus on terms with a corrected P-value < 0.05 (third column), right? My other question is about visualization. I found a website called "WEGO", but its input file has to look like the example below: the first column is the gene name/ID, followed by its corresponding GO IDs. This format is different from file1. Does anyone know how I could convert file1 into this input format?

demo2000051 GO:0006470  GO:0008138  
demo2000063 
demo2000213 GO:0016706  GO:0019538  
demo2000262 
demo2000391 
demo2000401 GO:0008152  GO:0016787  
demo2000411 
demo2000672 GO:0005509  
demo2000691 GO:0005179  GO:0005576  
demo2001071 GO:0000166  
demo2001091 GO:0005509  
demo2001111 
demo2001131 
demo2001201 
demo2001431 
demo2001601 GO:0015031  GO:0015450  GO:0016020  
demo2001612

The typical way of visualizing GSEA results is a volcano-like plot: enrichment score on the x axis versus -log10 of the FDR (or adjusted p-value) on the y axis. I haven't used WEGO, but if it calculates these two values you can make your own plot.
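
As a rough sketch of the y-axis part, the -log10 values could be computed from file1 like this. It assumes whitespace-separated rows with the GO ID in the first column and the corrected P-value in the third, as in the excerpt above; the function name `neg_log10_corrected_p` is made up for illustration. file1 as shown has no enrichment score column, so the x-axis value would have to come from elsewhere.

```python
import math


def neg_log10_corrected_p(lines):
    """Return (GO id, -log10(corrected p)) pairs for KOBAS-like rows.

    Assumed layout per row: GO id, p-value, corrected p-value, gene ids...
    """
    points = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers, or anything that is not a GO row.
        if len(fields) < 3 or not fields[0].startswith("GO:"):
            continue
        corrected_p = float(fields[2])
        if corrected_p <= 0:
            continue  # log10 is undefined for zero or negative values
        points.append((fields[0], -math.log10(corrected_p)))
    return points


if __name__ == "__main__":
    example = [
        "GO:0022839  5.51E-55    1.69E-51    TRINITY_DN54568_c0_g1_i1",
        "GO:0005254  1.59E-48    2.43E-45    TRINITY_DN54568_c0_g1_i1",
    ]
    for go_id, y in neg_log10_corrected_p(example):
        print(go_id, round(y, 2))
```

The resulting pairs can be passed to any plotting library as the y values.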


Hi, thanks for the reply. But do you know how I could convert file1 to the second format? There are too many genes, so it is impossible for me to do it one by one :(


It looks like in the WEGO format each row is a gene, and the further columns are the GO IDs related to that gene, so you can try building a dictionary in Python: the gene IDs are the keys, and you append the GO IDs to each key as a list of values. Finally, print the dictionary key by key, with the values delimited by spaces. I also recommend WebGestalt's online page for GSEA analysis; it is fast and very easy to use.
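
A minimal sketch of that dictionary approach, assuming file1 is whitespace-separated with the GO ID first, then the P-value, the corrected P-value, and the gene IDs. The function names, the optional `max_corrected_p` cutoff, and the tab separator in the output are illustrative choices, not something KOBAS or WEGO prescribes, so check the separator against what WEGO accepts.

```python
from collections import defaultdict


def invert_go_table(lines, max_corrected_p=None):
    """Map each gene id to the list of GO ids whose rows mention it.

    Assumed layout per row: GO id, p-value, corrected p-value, gene ids...
    If max_corrected_p is given, rows at or above that cutoff are skipped.
    """
    gene_to_go = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers, and GO rows that list no genes.
        if len(fields) < 4 or not fields[0].startswith("GO:"):
            continue
        go_id, corrected_p, genes = fields[0], float(fields[2]), fields[3:]
        if max_corrected_p is not None and corrected_p >= max_corrected_p:
            continue
        for gene in genes:
            if go_id not in gene_to_go[gene]:  # avoid duplicate GO ids
                gene_to_go[gene].append(go_id)
    return gene_to_go


def format_gene_rows(gene_to_go, sep="\t"):
    """Render one 'gene<sep>GO<sep>GO...' line per gene."""
    return [sep.join([gene] + go_ids) for gene, go_ids in gene_to_go.items()]


if __name__ == "__main__":
    example = [
        "GO:0022839  5.51E-55  1.69E-51  TRINITY_DN54568_c0_g1_i1  TRINITY_DN130760_c2_g1_i1",
        "GO:0071456  8.21E-34  1.68E-31  TRINITY_DN108332_c0_g2_i1  TRINITY_DN130760_c2_g1_i1",
    ]
    for row in format_gene_rows(invert_go_table(example, max_corrected_p=0.05)):
        print(row)
```

For the real file, the lines can come from `open("file1")` (a placeholder path) instead of the inline example, and the printed rows can be redirected to a new file for upload.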
