Every GOterm enrichment tool I have found so far requires differential expression data. Unfortunately, I am working with a single transcriptome due to lack of funding. So far I have only been able to graph GOterms by the number of genes annotated to each one, which doesn't seem all that informative.
I assembled the transcriptome with Trinity and used Trinotate to assign GOterms. What I am trying to do is combine the abundance counts with the GOterm list to generate a single enriched set of GOterms for the transcriptome, so that when I graph the GOterms the graph reflects transcript expression.
GOterm format:
TRINITY_DN10001_c0_g1_i1 GO:0003674,GO:0005484,GO:0005488
Abundance counts:
Transcripts RSEM
TRINITY_DN31990_c0_g1 196
TRINITY_DN30285_c0_g1 3
TRINITY_DN18352_c0_g1 3
TRINITY_DN32239_c0_g1 253
TRINITY_DN37759_c0_g1 3
TRINITY_DN9612_c0_g1 5
TRINITY_DN12770_c0_g1 185
@Tran Kim Ngan
Thank you for your suggestion. Unfortunately my abundance counts and GOterms are in separate files, while REVIGO input requires GOterms paired with a meaningful variable (here, the cumulative abundance counts of the transcripts assigned to each GOterm). I thought an awk command might work, but it may take Perl to parse the two files, and I'm not sure how to do this as someone with one year of experience in bioinformatics and Linux. What I am hoping to do is take the abundance counts and GOterm assignments and generate a tab-delimited file with:
GOterm Abundance (cumulative TPM of all assigned transcripts)
then I could use that list to visualize GOterms with REVIGO.
(Sorry for posting this reply under the original question; for some reason Biostars wouldn't let me comment on your post.)
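One detail worth checking before joining the two files: in the examples above, the GOterm file uses isoform-level IDs (ending in `_i1`), while the abundance table uses gene-level IDs (no `_i` suffix). A minimal Python sketch of one way to bring them to the same level is below. It assumes the standard Trinity naming scheme, where the isoform ID is the gene ID plus `_i<number>`; the function name `gene_id` is just an illustration.

```python
import re

# Assumption: Trinity isoform IDs are the gene ID followed by "_i<number>".
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")

def gene_id(transcript_id: str) -> str:
    """Strip a trailing isoform suffix; gene-level IDs pass through unchanged."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)

print(gene_id("TRINITY_DN10001_c0_g1_i1"))  # isoform ID from the GOterm file
print(gene_id("TRINITY_DN31990_c0_g1"))     # gene ID from the abundance table
```

If several isoforms of one gene carry the same GOterm, collapsing to gene IDs produces duplicate gene/GOterm pairs, so these probably need to be deduplicated before summing gene-level counts. Alternatively, isoform-level abundance estimates could be used so that the IDs match directly.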
Hi, so you are trying to produce something like this?
File1:
ID1 GO1,GO2,GO3
ID2 GO3
File2:
ID1 a
ID2 b
Output:
GO1 a
GO2 a
GO3 a+b
My idea is to first convert File1 into:
ID1 GO1
ID1 GO2
ID1 GO3
ID2 GO3
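As a rough illustration of this first step (not the linked code), here is a small Python sketch that splits each comma-separated GO list into one ID/GO pair per line. It assumes two whitespace-separated columns as in File1; `explode_go` is a made-up name.

```python
def explode_go(lines):
    """Yield (transcript_id, go_term) pairs, one pair per GO term."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and IDs without any GO terms
        transcript_id, go_field = fields[0], fields[1]
        for go_term in go_field.split(","):
            if go_term:  # ignore empty entries from stray commas
                yield transcript_id, go_term

# File1 from the example above
file1 = ["ID1 GO1,GO2,GO3", "ID2 GO3"]
for transcript_id, go_term in explode_go(file1):
    print(transcript_id, go_term, sep="\t")
```

For a real file, `explode_go(open("file1.txt"))` should work the same way, since a file object iterates over its lines (the file name here is only a placeholder).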
You can do this with Whetting's code from the post "Command Or Script To Generate An Annotation File For Blast2Go", with some modifications.
Then, you can easily add the abundance counts to make it look like this:
ID1 GO1 a
ID1 GO2 a
ID1 GO3 a
ID2 GO3 b
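Adding the counts is essentially a lookup by ID. A minimal Python sketch, assuming the abundance table has already been read into a dictionary; the numbers 196 and 3 are placeholders standing in for a and b, and `attach_counts` is a hypothetical helper name:

```python
def attach_counts(pairs, counts):
    """Return (id, go_term, count) rows for every ID that has a count."""
    rows = []
    for transcript_id, go_term in pairs:
        if transcript_id in counts:  # IDs missing from the table are dropped
            rows.append((transcript_id, go_term, counts[transcript_id]))
    return rows

# One ID/GO pair per line, as in the converted File1
pairs = [("ID1", "GO1"), ("ID1", "GO2"), ("ID1", "GO3"), ("ID2", "GO3")]
# File2 as a dictionary; placeholder values for a and b
counts = {"ID1": 196.0, "ID2": 3.0}

for row in attach_counts(pairs, counts):
    print(*row, sep="\t")
```

Dropping IDs that have no count is a design choice in this sketch; printing them instead would help spot ID mismatches between the two files.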
Next, take the second and third columns of that file and collapse them by GOterm into:
GO1 a
GO2 a
GO3 a b
using the code from https://asteindorff.wordpress.com/2017/04/06/change-t-db-file-for-enrichment-analysis/. Summing the values on each line then gives the Output shown earlier (GO3 a+b).
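If the goal is the summed value per GOterm (a+b for GO3), the collapsing and summing can probably be done in one pass. This Python sketch is an alternative to the linked code, not a copy of it; `sum_by_go` and the numeric values are illustrative only.

```python
from collections import defaultdict

def sum_by_go(rows):
    """Sum the counts of all (id, go_term, count) rows sharing a GO term."""
    totals = defaultdict(float)
    for _transcript_id, go_term, count in rows:
        totals[go_term] += count
    return dict(totals)

# Rows in the ID / GO / count layout, with placeholder values for a and b
rows = [
    ("ID1", "GO1", 196.0),
    ("ID1", "GO2", 196.0),
    ("ID1", "GO3", 196.0),
    ("ID2", "GO3", 3.0),
]

# Tab-delimited GOterm / cumulative abundance, sorted by GOterm
for go_term, total in sorted(sum_by_go(rows).items()):
    print(f"{go_term}\t{total:g}")
```

The printed two-column, tab-delimited list has the GOterm and cumulative abundance layout described in the question.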
From there, I think you will be fine. I am also a novice, so I am not sure whether this helps or just makes things more confusing. Anyway, good luck!
Thank you, I really appreciate it! I'll look into this approach and let you know how it goes.