Question

GO term enrichment for a single transcriptome assembly using abundance counts and GO terms in tab-delimited form

Asked 7.4 years ago by jvire1

Every GO term enrichment tool I have found so far requires differential expression data. Unfortunately, due to lack of funding, I am working with a single transcriptome. So far I have only been able to graph GO terms by the number of annotated genes, which isn't very informative.

I assembled the transcriptome with Trinity and assigned GO terms with Trinotate. I would like to combine the abundance counts with the GO term list to generate a single enriched set of GO terms for the transcriptome, so that my GO term graphs reflect transcript expression.

GO term format:

TRINITY_DN10001_c0_g1_i1    GO:0003674,GO:0005484,GO:0005488

Abundance counts:

Transcripts             RSEM                                      
TRINITY_DN31990_c0_g1   196
TRINITY_DN30285_c0_g1   3
TRINITY_DN18352_c0_g1   3
TRINITY_DN32239_c0_g1   253
TRINITY_DN37759_c0_g1   3
TRINITY_DN9612_c0_g1    5
TRINITY_DN12770_c0_g1   185
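One thing to watch: the GO file uses isoform-level IDs (`TRINITY_DN10001_c0_g1_i1`) while the abundance table uses gene-level IDs (`TRINITY_DN31990_c0_g1`), so the two files will not join directly. Below is a minimal Python sketch for parsing both formats shown above. It assumes whitespace-separated columns, comma-separated GO lists, and that stripping the trailing `_iN` suffix maps a Trinity isoform to its gene; the function names are hypothetical.

```python
import re

# Assumption: Trinity isoform IDs end in "_i<number>", and dropping that
# suffix gives the gene-level ID used in the abundance table.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(transcript_id):
    """Map an isoform ID (..._g1_i1) to its gene ID (..._g1)."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def parse_go_line(line):
    """Split a non-empty 'ID<whitespace>GO:1,GO:2' line into (gene ID, [GO terms])."""
    fields = line.split()
    go_terms = fields[1].split(",") if len(fields) > 1 else []
    return gene_id(fields[0]), [go for go in go_terms if go]


def parse_abundance(lines):
    """Read 'ID<whitespace>value' rows into {ID: float}, skipping the header."""
    abundance = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            abundance[fields[0]] = float(fields[1])
        except ValueError:
            continue  # non-numeric value, e.g. the "Transcripts RSEM" header
    return abundance
```

If several isoforms of the same gene carry GO terms, merge their lists into one set per gene before summing; otherwise that gene's abundance is added more than once to the terms they share.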

Tags: RNA-Seq, gene

@Tran Kim Ngan

Thank you for your suggestion. Unfortunately, my abundance counts and GO terms are in separate files, while REVIGO needs GO terms paired with a meaningful value (here, the cumulative abundance of the transcripts assigned to each GO term). I thought an awk command might work, but parsing the two files may take Perl, and as a novice with one year of experience in bioinformatics and Linux I'm not sure how to do it. What I am hoping to do is take the abundance counts and GO term assignments and generate a tab-delimited file with:

GO term    Abundance (cumulative TPM of all transcripts assigned to that term)

Then I could use that list to visualize GO terms with REVIGO.
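The table in the question looks like RSEM counts rather than TPM. If the per-sample RSEM output (`*.genes.results`) is available, it already contains a `TPM` column next to `expected_count`, so no conversion should be needed. A minimal sketch that pulls gene IDs and TPM from such a file, assuming RSEM's standard tab-delimited header:

```python
import csv


def read_rsem_tpm(lines):
    """Return {gene_id: TPM} from an RSEM *.genes.results table.

    Assumes RSEM's tab-delimited header, which names the columns
    'gene_id' and 'TPM' (alongside 'expected_count', 'FPKM', etc.).
    `lines` can be an open file or any iterable of text lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["gene_id"]: float(row["TPM"]) for row in reader}
```

Usage, with a hypothetical file name: `with open("RSEM.genes.results") as fh: tpm = read_rsem_tpm(fh)`. Looking columns up by header name rather than position keeps the sketch working if the column order differs between RSEM versions.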

(Sorry for replying to the original question; for some reason Biostars wouldn't let me comment on your post.)

Reply by jvire1, 7.4 years ago

Hi, so you are trying to produce something like this?

File1:

ID1 GO1,GO2,GO3

ID2 GO3

File2:

ID1 a

ID2 b

Output:

GO1 a

GO2 a

GO3 a+b
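The target transformation can be checked on the toy data with a minimal one-pass Python sketch. The values 10 and 5 are illustrative stand-ins for `a` and `b`, and `sum_by_go` is a hypothetical name. Each ID's value is added to every GO term listed for it, which is what makes GO3 come out as a + b.

```python
from collections import defaultdict

# Toy inputs mirroring File1 and File2; 10 and 5 stand in for a and b.
file1 = ["ID1 GO1,GO2,GO3", "ID2 GO3"]
file2 = ["ID1 10", "ID2 5"]


def sum_by_go(go_lines, value_lines):
    """Add each ID's value to every GO term assigned to that ID."""
    values = {}
    for line in value_lines:
        tid, value = line.split()
        values[tid] = float(value)

    totals = defaultdict(float)
    for line in go_lines:
        tid, go_list = line.split()
        for go in set(go_list.split(",")):  # set: count each term once per ID
            totals[go] += values.get(tid, 0.0)  # IDs missing from File2 add 0
    return dict(totals)


for go, total in sorted(sum_by_go(file1, file2).items()):
    print(go, total)
```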

My idea is to first convert File1 into:

ID1 GO1

ID1 GO2

ID1 GO3

ID2 GO3
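This is not Whetting's code, just a minimal Python sketch of the same step (one row per ID and GO term), assuming whitespace-separated columns and comma-separated GO terms; `explode_go` is a hypothetical name.

```python
def explode_go(lines):
    """Turn 'ID GO1,GO2' rows into one (ID, GO) pair per GO term."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or ID without GO terms
        tid, go_list = fields[0], fields[1]
        for go in go_list.split(","):
            if go:  # ignore empty entries from trailing commas
                pairs.append((tid, go))
    return pairs


if __name__ == "__main__":
    for tid, go in explode_go(["ID1 GO1,GO2,GO3", "ID2 GO3"]):
        print(f"{tid}\t{go}")
```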

You can do this with a modified version of Whetting's code from the thread "Command Or Script To Generate An Annotation File For Blast2Go".

Then, you can easily add the abundance counts to make it look like this:

ID1 GO1 a

ID1 GO2 a

ID1 GO3 a

ID2 GO3 b
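Adding the abundance column is a dictionary lookup per row. A minimal sketch, assuming the abundances are already loaded into a dict keyed by ID (for the Trinity files, both sides must use the same ID level, gene or isoform). The hypothetical `add_abundance` reports IDs that have no abundance instead of silently giving them a value.

```python
def add_abundance(pairs, abundance):
    """Attach each ID's abundance to its (ID, GO) pairs.

    Returns (rows, missing): rows are (ID, GO, value) tuples, and
    missing is the set of IDs not found in the abundance table.
    """
    rows, missing = [], set()
    for tid, go in pairs:
        if tid in abundance:
            rows.append((tid, go, abundance[tid]))
        else:
            missing.add(tid)
    return rows, missing
```

A large `missing` set is usually a sign that the two files use different ID levels.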

Next, take the second and third columns and collapse them by GO term, like this:

GO1 a

GO2 a

GO3 a b
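The collapsing step, plus the final summation, can be sketched in Python with a `defaultdict`. This is an illustration of the idea, not the linked script; `collapse_by_go` and `sum_grouped` are hypothetical names.

```python
from collections import defaultdict


def collapse_by_go(rows):
    """Group (ID, GO, value) rows by GO term.

    Returns {GO: [values]}, i.e. the 'GO3 a b' layout.
    """
    grouped = defaultdict(list)
    for _tid, go, value in rows:
        grouped[go].append(value)
    return dict(grouped)


def sum_grouped(grouped):
    """Collapse each value list to its total ('GO3 a b' -> 'GO3 a+b')."""
    return {go: sum(values) for go, values in grouped.items()}
```

Keeping the grouping and the summing separate makes it easy to check which transcripts contribute to a given term before totalling.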

by using another script: https://asteindorff.wordpress.com/2017/04/06/change-t-db-file-for-enrichment-analysis/

From there, you only need to add up the values on each line (for example, GO3 a b becomes GO3 a+b). I am also a novice, so I am not sure whether I am helping or just making you more confused. Anyway, good luck!

Reply by GiantSilverSoy, 7.4 years ago

Thank you, I really appreciate it! I'll look into this approach and let you know how it goes.

Reply by jvire1, 7.4 years ago

Answer 1 · 2017-07-05

Answered 7.4 years ago by GiantSilverSoy

Have you tried REVIGO? http://revigo.irb.hr/

This was taken from their site:

The GO IDs may be followed by p-values or another quantity which describes the GO term in a way meaningful to you
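Going by that description, the REVIGO input is just one GO ID per line followed by its value. A minimal sketch that writes that list from a {GO ID: total} dict, highest total first. Tab separation and two decimal places are assumptions; check REVIGO's input page for accepted separators, and since summed abundance is not a p-value, check how REVIGO is told to interpret the value column before submitting.

```python
def format_revigo(totals):
    """Render {GO ID: value} as 'GO:ID<TAB>value' lines, highest value first."""
    # Sort by value descending, then by GO ID so ties have a stable order.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return "".join(f"{go}\t{value:.2f}\n" for go, value in ordered)
```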
