From FASTA file to GO-annotation
4
9.1 years ago
wanziyi89 ▴ 60

Dear All,

I have a list of about 3000 transcripts whose Gene Ontology (GO) functions I am interested in. I have the .fasta file for each transcript. Can anyone advise on how I should go about extracting the GO terms for each gene?

The pipeline I can think of is to load the .fasta file into the GUI version of Blast2GO, run a remote BLAST, and then map the GO terms within Blast2GO. I am already running this, but it seems quite slow.

Any alternative strategies here?

regards

Ziyi

RNA-Seq GO • 10k views
0

Have you tried DAVID?

1

Have a look at Trinotate as well.

9
9.1 years ago
SES 8.6k

Blast2GO is impractically slow in my experience and doesn't fit well into a Unix pipeline, though that is not its intended usage. Another approach is HMMER2GO, if you don't mind a few commands. Here is an example starting with some transcript assemblies called "genes.fasta" and ending with GO terms for each gene/contig.

hmmer2go getorf -i genes.fasta -o genes_orfs.faa
hmmer2go run -i genes_orfs.faa -d Pfam-A.hmm
hmmer2go mapterms -i genes_orfs_Pfam-A.tblout -o genes_orfs_Pfam-A_GO.tsv --map

There is some documentation on the wiki or at the command line, and a full example as well.
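For anyone wondering what the last mapping step amounts to conceptually, here is a rough Python sketch. It is not HMMER2GO's code. It assumes an hmmscan-style `--tblout` table (Pfam accession in column 2, query name in column 3, full-sequence E-value in column 5) and a pfam2go-style mapping file. The function names, the E-value cutoff and the sample lines are made up for illustration.

```python
from collections import defaultdict

def parse_pfam2go(lines):
    """Map Pfam accession -> set of GO IDs from pfam2go-style lines."""
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        left, _, right = line.partition(" > ")
        pfam_acc = left.split()[0].replace("Pfam:", "")
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping[pfam_acc].add(go_id)
    return mapping

def parse_tblout(lines, max_evalue=1e-5):
    """Yield (query, pfam_accession) for hits at or below the E-value cutoff."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        target_acc, query, evalue = fields[1], fields[2], float(fields[4])
        if evalue <= max_evalue:
            yield query, target_acc.split(".")[0]  # drop the version suffix

def go_terms_per_query(tblout_lines, pfam2go_lines, max_evalue=1e-5):
    """Collect the GO IDs of every Pfam domain hit for each query ORF."""
    pfam2go = parse_pfam2go(pfam2go_lines)
    result = defaultdict(set)
    for query, pfam in parse_tblout(tblout_lines, max_evalue):
        result[query] |= pfam2go.get(pfam, set())
    return {q: sorted(terms) for q, terms in result.items() if terms}

# Illustrative sample data (hypothetical ORF names, abbreviated columns).
SAMPLE_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "7tm_1    PF00001.24 contig1_orf1 - 1.2e-30 105.3 0.1",
    "Pkinase  PF00069.28 contig2_orf1 - 3.0e-50 170.0 0.0",
    "Pkinase  PF00069.28 contig3_orf2 - 0.5     10.0  0.0",
]
SAMPLE_PFAM2GO = [
    "!sample header line",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor signaling pathway ; GO:0007186",
    "Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672",
]

for orf, terms in go_terms_per_query(SAMPLE_TBLOUT, SAMPLE_PFAM2GO).items():
    print(orf, ",".join(terms), sep="\t")
```

The weak hit for contig3_orf2 is dropped by the cutoff, so that ORF ends up with no GO terms. Whether to keep all domain hits or only the best one per ORF is a design choice this sketch leaves open.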

0

Interesting. I'll take a look!

0

Thanks, let me know what you think. My top priority for this project is to (one day soon) get rid of the emboss and hmmer dependencies (or at least handle them internally) so it is easier to use.

0

I'll take a look as well. That being said, it is important to be able to play with the GO terms once you get them from the analysis. Blast2GO is very efficient at that. Maybe it is time to make compatible files, and XML is a nice way to do it. You can then go from one program to another, taking the best of each.

0

I think visualizing the results is important for this type of analysis. The best I've found (and fastest) for calculating enrichment and visualizing the ontology is Ontologizer. There is one more command I didn't show above called hmmer2go map2gaf that will generate a GAF file for exploring the results in Ontologizer. I like that approach because Ontologizer is fast and free and supports open formats. Last I checked, blast2go generates their own XML format and binary formats and it is not free, so that makes it much harder to use (in addition to being slow).
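To give an idea of what a GAF record looks like, here is a minimal Python sketch that builds one tab-separated line with the 17 columns of the GAF 2.x layout. It is not what `hmmer2go map2gaf` does internally. The function name, the default reference ID, the taxon ID and the sample identifiers are assumptions for illustration, so check them against the GAF specification and what Ontologizer expects before relying on them.

```python
def gaf_line(db, obj_id, go_id, aspect, taxon_id, date, assigned_by,
             reference="GO_REF:0000002", evidence="IEA", obj_type="protein"):
    """Build one GAF 2.x annotation line (17 tab-separated columns).

    The default reference and the IEA evidence code are assumptions for an
    electronic, domain-based annotation; adjust them for real data.
    """
    if aspect not in {"F", "P", "C"}:
        raise ValueError("aspect must be F, P or C")
    fields = [
        db,                   # 1  DB
        obj_id,               # 2  DB Object ID
        obj_id,               # 3  DB Object Symbol (ID reused here)
        "",                   # 4  Qualifier (left empty in this sketch)
        go_id,                # 5  GO ID
        reference,            # 6  DB:Reference
        evidence,             # 7  Evidence Code
        "",                   # 8  With/From
        aspect,               # 9  Aspect: F, P or C
        "",                   # 10 DB Object Name
        "",                   # 11 DB Object Synonym
        obj_type,             # 12 DB Object Type
        f"taxon:{taxon_id}",  # 13 Taxon
        date,                 # 14 Date as YYYYMMDD
        assigned_by,          # 15 Assigned By
        "",                   # 16 Annotation Extension
        "",                   # 17 Gene Product Form ID
    ]
    return "\t".join(fields)

# Hypothetical record: contig name, taxon ID and date are placeholders.
EXAMPLE = gaf_line("local", "contig1_orf1", "GO:0004930", "F",
                   taxon_id=9606, date="20150101", assigned_by="local")

print("!gaf-version: 2.1")
print(EXAMPLE)
```

Writing one such line per (gene, GO term) pair gives a plain-text file that can be inspected with standard Unix tools, which fits the open-format argument made above.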

0

It'd be even more interesting to integrate Eric Normandeau's similarity-based solution with this domain-based one, and later add a way to merge and simplify GO terms (sort of what Blast2GO does). That way it'd be very easy to reproduce and update.

1

I agree completely and this is how I designed the second command (the 'run' part) to work. Instead of running hmmer, you just give a fasta database to the '-d' option and '-p blastx' will tell it to run blast instead. That design part is in place but the functionality isn't implemented yet. I'll try to find time to do that this week. That way you can import the results to blast2go or whatever tool you want.

4
9.1 years ago

Blast2GO is, in my opinion, one of the best choices because it lets you improve your annotation with InterPro and KEGG data. In addition, it makes it easier to compare GO sets between conditions using the Fisher test, and it helps a lot with everything related to graphics.

You can also use the Pro version of Blast2GO, which I believe is free to try for a week on first use. The Pro version uses a proprietary cloud database, holding the same data found at NCBI, that speeds up your analysis.

0

Blast2GO is hard to beat if you don't mind paying for it. If you don't want to pay and only want Fisher test results, you can use the approach suggested by another user. See my answer for an implementation of this idea (using other software to do the same job as Blast2GO).
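For reference, the Fisher test for a single GO term boils down to a hypergeometric tail probability. Below is a minimal Python sketch of the one-sided (over-representation) version using only the standard library. It is an illustration of the statistic, not the code used by Blast2GO or by the go_enrichment scripts, and the function name and example counts are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes of interest annotated with the GO term
    n: genes of interest in total
    K: background genes annotated with the GO term
    N: background genes in total
    Returns P(X >= k) for X hypergeometric with parameters (N, K, n).
    """
    if not (0 <= k <= n <= N and k <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 for impossible tables, so no lower bound is needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 12 of 300 genes of interest carry the term,
# against 150 of 20000 genes in the background.
print(fisher_enrichment_p(12, 300, 150, 20000))
```

In a real analysis this is computed for every GO term, so the resulting p-values need a multiple-testing correction before interpretation.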

4
9.1 years ago

I created a GitHub repo which implements the idea suggested by Israel Barrantes: https://github.com/enormandeau/go_enrichment

The documentation isn't complete and there is still some work to be done on the scripts (mostly splitting the parts of the master runall.sh script and documenting). You need to be familiar with the UNIX command line to install the prerequisites and use this approach.

If you are not in too much of a hurry, I can probably finish tidying this next week.

2
9.1 years ago

Local BLASTX versus all UniProt proteins with GOs [link]
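As a rough illustration of this approach, the Python sketch below takes BLASTX results in tabular format (`-outfmt 6`, where column 11 is the E-value and column 12 the bit score), keeps the best-scoring UniProt hit per transcript, and looks up that protein's GO terms. The accession-to-GO dictionary, the E-value cutoff, the function names and the sample records are all assumptions for illustration. In practice the mapping would come from a UniProt or GOA annotation download.

```python
def best_hits(blast_lines, max_evalue=1e-10):
    """Return {query: subject_id} for the highest bit score hit per query."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def accession(subject_id):
    """Extract the accession from a UniProt-style ID such as sp|ACC|NAME."""
    parts = subject_id.split("|")
    return parts[1] if len(parts) >= 3 else subject_id

def annotate(blast_lines, acc2go, max_evalue=1e-10):
    """Transfer GO terms from the best UniProt hit to each transcript."""
    hits = best_hits(blast_lines, max_evalue)
    return {q: acc2go.get(accession(s), []) for q, s in hits.items()}

# Made-up sample data: accessions and GO assignments are placeholders.
SAMPLE_BLAST = [
    "tx1\tsp|X00001|FAKE1_HUMAN\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-80\t250.0",
    "tx1\tsp|X00002|FAKE2_HUMAN\t60.0\t180\t72\t2\t1\t540\t5\t184\t1e-40\t150.0",
    "tx2\tsp|X00002|FAKE2_HUMAN\t90.0\t150\t15\t0\t1\t450\t1\t150\t1e-70\t230.0",
    "tx3\tsp|X00003|FAKE3_HUMAN\t30.0\t50\t35\t1\t1\t150\t1\t50\t0.5\t25.0",
]
SAMPLE_ACC2GO = {
    "X00001": ["GO:0004672", "GO:0005524"],
    "X00002": ["GO:0004930"],
}

for transcript, terms in annotate(SAMPLE_BLAST, SAMPLE_ACC2GO).items():
    print(transcript, ",".join(terms), sep="\t")
```

Transferring terms from the single best hit is the simplest policy. Taking the union over several good hits, or requiring a minimum identity or coverage, are common alternatives.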
