I am new to Blast2GO and am using b2g4pipe, but I am puzzled by its output.
Background info: I have lots of XML-formatted BLASTX output, and now all I want to do is associate my BLASTX results with GO-terms when possible. So, I ran b2g4pipe using the following command:
java -Xmx50000m -cp *:ext/*: es.blast2go.prog.B2GAnnotPipe -in $1 -prop b2gPipe.properties -annot -v
This works (albeit slowly), but the results are confusing. A single query gets between 1 and 10 entries in the output, each with a different GO term. For example:
QUERY_ID_0001 GO:0009060 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0005743 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0070469 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0009055 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0020037 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0016021 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0022900 cytochrome oxidase subunit partial
QUERY_ID_0001 GO:0004129 cytochrome oxidase subunit partial
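To make the pattern easier to see, the rows can be grouped by query ID. Here is a minimal Python sketch, assuming each output line has three whitespace-separated columns (query ID, GO ID, free-text description), as in the example above. The function name `group_go_terms` and the `sample` list are just illustrative, not part of b2g4pipe.

```python
from collections import defaultdict


def group_go_terms(lines):
    """Map each query ID to the list of GO IDs reported for it."""
    terms = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split into at most three fields; the description may contain spaces.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue
        query_id, go_id = fields[0], fields[1]
        # Assumption: only rows whose second column is a GO ID are of interest.
        if not go_id.startswith("GO:"):
            continue
        terms[query_id].append(go_id)
    return dict(terms)


# Three of the rows from the example above.
sample = [
    "QUERY_ID_0001 GO:0009060 cytochrome oxidase subunit partial",
    "QUERY_ID_0001 GO:0005743 cytochrome oxidase subunit partial",
    "QUERY_ID_0001 GO:0004129 cytochrome oxidase subunit partial",
]

grouped = group_go_terms(sample)
for query_id, go_ids in grouped.items():
    print(query_id, len(go_ids), ",".join(go_ids))
```

For a real file, the same function can be fed an open file handle instead of the `sample` list, since it only iterates over lines.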
1) From looking through the BLASTX output, it seems that QUERY_ID_0001 only has a single HSP, so why are there so many different GO terms for this query? Since the terms are all different, are these simply different hierarchical GO-term categorizations?
2) If a particular query has several HSPs (say 5), will b2g4pipe generate a new set of GO-terms for each of the HSPs, or will it simply take the best HSP and analyse that?
3) I am using the public MySQL GO database for querying my BLAST results, and this seems to be running slowly. Will a local installation of the GO MySQL database speed up this process? I have several hundred million queries, and want a very high-throughput method of annotating my BLASTX queries. Will increasing the memory available to Java to something like 100G help very much?
Thanks for any answers / suggestions!!
EDIT: no ideas? all suggestions are appreciated!
You might find asking this on the Blast2GO Google Group more helpful, since the Blast2GO developers interact with users there.
I'm running into the same problem #3 - very slow mapping of BLASTX results with b2g4pipe. Did you find that a local installation increased the speed enough to make this a feasible approach? Does increasing the memory available to Java make a difference? Thanks!