4.4 years ago
tianshenbio
I have a newly constructed transcriptome that I have blasted against the NCBI nr database. I am now using three tools to perform GO annotation: InterProScan, Blast2GO, and eggNOG. Is there a proper way to pool the results? I tried merging them in OmicsBox (Blast2GO), and I can see that some terms that were too general were removed. Is there a better way to do this?
It depends on what you mean by "pool the results". Would you like to produce a table that includes the annotation from each of these tools for each transcript? If so, you might want to try EnTAP, which is a new transcriptome annotation tool that wraps all of the tools you mentioned in your question (and many more). It reports its results in a 30-field .tsv file and can automatically produce summary tables and graphics from the run as well. It also uses DIAMOND, which can run many times faster than BLAST on large datasets like transcriptomes. Here's the docs page for EnTAP
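If you would rather pool the outputs yourself, here is a minimal sketch of the "one table per transcript" idea in Python. It assumes you have already parsed each tool's output into a mapping from transcript ID to GO IDs; the function names, the input layout, and the example IDs are all made up for illustration and are not the real output formats of InterProScan, Blast2GO, or eggNOG. It does not remove overly general terms, since that would need the GO hierarchy itself.

```python
from collections import defaultdict


def pool_go_annotations(per_tool):
    """Pool GO terms from several tools.

    per_tool is a hypothetical layout: {tool_name: {transcript_id: iterable of GO IDs}}.
    Returns {transcript_id: {go_id: set of tools that reported it}}.
    """
    pooled = defaultdict(lambda: defaultdict(set))
    for tool, annotations in per_tool.items():
        for transcript, terms in annotations.items():
            for term in terms:
                pooled[transcript][term].add(tool)
    return pooled


def to_rows(pooled):
    """Flatten the pooled mapping into sorted (transcript, GO ID, tools, n_tools) rows."""
    rows = []
    for transcript in sorted(pooled):
        for term in sorted(pooled[transcript]):
            tools = sorted(pooled[transcript][term])
            rows.append((transcript, term, ",".join(tools), len(tools)))
    return rows


# Toy input; in practice each inner dict would come from parsing one tool's output.
example = {
    "interproscan": {"tx1": ["GO:0005524", "GO:0016301"]},
    "blast2go": {"tx1": ["GO:0005524"], "tx2": ["GO:0003677"]},
    "eggnog": {"tx1": ["GO:0016301"], "tx2": ["GO:0003677"]},
}

rows = to_rows(pool_go_annotations(example))
for row in rows:
    print("\t".join(map(str, row)))
```

Keeping the list of supporting tools per term lets you decide later whether to take the union of all terms or only those reported by at least two tools.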
I would also recommend EnTAP. I have been testing it extensively and made some suggestions to the developers, which they quickly incorporated into a newer version. EnTAP outputs Gene Ontology counts per transcript (via EggNOG) as figures, but I use the corresponding text output of the counts to make publication-ready bar and pie plots in R. Another reason to use EnTAP is that you can easily build DIAMOND databases from any protein FASTA file. I am currently using nr, RefSeq, Swiss-Prot, EggNOG and some custom sets.
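For anyone who wants to get plot-ready counts from a text table without R, here is a rough Python sketch. It assumes a tab-separated file with the transcript ID in the first column and a comma-separated list of GO IDs in the second; that layout is an assumption for illustration, not EnTAP's actual column format, so adjust the column index to match your file.

```python
import csv
import io
from collections import Counter


def count_go_terms(handle, top_n=10):
    """Count how many transcripts carry each GO ID.

    Assumes (hypothetically) column 0 = transcript ID and
    column 1 = comma-separated GO IDs.
    """
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[1].strip():
            continue  # skip transcripts without GO terms
        # Use a set so a term repeated on one line counts once per transcript.
        terms = {t.strip() for t in row[1].split(",") if t.strip()}
        counts.update(terms)
    return counts.most_common(top_n)


# Toy input standing in for a real annotation table.
demo = io.StringIO(
    "tx1\tGO:0005524,GO:0016301\n"
    "tx2\tGO:0005524\n"
    "tx3\t\n"
)

top_terms = count_go_terms(demo)
for term, n in top_terms:
    print(term, n)
```

The resulting list of (term, count) pairs can be passed to whatever plotting library you prefer for bar or pie charts.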