What does interproscan add to gene annotation?
9.4 years ago
grayapply2009 ▴ 300

I am doing de novo transcriptome annotation. I have finished running BLAST against the nr database and imported all the results into Blast2GO. I'm trying to run InterProScan within Blast2GO, but it is very slow. At this rate, InterProScan is going to take me another month (the BLAST search against nr has already taken a month).

So I'm wondering what InterProScan adds to my transcriptome annotation beyond the BLAST results. Can I skip this step and go straight to GO mapping?

RNA-Seq de novo transcriptome annotation • 7.6k views
9.4 years ago
ka6AjxQY ▴ 80

It's not necessary to run InterProScan, but I would strongly recommend it. You'd get results based on protein signatures, whereas BLAST is based on sequence similarity. Running both would give you complementary information that would likely improve your annotation.

I can say from experience that running it won't take nearly as long as BLASTing the nr db. It might be easier to just download InterProScan and run it separately from Blast2GO.

Thank you for your reply, Matt. I've considered running InterProScan locally, but unfortunately we don't have any Linux workstations in our lab. All we have is a Mac workstation. I guess I'll have to spend another month on it.

I've never used Blast2GO before, but InterProScan shouldn't take long, because the databases it searches (Pfam-A, PROSITE, etc.) are, in total, much smaller than nr. You should also be able to choose which databases to search, e.g. you could search only Pfam, and that would drastically reduce your computation time.
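
As a rough illustration of restricting the search to selected databases, here is a minimal Python sketch that assembles a command line for a standalone InterProScan 5 run. It assumes a Linux machine with the `interproscan.sh` launcher on the PATH and a FASTA file of protein sequences; the file names and the helper `build_interproscan_command` are hypothetical. `Pfam` is the InterProScan 5 application name, which roughly corresponds to `hmmpfam` in older versions.

```python
import shlex


def build_interproscan_command(fasta_path, out_path, applications=("Pfam",), with_go=True):
    """Return an argv list for a standalone InterProScan 5 run (hypothetical helper)."""
    # -i: input FASTA, -f: output format, -o: explicit output file
    cmd = ["interproscan.sh", "-i", fasta_path, "-f", "xml", "-o", out_path]
    if applications:
        # -appl takes a comma-separated list of member-database analyses
        cmd += ["-appl", ",".join(applications)]
    if with_go:
        # -iprlookup adds InterPro entries, -goterms adds their GO terms
        cmd += ["-iprlookup", "-goterms"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it
    cmd = build_interproscan_command("proteins.fasta", "proteins.ipr.xml")
    print(shlex.join(cmd))
```

The sketch only prints the command; passing the list to `subprocess.run` would execute it on a machine where InterProScan is installed. Adding more names to `applications` widens the search at the cost of run time.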


It is a great idea to pick only a few main databases for InterProScan. I noticed that the default InterProScan setup in Blast2GO includes 17 different databases. I'm not clear on the differences among them or which ones I should use. Can you please give me some advice?

The databases in Blast2GO are: blastprodom, fprintscan, hmmpir, hmmpfam, hmmsmart, hmmtigr, profilescan, hamap, patternscan, superfamily, signalphmm, tmhmm, hmmpanther, gene3d, phobius and coils.


Including all 17 would certainly increase the computation time. Each can independently add evidence to support an annotation, and some are self-evident (Superfamily determines family, signalp determines signal peptide sequences, etc.). hmmpanther can be useful if you'd like to explore the www.pantherdb.org websuite. tmhmm, gene3d, phobius and coils provide information about structure and signal sequences.

I would say hmmpfam would be a great start, since it is likely to be the most useful in downstream analyses. I make good use of Pfam domains in my pipeline. However, it's only a start. You may want to include more evidence as you go on, depending on the kinds of questions you are asking of your data, e.g. consolidating information from the signal peptide/secondary structure searches to infer potential localization of proteins.
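
To make the "Pfam domains in a pipeline" idea concrete, here is a minimal Python sketch that collects Pfam accessions and GO terms per sequence from InterProScan 5 tab-separated output. It assumes the standard TSV column order (sequence ID first, analysis fourth, signature accession fifth, GO annotations fourteenth when present). The function name and the two sample rows are made up for illustration and are not real results.

```python
import io
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")

# Illustrative rows only, laid out like InterProScan 5 TSV output
SAMPLE = (
    "tx1_orf1\tmd5a\t300\tPfam\tPF00069\tProtein kinase domain\t10\t250\t1e-50\tT\t"
    "01-01-2020\tIPR000719\tProtein kinase domain\tGO:0004672|GO:0005524\n"
    "tx2_orf1\tmd5b\t120\tCoils\tCoil\tCoil\t5\t40\t-\tT\t01-01-2020\n"
)


def summarize_interproscan_tsv(handle, analysis="Pfam"):
    """Return (signatures per sequence for one analysis, GO terms per sequence)."""
    domains = defaultdict(set)
    go_terms = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        seq_id, row_analysis, signature = fields[0], fields[3], fields[4]
        if row_analysis == analysis:
            domains[seq_id].add(signature)
        # GO terms are optional; take them from any analysis that reports them
        if len(fields) > 13:
            go_terms[seq_id].update(GO_ID.findall(fields[13]))
    return dict(domains), dict(go_terms)


if __name__ == "__main__":
    domains, go_terms = summarize_interproscan_tsv(io.StringIO(SAMPLE))
    print(domains)
    print(go_terms)
```

Rows from other analyses (such as the Coils row here) are ignored for the domain table, so the same function can be pointed at a different analysis name later if more evidence is needed.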


Great. I'll try scanning against the Pfam database. Many thanks, Matt!

6.3 years ago
harish ▴ 470

Alternatively, instead of InterProScan you can use eggNOG-mapper. It is very fast if you use their server. You have to give it the protein sequences and, if possible, select the taxon to get the best possible annotations.

In my case I tend to mostly use eggnog.

However, you can also use UniProtKB for BLAST searches. That would be faster than searching against nr.

Since Blast2GO requires XML format, if you can run the searches locally and output them as XML, you can save the time spent running InterProScan within Blast2GO.

Look at their API documentation as well.
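
The local-search suggestion above can be sketched as follows: a small Python helper that assembles a BLAST+ command producing XML output (`-outfmt 5`). It assumes BLAST+ is installed and that a protein database has already been built with `makeblastdb`; the database name `uniprot_sprot`, the file names, and the helper `build_blast_xml_command` are hypothetical.

```python
import shlex


def build_blast_xml_command(query_fasta, db, out_xml, program="blastx",
                            evalue=1e-5, threads=4):
    """Return an argv list for a BLAST+ search with XML output (hypothetical helper)."""
    return [
        program,                      # blastx translates nucleotide queries
        "-query", query_fasta,
        "-db", db,                    # local database built with makeblastdb
        "-outfmt", "5",               # 5 = BLAST XML
        "-evalue", str(evalue),
        "-num_threads", str(threads),
        "-out", out_xml,
    ]


if __name__ == "__main__":
    # Hypothetical names; print the command instead of running it
    cmd = build_blast_xml_command("transcripts.fasta", "uniprot_sprot", "transcripts.blast.xml")
    print(shlex.join(cmd))
```

The E-value cutoff and thread count are placeholder defaults, not recommendations; `program="blastp"` would be the choice for protein queries.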
