Question

How to browse InterProScan, BLAST, and HMMER output to retrieve relevant results?

1

7.8 years ago

keepclam ▴ 10

I just started working on two already annotated transcriptomes; they have been annotated with BLASTP, InterProScan, HMMER and GO terms. I keep running into problems when retrieving functional information (e.g., retrieving all the endonucleases) from the annotation results, and I always end up running keyword searches in the terminal with grep and similar tools, which gives me only partial results.

Is there any smarter and more biologically correct way to browse annotation results?

annotation interproscan RNA-Seq bash • 2.5k views

ADD COMMENT • link updated 7.8 years ago by cdsouthan ★ 1.9k • written 7.8 years ago by keepclam ▴ 10

0

What format are the annotations in? GenBank/GFF, or just unformatted text?

ADD REPLY • link 7.8 years ago by GenoMax 148k

0

No, simple tab-separated text. One line of InterPro output looks like this:

Locus_22164 7c6c3b32b99a4166ca3d7b5a78aa251c 719 SMART SM00487 DEAD-like helicases superfamily 145 352 6.0E-54 T 12-08-2015 IPR014001 Helicase, superfamily 1/2, ATP-binding domain

while one line of HMMER output looks like this:

GATA PF00320.22 36 Locus_3921 - 694 1e-05 25.9 0.8 1 2 0.13 1.5e+03 -0.2 0.1 13 25 507 518 506 518 0.89 GATA zinc finger

ADD REPLY • link 7.8 years ago by keepclam ▴ 10

0

When you say you want to "browse", what are you expecting to get out of that? Do you need a summary of all the different types of domains identified? Are you interested in knowing how many loci have no identifiable function?

Potentially you could use awk to cut columns out of these files, followed by some sort of sorting to classify the results.
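The same cut-and-sort idea can be sketched in Python instead of awk. This is only a rough sketch: it assumes the standard 13-column InterProScan TSV layout (locus in column 1, InterPro accession and description in columns 12 and 13), which matches the example line posted above. The function name `count_interpro_entries` and the second locus, `Locus_00001`, are made up for illustration.

```python
from collections import Counter


def count_interpro_entries(lines):
    """Count how many distinct loci carry each InterPro entry.

    Assumes tab-separated InterProScan output with the InterPro
    accession in column 12 and its description in column 13.
    """
    loci_by_entry = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip rows that have no integrated InterPro entry.
        if len(fields) < 13 or not fields[11].startswith("IPR"):
            continue
        key = (fields[11], fields[12])
        loci_by_entry.setdefault(key, set()).add(fields[0])
    return Counter({key: len(loci) for key, loci in loci_by_entry.items()})


# The line posted in this thread, rebuilt with explicit tabs.
thread_line = "\t".join([
    "Locus_22164", "7c6c3b32b99a4166ca3d7b5a78aa251c", "719", "SMART",
    "SM00487", "DEAD-like helicases superfamily", "145", "352", "6.0E-54",
    "T", "12-08-2015", "IPR014001",
    "Helicase, superfamily 1/2, ATP-binding domain",
])
# Hypothetical second locus, invented only to make the tally non-trivial.
made_up_line = thread_line.replace("Locus_22164", "Locus_00001")

counts = count_interpro_entries([thread_line, made_up_line])
for (accession, description), n_loci in counts.most_common():
    print(n_loci, accession, description, sep="\t")
```

On a real file you would pass an open file handle instead of the two-element list. Sorting by the number of loci gives a quick overview of which domain types dominate the annotation.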

ADD REPLY • link 7.8 years ago by GenoMax 148k

0

I'd like to retrieve all proteins belonging to a given group of interest. Suppose I want to retrieve all nucleases. If I grep for "nuclease", I automatically exclude from my results all those nucleases that don't have "nuclease" in their annotation. Is there any way to get around this problem? Note that InterProScan and HMMER report the database IDs of their hits (Pfam for HMMER and various databases for InterProScan).
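Since both tools report database IDs, one option is to match on accessions rather than on description text. Below is a minimal sketch, assuming the InterProScan lines are tab-separated in the standard 13-column layout and the HMMER lines are whitespace-delimited as in the example above (Pfam accession in column 2, locus in column 4). The function name `loci_with_accessions` is made up, and the `wanted` set reuses the two accessions from the example lines purely as placeholders; a real list of nuclease-related accessions would have to be curated separately.

```python
def loci_with_accessions(interpro_lines, hmmer_lines, wanted):
    """Return loci whose hits match any accession in `wanted`.

    InterProScan rows are checked against both the member-database
    signature (column 5) and the InterPro entry (column 12).
    HMMER rows are checked against the Pfam accession (column 2)
    after the version suffix is dropped, e.g. PF00320.22 -> PF00320.
    """
    hits = set()
    for line in interpro_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        interpro_acc = fields[11] if len(fields) > 11 else ""
        if fields[4] in wanted or interpro_acc in wanted:
            hits.add(fields[0])
    for line in hmmer_lines:
        if line.startswith("#"):  # HMMER table comment lines
            continue
        fields = line.split(None, 22)  # keep the description in one piece
        if len(fields) < 4:
            continue
        if fields[1].split(".")[0] in wanted:
            hits.add(fields[3])
    return hits


# The two example lines from this thread.
interpro_example = "\t".join([
    "Locus_22164", "7c6c3b32b99a4166ca3d7b5a78aa251c", "719", "SMART",
    "SM00487", "DEAD-like helicases superfamily", "145", "352", "6.0E-54",
    "T", "12-08-2015", "IPR014001",
    "Helicase, superfamily 1/2, ATP-binding domain",
])
hmmer_example = (
    "GATA PF00320.22 36 Locus_3921 - 694 1e-05 25.9 0.8 1 2 0.13 1.5e+03 "
    "-0.2 0.1 13 25 507 518 506 518 0.89 GATA zinc finger"
)

# Placeholder accessions taken from the lines above, not a nuclease list.
wanted = {"IPR014001", "PF00320"}
print(sorted(loci_with_accessions([interpro_example], [hmmer_example], wanted)))
```

The point of this design is that an accession identifies a family regardless of how its description happens to be worded, so hits are no longer lost just because the keyword is missing from the text.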

ADD REPLY • link 7.8 years ago by keepclam ▴ 10

score 0 · Answer 1 · 2017-02-16

0

7.8 years ago

cdsouthan ★ 1.9k

Wait for the ORFs to get into UniProt; then it should be easy.

ADD COMMENT • link 7.8 years ago by cdsouthan ★ 1.9k