Pooling annotations from different databases in InterProScan
3.4 years ago
Dunois ★ 2.8k

Is it acceptable to pool the annotations from the various sources InterProScan offers and annotate a sequence with a subset of them?

For example, suppose I have something like this:

id    annot  src   start  stop
seq1  dom1   Pfam  100    120
seq1  dom1a  CDD   101    128
seq1  dom2   Pfam  60     80

Is it acceptable to take dom1a from CDD and dom2 from Pfam, and leave out dom1 from Pfam (since it's redundant with dom1a)?

annotations interproscan protein sequence domain • 840 views
3.4 years ago
Mensur Dlakic ★ 28k

When you ask whether it is acceptable, what purpose do you have in mind?

In terms of positional redundancy, dom1 and dom1a overlap, but I would still make sure that the Pfam and CDD annotations describe the same domain. If they do, you also have functional redundancy, in which case it is safe to drop one of the two annotations.
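A minimal sketch of that two-part check in Python. It assumes each hit carries the sequence ID, member database, 1-based inclusive coordinates and, where the signature is integrated, its InterPro entry accession. The 50% overlap threshold and the `IPR_dom1` accession are hypothetical placeholders, not values produced by InterProScan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    seq_id: str
    name: str        # signature/annotation name, e.g. "dom1"
    source: str      # member database, e.g. "Pfam" or "CDD"
    start: int       # 1-based, inclusive
    stop: int        # 1-based, inclusive
    entry: str = ""  # InterPro entry the signature is integrated into ("" if none)


def overlap_len(a: Hit, b: Hit) -> int:
    """Number of residues two hits share (0 if on different sequences or disjoint)."""
    if a.seq_id != b.seq_id:
        return 0
    return max(0, min(a.stop, b.stop) - max(a.start, b.start) + 1)


def positionally_redundant(a: Hit, b: Hit, min_frac: float = 0.5) -> bool:
    """True if the hits overlap by at least `min_frac` of the shorter hit."""
    shared = overlap_len(a, b)
    shorter = min(a.stop - a.start + 1, b.stop - b.start + 1)
    return shared > 0 and shared >= min_frac * shorter


def functionally_redundant(a: Hit, b: Hit) -> bool:
    """True if both signatures are integrated into the same InterPro entry."""
    return a.entry != "" and a.entry == b.entry


def redundant(a: Hit, b: Hit, min_frac: float = 0.5) -> bool:
    """Safe to drop one of the two only if both kinds of redundancy hold."""
    return positionally_redundant(a, b, min_frac) and functionally_redundant(a, b)


# Example from the question; "IPR_dom1" is a hypothetical entry accession.
dom1 = Hit("seq1", "dom1", "Pfam", 100, 120, "IPR_dom1")
dom1a = Hit("seq1", "dom1a", "CDD", 101, 128, "IPR_dom1")
dom2 = Hit("seq1", "dom2", "Pfam", 60, 80)

print(overlap_len(dom1, dom1a))  # 20
print(redundant(dom1, dom1a))    # True
print(redundant(dom2, dom1a))    # False
```

Comparing InterPro entry accessions is only one proxy for "the same domain"; signatures that are not integrated into any entry would still need a manual look.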

For the sake of consistency, you may want to consider always retaining Pfam annotations and dropping others whenever they overlap. Or you may want to always drop the shorter domain, which in the case above would be from Pfam.
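Both consistency rules can be written as one greedy pass that differs only in its priority order. This is a sketch, not an InterProScan feature: the `select` helper, the `pfam_first`/`longest_first` keys and the 50% overlap threshold are illustrative assumptions, and only positional overlap is checked here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Hit:
    seq_id: str
    name: str
    source: str  # member database, e.g. "Pfam" or "CDD"
    start: int   # 1-based, inclusive
    stop: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.stop - self.start + 1


def overlaps(a: Hit, b: Hit, min_frac: float = 0.5) -> bool:
    """Overlap covering at least `min_frac` of the shorter hit on the same sequence."""
    if a.seq_id != b.seq_id:
        return False
    shared = min(a.stop, b.stop) - max(a.start, b.start) + 1
    return shared > 0 and shared >= min_frac * min(a.length, b.length)


# Priority keys: the hit that sorts first wins when two hits overlap.
def pfam_first(h: Hit) -> Tuple:
    return (h.source != "Pfam", -h.length, h.start)


def longest_first(h: Hit) -> Tuple:
    return (-h.length, h.start)


def select(hits: List[Hit], key: Callable[[Hit], Tuple], min_frac: float = 0.5) -> List[Hit]:
    """Keep hits in priority order, skipping any that overlap an already kept hit."""
    kept: List[Hit] = []
    for h in sorted(hits, key=key):
        if not any(overlaps(h, k, min_frac) for k in kept):
            kept.append(h)
    return sorted(kept, key=lambda h: (h.seq_id, h.start))


hits = [
    Hit("seq1", "dom1", "Pfam", 100, 120),
    Hit("seq1", "dom1a", "CDD", 101, 128),
    Hit("seq1", "dom2", "Pfam", 60, 80),
]
print([h.name for h in select(hits, pfam_first)])     # ['dom2', 'dom1']
print([h.name for h in select(hits, longest_first)])  # ['dom2', 'dom1a']
```

With `longest_first`, the example reproduces the selection proposed in the question (dom1a from CDD plus dom2 from Pfam).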


The purpose is to have everything recognizable on the sequence annotated while not having any redundant annotations.

And yes, the annotations I have in mind are positionally and functionally redundant: for example, the same PAS domain annotated by both Pfam and CDD.

"For the sake of consistency, you may want to consider always retaining Pfam annotations and dropping others whenever they overlap."

This is what I think I'll go with.

The main reason this comes up at all is that Pfam misses a few crucial annotations that other databases cover, but using all of the databases at once makes the annotations of domains elsewhere on the sequence redundant. For example, for a sequence that looks like this:

---dom1---dom2---dom3---

Pfam annotates dom1 and dom3. CDD, meanwhile, annotates dom2 but also dom1 and dom3, so dom1 and dom3 end up annotated twice. Because Pfam is useful for the other sequences in the analysis, I am loath to drop it; I just want to "paint in" the missing domains from other databases. Hence my question.
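A sketch of the "paint in" approach in Python: keep every Pfam hit, then add hits from the other member databases only where they do not overlap anything already kept. It assumes the standard InterProScan TSV column order (protein accession in column 1, analysis in column 4, signature accession in column 5, start and stop in columns 7 and 8). The `paint_in` helper, the signature names and the coordinates in the example are hypothetical, and any shared residue counts as an overlap here.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, TextIO


@dataclass(frozen=True)
class Hit:
    seq_id: str
    source: str     # analysis / member database, e.g. "Pfam", "CDD"
    signature: str  # signature accession
    start: int      # 1-based, inclusive
    stop: int       # 1-based, inclusive


def read_tsv(handle: TextIO) -> List[Hit]:
    """Parse InterProScan TSV output (assumed column order, 0-based indices here:
    0 protein accession, 3 analysis, 4 signature accession, 6 start, 7 stop)."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 8:
            continue  # skip blank or malformed lines
        hits.append(Hit(row[0], row[3], row[4], int(row[6]), int(row[7])))
    return hits


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue on the same sequence."""
    return a.seq_id == b.seq_id and a.start <= b.stop and b.start <= a.stop


def paint_in(hits: Iterable[Hit], base: str = "Pfam") -> List[Hit]:
    """Keep every `base` hit; add other hits (longest first) only where
    they overlap nothing already kept."""
    hits = list(hits)
    kept = [h for h in hits if h.source == base]
    others = sorted((h for h in hits if h.source != base),
                    key=lambda h: -(h.stop - h.start + 1))
    for h in others:
        if not any(overlaps(h, k) for k in kept):
            kept.append(h)
    return sorted(kept, key=lambda h: (h.seq_id, h.start))


def _row(src: str, sig: str, start: int, stop: int) -> str:
    # accession, MD5, length, analysis, signature, description, start, stop, score, status, date
    return "\t".join(["seq1", "-", "200", src, sig, "-", str(start), str(stop), "-", "T", "-"])


# Hypothetical version of ---dom1---dom2---dom3---
EXAMPLE = "\n".join([
    _row("Pfam", "pf_dom1", 10, 50),
    _row("Pfam", "pf_dom3", 130, 170),
    _row("CDD", "cd_dom1", 8, 52),
    _row("CDD", "cd_dom2", 70, 110),
    _row("CDD", "cd_dom3", 128, 172),
])

print([h.signature for h in paint_in(read_tsv(io.StringIO(EXAMPLE)))])
# ['pf_dom1', 'cd_dom2', 'pf_dom3']
```

Pfam hits are kept unconditionally, even if they overlap each other, so this only fills gaps rather than resolving conflicts within Pfam itself.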
