Is it acceptable to pool the annotations from the various sources InterProScan
offers, and annotate a sequence with a subset of these?
For example, suppose I have hits like these:
id annot src start stop
seq1 dom1 Pfam 100 120
seq1 dom1a CDD 101 128
seq1 dom2 Pfam 60 80
Is it acceptable to take dom1a from CDD and dom2 from Pfam, and leave out dom1 from Pfam (since it's redundant with dom1a)?
The purpose is to have everything recognizable on the sequence annotated while not having any redundant annotations.
And yes, the annotations I have in mind are positionally and functionally redundant: for example, the same PAS domain annotated by both Pfam and CDD. This is what I think I'll go with.
The main reason any of this is even coming up is that Pfam is missing a few crucial annotations that are covered by other databases, but using everything in one go then makes annotations of domains elsewhere on the sequence redundant. E.g., for a sequence that looks like this:
---dom1---dom2---dom3---
Pfam annotates dom1 and dom3. CDD, meanwhile, annotates dom2 but also dom1 and dom3, making dom1 and dom3 redundantly annotated in the process. Because Pfam is useful elsewhere with other sequences in the analysis, I am loath to drop Pfam, and just want to "paint in" the missing domains from other databases. Hence my question.
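To make the "paint in" idea concrete, here is a minimal sketch of what I have in mind. It assumes the hits have already been parsed into records with the five columns from the table above. The names Hit, overlap_fraction and paint_in, the source priority order, and the 0.5 overlap cutoff are all my own illustrative choices, not anything InterProScan provides. It also treats positional overlap as a stand-in for redundancy, which would still need a manual check that the overlapping entries really describe the same domain.

```python
from collections import namedtuple

# One row of the table above: sequence id, annotation, source database,
# and inclusive start/stop coordinates on the sequence.
Hit = namedtuple("Hit", "seq_id annot src start stop")


def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter hit."""
    overlap = min(a.stop, b.stop) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.stop - a.start, b.stop - b.start) + 1
    return overlap / shorter


def paint_in(hits, priority=("Pfam", "CDD"), max_overlap=0.5):
    """Keep hits from higher-priority sources first, then add hits from
    lower-priority sources only where they do not substantially overlap
    a hit that was already kept on the same sequence.

    The 0.5 cutoff is an arbitrary, hypothetical threshold.
    """
    rank = {src: i for i, src in enumerate(priority)}
    kept = []
    # Sources missing from `priority` are considered last.
    for hit in sorted(hits, key=lambda h: rank.get(h.src, len(rank))):
        clash = any(
            k.seq_id == hit.seq_id and overlap_fraction(k, hit) > max_overlap
            for k in kept
        )
        if not clash:
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.seq_id, h.start))


# The three example hits from the table above.
hits = [
    Hit("seq1", "dom1", "Pfam", 100, 120),
    Hit("seq1", "dom1a", "CDD", 101, 128),
    Hit("seq1", "dom2", "Pfam", 60, 80),
]

pfam_first = paint_in(hits)
cdd_first = paint_in(hits, priority=("CDD", "Pfam"))
```

With Pfam given priority, this keeps dom2 and dom1 and drops dom1a; with CDD given priority, it keeps dom2 and dom1a and drops dom1, which is the selection described at the top of the question.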