Hello,
When I scan a protein sequence (for example, human TFAP2A) with a locally installed InterProScan,
./interproscan.sh -i tfap2a.fasta -f tsv --iprlookup
I get an output file with the domain matches from different member databases (in this case, PRINTS and MobiDBLite). Some of these matches correspond to InterPro entries: for example, there are three PRINTS matches corresponding to IPR013854, which is an AP-2 C-terminal domain. However, I cannot see the integrated coordinates of IPR013854 in my sequence, only the coordinates of the PRINTS fragments. On the other hand, I can see the integrated coordinates of the IPR013854 domain in TFAP2A perfectly well when I scan it using the InterProScan web service.
Could I somehow make InterProScan output integrated coordinates of IPR* domains when I run it locally on my server?
Thank you!
From the top of my head: add the option
-iprlookup
?
It's already there :) It gives only the IPR* IDs, without coordinates.
right, my bad.
OK, I checked some of my output files and it does give coordinates even for the ipr-IDs.
Can you perhaps post a small excerpt of your output that shows the issue you report here? Keep in mind that the 4th column of the TSV output will never report IPR or the like, but always the member database of the original match (so CDD, PRINTS, ...).
Yes, sure:
So columns 7 and 8 are the coordinates of a match from a specific member database, not the integrated coordinates of the corresponding IPR* domain. (Although the coordinates of the Pfam match are the same as the InterPro integrated coordinates: InterPro scan results.)
yes, indeed.
Correct. "IPR domains" are never larger than the largest representative from the member databases. They can be (and often will be) as long as one of the member DB matches.
Keep in mind that InterPro only integrates/groups domains under a shared "ID", so it has no "domains" itself (i.e. you cannot search with a given IPR domain ID, only with the domains from the member DBs).
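Since the TSV only lists member database matches, one way to get a per-IPR view locally is to group the rows by the InterPro accession and look at the range they cover. Below is a minimal sketch of that idea, assuming the standard TSV layout (column 1 = protein ID, 4 = member DB, 5 = signature accession, 7/8 = start/stop, 12 = InterPro accession when run with --iprlookup). The function name `entry_envelopes`, the signature accession `PR_DEMO` and all coordinates in the sample rows are invented for illustration; they are not real TFAP2A output. The resulting min/max span is only an envelope of the member matches and is not guaranteed to equal what the InterPro website displays.

```python
import csv
from collections import defaultdict

# Invented example rows in InterProScan TSV layout (NOT real TFAP2A results).
SAMPLE_TSV = "\n".join([
    "\t".join(["seq1", "md5", "437", "PRINTS", "PR_DEMO", "demo", "210", "230",
               "1.0E-5", "T", "01-01-2020", "IPR013854", "demo entry"]),
    "\t".join(["seq1", "md5", "437", "PRINTS", "PR_DEMO", "demo", "250", "270",
               "1.0E-5", "T", "01-01-2020", "IPR013854", "demo entry"]),
    "\t".join(["seq1", "md5", "437", "PRINTS", "PR_DEMO", "demo", "300", "320",
               "1.0E-5", "T", "01-01-2020", "IPR013854", "demo entry"]),
    # A match that is not integrated into any InterPro entry.
    "\t".join(["seq1", "md5", "437", "MobiDBLite", "mobidb-lite", "disorder",
               "1", "40", "-", "T", "01-01-2020", "-", "-"]),
])


def entry_envelopes(lines):
    """Group member DB matches by (protein, InterPro accession).

    Returns {(protein, ipr): (min_start, max_stop, n_matches)}.
    Rows without an InterPro accession in column 12 are skipped.
    """
    hits = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Column 12 may be missing or "-" when the signature is not integrated.
        if len(row) < 12 or row[11] in ("", "-"):
            continue
        hits[(row[0], row[11])].append((int(row[6]), int(row[7])))
    return {
        key: (min(s for s, _ in spans), max(e for _, e in spans), len(spans))
        for key, spans in hits.items()
    }


result = entry_envelopes(SAMPLE_TSV.splitlines())
for (protein, ipr), (start, stop, n) in result.items():
    print(protein, ipr, start, stop, n)
```

To try it on a real run, pass an open file handle for your own TSV output instead of `SAMPLE_TSV.splitlines()`; any iterable of lines should work.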
Thank you @lieven.sterck!