TIGR Functional Assignment Using CDD and RPS-BLAST
3
4
14.2 years ago
Carol ▴ 130

I'm trying to assign TIGR functional annotations from NCBI Conserved Domain Database (CDD) PSSMs using the RPS-BLAST program. Could anybody advise me on suitable RPS-BLAST options, such as an e-value cutoff or anything else, to get the most accurate annotations for the query proteins?

pssm • 4.7k views
3
14.2 years ago

I assume you mean TIGRFAMs. I have not checked RPS-BLAST, but another program, FastHMM (which uses HMMER), has some general recommendations for E-value cutoffs. There is some relevant information in the TIGRFAMs release notes.

  1. Description of fields used in TIGRFAMs_9.0_INFO.tar.gz
    ID Identification: One word less than 16 characters
    AC Accession number. TIGR accession numbers take the form TIGRxxxxx where x is a digit
    DE description of the HMM
    AU Author: Person(s) responsible for alignment in the format, eg. Fish N, Chips RU
    TC Trusted cutoffs: global value, then domain value.
    NC Noise cutoffs: global value, then domain value.

The TC (trusted cutoffs) field might have what you are looking for.
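As a rough illustration of reading those fields, here is a minimal Python sketch. It assumes each line of an INFO record is a two-letter tag followed by whitespace and a value, as the field list above suggests; the record shown is made up, and `parse_info` and `cutoffs` are hypothetical helper names, not part of any TIGRFAMs tool.

```python
def parse_info(text):
    """Parse one INFO-style record into a dict keyed by its two-letter tag.

    Assumption: every line looks like 'TAG  value'. Lines that do not
    split into a tag and a value are skipped.
    """
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        tag, value = parts
        fields[tag] = value.strip()
    return fields


def cutoffs(fields, tag):
    """Return (global, domain) scores for a 'TC' or 'NC' field."""
    global_score, domain_score = (float(x) for x in fields[tag].split()[:2])
    return global_score, domain_score


# Made-up record for illustration only; not a real TIGRFAMs family.
example = """ID  example_fam
AC  TIGR99999
DE  hypothetical example family
TC  50.00 45.00
NC  30.00 25.00
"""

info = parse_info(example)
print(info["AC"], "trusted:", cutoffs(info, "TC"), "noise:", cutoffs(info, "NC"))
```

These scores are defined for the HMMs themselves, so they are only directly meaningful when searching with the TIGRFAMs HMMs.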

0

Unfortunately for denim, they are using the CDD translations of the TIGRFAM HMMs. Searching against TIGRFAMs directly via hmmpfam (version 2, still) would be a good way to use these (manually curated, BTW) cutoffs, but the scale changes completely when you move to PSI-BLAST profiles. As Kader mentioned, any specific cutoff is probably going to be either too permissive or too restrictive, depending on the strength of the family. It is possible that the CDD folks have a cutoff table for their families, so it's worth searching for that.

3
14.1 years ago

It is difficult to define a fixed e-value in advance for an RPS-BLAST search. You may either use the default e-value (0.01) recommended in the CDD-search web interface, or use more stringent or relaxed e-values depending on your protein domains of interest and their homology with the query sequences. In the CDD web interface you may have noticed that the e-value can be set over a wide range (100 to 0.000001). To standardize the e-value for your search, you may take a random subset of your sequences, search with different e-values, look at the distribution of e-values assigned to your domains of interest, and pick the most appropriate cutoff. If you are trying to find a best hit using RPS-BLAST based on e-value, you may also consider the HSP and coverage while selecting your hits.
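The subset experiment described above could be sketched roughly as follows in Python. This assumes the hits were saved in BLAST's 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); "coverage" is taken here to mean the fraction of the query spanned by the HSP, which is one possible reading. The sample rows, `read_hits`, and `queries_with_hit` are made up for illustration.

```python
import csv
import io

COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Yield one dict per hit from tab-separated, 12-column BLAST-style output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["qstart"], hit["qend"] = int(hit["qstart"]), int(hit["qend"])
        yield hit


def queries_with_hit(hits, evalue_cutoffs, query_lengths=None, min_coverage=0.0):
    """For each cutoff, count queries with at least one hit that passes it.

    If query_lengths is given, a hit must also span at least min_coverage
    of its query (assumed definition of coverage).
    """
    counts = {}
    for cutoff in evalue_cutoffs:
        kept = set()
        for h in hits:
            if h["evalue"] > cutoff:
                continue
            if query_lengths is not None:
                span = abs(h["qend"] - h["qstart"]) + 1
                if span / query_lengths[h["qseqid"]] < min_coverage:
                    continue
            kept.add(h["qseqid"])
        counts[cutoff] = len(kept)
    return counts


# Made-up hits for three hypothetical query proteins.
rows = [
    ["p1", "CDD:1", "40.0", "100", "60", "0", "1", "100", "1", "100", "1e-20", "90.0"],
    ["p2", "CDD:2", "30.0", "50", "35", "0", "10", "59", "1", "50", "5e-4", "40.0"],
    ["p3", "CDD:3", "25.0", "30", "22", "0", "5", "34", "1", "30", "0.5", "25.0"],
]
sample = "\n".join("\t".join(r) for r in rows) + "\n"
hits = list(read_hits(io.StringIO(sample)))

for cutoff, n in queries_with_hit(hits, [1e-5, 1e-2, 1]).items():
    print(f"e-value <= {cutoff}: {n} queries with a hit")
```

Plotting or tabulating these counts for the domains you care about shows where the number of assignments stops changing much, which is one way to choose a cutoff for that family.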

This URL provides a set of commands for RPS-BLAST and their options, including an e-value of 0.001. You may also look into manuscripts that used RPS-BLAST to define protein domains. For example, in this case the authors used an e-value of 10^-5 as the preferred cutoff.
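For orientation, a command of that kind can be assembled as below. This is only a sketch: it assumes the BLAST+ version of `rpsblast` (the older standalone program used different option names), and the query file, database name, and output file are placeholders rather than anything taken from the linked page.

```python
import shlex


def rpsblast_command(query, db, evalue, out):
    """Build an argument list for the BLAST+ `rpsblast` program.

    -outfmt 6 requests tab-separated output, which is easy to post-filter.
    """
    return ["rpsblast", "-query", query, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out]


# Placeholder file and database names.
cmd = rpsblast_command("proteins.fasta", "Cdd", 0.001, "hits.tsv")
print(shlex.join(cmd))

# To actually run it (needs BLAST+ installed and a formatted CDD database):
# import subprocess
# subprocess.run(cmd, check=True)
```

Running with a relaxed e-value and filtering the tabular output afterwards lets you try several cutoffs without repeating the search.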

0

You're right, Khader: the e-value can't be fixed for assigning functions to all protein families in CDD, including TIGRFAM. I optimized the e-value cutoff for my proteins using a subset of already annotated proteins, and checked the accuracy of the annotation by confirming that hypothetical proteins produced "NO HIT" and that the assigned function matched the annotated function of the proteins in the subset. Therefore one should not use a single e-value cutoff for all the PSSMs in CDD. Many thanks for your advice on both of my questions.
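A check of this kind could look roughly like the Python sketch below. It is not the exact procedure used here, only one possible reading of it: each reference protein has either a known family or `None` (a hypothetical protein that should give "NO HIT"), and a cutoff is scored by how often the best passing hit agrees with that reference. All names and values are made up.

```python
def best_hit(hits, cutoff):
    """Return the family of the lowest-e-value hit passing the cutoff, or None."""
    passing = [h for h in hits if h[1] <= cutoff]
    return min(passing, key=lambda h: h[1])[0] if passing else None


def accuracy(predictions, truth, cutoff):
    """Fraction of reference proteins whose best hit matches the reference.

    truth maps protein -> expected family, or None when "NO HIT" is expected.
    predictions maps protein -> list of (family, evalue) pairs.
    """
    correct = sum(
        1 for protein, expected in truth.items()
        if best_hit(predictions.get(protein, []), cutoff) == expected
    )
    return correct / len(truth)


# Made-up reference subset: two annotated proteins and one hypothetical protein.
predictions = {
    "a": [("famA", 1e-30)],
    "b": [("famB", 1e-4), ("famC", 1e-3)],
    "hyp": [("famA", 0.005)],
}
truth = {"a": "famA", "b": "famB", "hyp": None}

for cutoff in (1e-2, 1e-3, 1e-10):
    print(f"cutoff {cutoff}: accuracy {accuracy(predictions, truth, cutoff):.2f}")
```

In this toy data the loosest cutoff wrongly annotates the hypothetical protein and the strictest one loses a true family member, which is the trade-off that makes a single cutoff for every PSSM hard to defend.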

0

Thanks for your feedback Denim. Glad to know that the answers were helpful.

1
14.1 years ago

Although the question is already marked as solved, this might help too:

NCBI does offer threshold bit scores for a hit to be counted as specific to one protein family in their domain databases (cdd, pfam, smart, tigr, prk, cog, kog). This file likely contains exactly the data you were searching for.
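Once such thresholds are loaded, applying them is a per-family comparison of bit scores. The Python sketch below assumes the table has already been read into a dict from family identifier to threshold; the layout of the NCBI file is not reproduced here, and the identifiers, scores, and the `specific_hits` helper are invented for illustration.

```python
def specific_hits(hits, thresholds):
    """Keep hits whose bit score meets the threshold of their own family.

    hits: iterable of (query, family, bitscore) tuples.
    thresholds: dict mapping family -> minimum bit score for a specific hit.
    Families without a threshold are dropped (a design choice for this sketch).
    """
    kept = []
    for query, family, bitscore in hits:
        threshold = thresholds.get(family)
        if threshold is not None and bitscore >= threshold:
            kept.append((query, family, bitscore))
    return kept


# Made-up thresholds and hits.
thresholds = {"famA": 80.0, "famB": 35.0}
hits = [
    ("p1", "famA", 90.0),   # passes famA's threshold
    ("p2", "famA", 60.0),   # below famA's threshold
    ("p3", "famB", 40.0),   # passes famB's lower threshold
    ("p4", "famC", 100.0),  # no threshold known for famC
]

for query, family, bitscore in specific_hits(hits, thresholds):
    print(query, family, bitscore)
```

Unlike a global e-value, this lets a weak family and a strong family each be judged on its own scale.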
