Strange SIFT Warnings
13.4 years ago
Tin ▴ 20

Hello,

I am running SIFT. Specifically, I am asking it to predict the effect of mutations starting from a GI number, using a command like:

csh SIFT_for_submitting_NCBI_gi_id.csh 289666793 - BEST

I notice that in the SIFTprediction result file, I often get warning messages like these:

289666793       WARNING! S211 not allowed! score: 0.04 median: 1.97 # of sequence: 80
289666793       WARNING! S218 not allowed! score: 0.04 median: 1.96 # of sequence: 83
289666793       WARNING! Y231 not allowed! score: 0.04 median: 1.97 # of sequence: 84
289666793       WARNING! I244 not allowed! score: 0.05 median: 1.96 # of sequence: 87
289666793       WARNING! V246 not allowed! score: 0.05 median: 1.96 # of sequence: 87

I read that if the median score is above 3.25, the prediction has low confidence, but the medians in these warnings are below that threshold, so I am puzzled about what the warnings mean. I have searched for an answer but haven't been able to find anything. Maybe I can get some expert help here!

Much thanks in advance,

-Tin


Hi Tin, I just recently installed SIFT and noticed the same type of warning messages while testing with a few mutations. I emailed the SIFT group previously with an unrelated question, and received a quick response. I would suggest trying the same - see the "Contact us" link on their website to submit your question.

13.2 years ago
Pauline Ng ▴ 20

Dear Tin,

This is Pauline Ng, the creator of SIFT. SIFT accepts several kinds of input. One is a protein sequence: SIFT searches for homologous sequences and then computes predictions. For this program call (SIFT_for_submitting_fasta_seq.csh), we try to select sequences that give a diversity of around 3.25.

However, because this takes a long time (> 10 minutes), the other option is to submit an NCBI gi id (which is what you are doing). In this case we go to the NCBI website, request the pre-computed BLAST search (BLink), and use it as the basis for the homologous sequences. In this process, sequences are not selected to reach a diversity of 3.25. We assume that NCBI is correct and don't try to select specific sequences, because the "BEST" hits from NCBI are supposed to be orthologues, which should theoretically give the best possible prediction.

Looking at this particular sequence, I would say that the NCBI BLAST results have diversified too far, and I would not trust the predictions. I would recommend trying SIFT_for_submitting_fasta_seq.csh instead.

How can I tell it has diversified too far? The warnings are one indication. The other thing I noticed is that your protein is a zinc finger protein, and zinc finger protein families are large and serve many functions. So I could see how NCBI BLink might be picking up zinc fingers in other organisms that do not have the same functionality. If your particular protein has evolved specifically in humans and there is no similar functionality in other organisms, then the "BEST" hit is simply the closest homology hit, without similar functionality.
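
Since the warnings are treated above as a sign that the alignment has diversified too far, it can help to tally them before deciding whether to rerun with the FASTA option. The following is a minimal Python sketch, not part of SIFT itself. It assumes the warning lines look exactly like the ones quoted in the question; the names `parse_warnings`, `SiftWarning`, and `MEDIAN_CUTOFF` are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple

# Assumption: warning lines follow the layout quoted in the question, e.g.
# "289666793       WARNING! S211 not allowed! score: 0.04 median: 1.97 # of sequence: 80"
WARNING_RE = re.compile(
    r"^(?P<gi>\d+)\s+WARNING!\s+(?P<residue>[A-Z])(?P<pos>\d+) not allowed!\s+"
    r"score:\s*(?P<score>[\d.]+)\s+median:\s*(?P<median>[\d.]+)\s+"
    r"# of sequence:\s*(?P<nseq>\d+)\s*$"
)

# Threshold mentioned in the thread; treated here as a plain constant.
MEDIAN_CUTOFF = 3.25


class SiftWarning(NamedTuple):
    gi: str
    residue: str       # residue letter as printed in the warning
    position: int      # position as printed in the warning
    score: float
    median: float
    n_sequences: int


def parse_warnings(lines):
    """Return one SiftWarning per line that matches the assumed format."""
    found = []
    for line in lines:
        m = WARNING_RE.match(line.strip())
        if m:
            found.append(
                SiftWarning(
                    gi=m["gi"],
                    residue=m["residue"],
                    position=int(m["pos"]),
                    score=float(m["score"]),
                    median=float(m["median"]),
                    n_sequences=int(m["nseq"]),
                )
            )
    return found


# Two of the lines from the question, plus one unrelated line that is skipped.
sample = [
    "289666793       WARNING! S211 not allowed! score: 0.04 median: 1.97 # of sequence: 80",
    "289666793       WARNING! I244 not allowed! score: 0.05 median: 1.96 # of sequence: 87",
    "some other line from the prediction file",
]

warnings_found = parse_warnings(sample)
above_cutoff = [w for w in warnings_found if w.median > MEDIAN_CUTOFF]

print(f"{len(warnings_found)} warning(s) parsed")
print(f"{len(above_cutoff)} with median above {MEDIAN_CUTOFF}")
for w in warnings_found:
    print(w.residue, w.position, w.score, w.median, w.n_sequences)
```

To run it on a real result file, the `sample` list can be replaced with the lines read from the SIFTprediction file. A large number of warnings with medians well below 3.25, as in the question, fits the pattern described above and is a reason to retry with SIFT_for_submitting_fasta_seq.csh.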

Best,
Pauline Ng
sift-dna.org
