Why Does A PSI-BLAST Search Produce Fewer Hits On Subsequent Iterations?
11.9 years ago
s.charonis ▴ 100

Hello all,

I have a sequence of a GPCR (PDB code 3NY8) which I am using to create a dataset of homologs (via a homology modelling pipeline) so I can perform electrostatic calculations on them. In order to create this GPCR dataset I need an alignment which I will feed to the pipeline so that it will create one homolog per alignment member.

My problem is sequence-based: I'm doing PSI/DELTA-BLASTs on the query and I am getting some odd results. When I do the 1st iteration, I get 196 hits, but on the second I get 155. It either stays the same or keeps decreasing on each successive iteration, and I have no idea how that's happening. Does anyone have any idea how that can happen? My understanding was that PSI-BLAST searches are supposed to increase the number of hits on each successive iteration since the algorithm is using a PSSM as opposed to a sequence to detect distant members.

PARAMETERS

The parameters used to construct the alignment were as follows:

Algorithm: PSI-BLAST

Database: Non-redundant protein sequence databases (includes GenBank CDS translations, PDB, SwissProt, PIR, PRF)

Organism: Homo sapiens

Exclude: Models/uncultured sample sequences (both excluded)

Maximum target sequences: 1000

Expect threshold: 10

Word size: 3

Maximum matches in a query range: 0

Matrix: BLOSUM62

Gap Costs: Existence: 12, Extension: 1

Compositional adjustments: Composition-based statistics

Filter: Low complexity regions

PSI-BLAST Threshold: 0.005

Pseudocount: 0
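For anyone who wants to try the same search outside the web form, here is a rough sketch of how these settings might map onto the standalone BLAST+ `psiblast` program. The query file name, the local `nr` database, the number of iterations, and the use of `-taxids` for the organism filter are all assumptions (none of them is stated in this post), and the "Exclude" and "Maximum matches in a query range" options are left out because their command-line equivalents are not given here. The script only builds and prints the command; it does not run it.

```python
import shlex

# Hypothetical inputs: neither the file name nor the iteration count is given in the post.
QUERY_FASTA = "3NY8.fasta"
NUM_ITERATIONS = "5"

psiblast_args = [
    "psiblast",
    "-query", QUERY_FASTA,
    "-db", "nr",                     # assumes a locally installed nr database
    "-num_iterations", NUM_ITERATIONS,
    "-max_target_seqs", "1000",      # Maximum target sequences
    "-evalue", "10",                 # Expect threshold
    "-word_size", "3",               # Word size
    "-matrix", "BLOSUM62",           # Matrix
    "-gapopen", "12",                # Gap existence cost
    "-gapextend", "1",               # Gap extension cost
    "-comp_based_stats", "1",        # Composition-based statistics
    "-seg", "yes",                   # Low complexity filter
    "-inclusion_ethresh", "0.005",   # PSI-BLAST threshold
    "-pseudocount", "0",             # Pseudocount
    "-taxids", "9606",               # assumption: Homo sapiens filter, needs a taxonomy-aware database
]

# Render the argument list as a single shell-safe command line.
command_line = shlex.join(psiblast_args)
print(command_line)
```

Keeping the arguments in a list makes it easy to change one setting (for example the inclusion threshold) and compare how the hit counts move between iterations.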

Any ideas would be appreciated!

Spyros

sequence proteomics • 8.6k views
11.9 years ago
terdon ▴ 430

Well, since PSI-BLAST is using a PSSM to score the hits, it is no longer depending on sequence similarity alone. In each iteration, specific sites are given more/less weight and conservation at those sites is considered more/less important in choosing the hits. This means it can find distant homologs, increasing your hits, yes. It also means, though, that in future iterations, sequences that were returned as matches the first time around will now be discarded because they lack conservation at the specific sites that the PSSM defines as important based on the previous iterations.

I have to admit that I have not actually observed this myself since I have not worked much with PSI-BLAST, but from my understanding of the algorithm it makes sense.


@terdon Thank you for the input! I understand your point, but aren't sequences returned as matches in the first time around used to build the profile matrix? If their information is incorporated in the form of probabilities of occurrence, isn't that information necessary to append sequences detected in subsequent searches? In other words, once the original PSSM is created, my understanding is that it would expand until no further members can be found? Thanks again.


@s.charonis Spyro, think of a case where the PSSM built specifies a very high score for a cysteine at position 3. Of the 100 sequences used to build the PSSM, all but one have a Cys at that position. The first time around, the one sequence with another residue at that position will be taken as a hit because it satisfies the blastp score/e-value thresholds. The 2nd iteration, however, could discard the sequence because it lacks the Cys that the PSSM has shown to be important. This lack could bring its score down below the scoring threshold used to match the matrix to a hit, even though the sequence itself was used to build the PSSM. This is obviously a very simplistic example, but it illustrates the point.
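To make the Cys example concrete, here is a minimal toy sketch. It is not PSI-BLAST's actual scoring: it assumes ungapped five-residue sequences, a uniform background frequency, a simple add-one pseudocount, and an arbitrary score threshold, all of which are made up for illustration. It builds a log-odds matrix from 99 sequences with Cys at position 3 plus one with Ser, then scores both kinds of sequence against that matrix.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_pssm(sequences, pseudocount=1.0):
    """Build a toy log-odds matrix (one dict per column) from equal-length sequences."""
    pssm = []
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    for pos in range(len(sequences[0])):
        counts = Counter(seq[pos] for seq in sequences)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / BACKGROUND)
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds scores for an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


typical = "AKCDE"   # Cys at position 3
outlier = "AKSDE"   # Ser at position 3 instead
family = [typical] * 99 + [outlier]

pssm = build_pssm(family)
typical_score = score(pssm, typical)
outlier_score = score(pssm, outlier)

THRESHOLD = 18.0  # arbitrary cutoff, chosen only to illustrate the effect

for name, value in [("typical", typical_score), ("outlier", outlier_score)]:
    verdict = "kept" if value >= THRESHOLD else "dropped"
    print(f"{name}: score={value:.2f} -> {verdict}")
```

The outlier helped build the matrix, yet the matrix penalizes its Ser at position 3, so it scores lower than the typical members and can fall under a cutoff that they clear.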


@terdon Thank you very much, that clears up a lot! I can now justify my PSI-BLAST search findings as biologically plausible.


@s.charonis Be well :)


@terdon Very nice ;)


This answer helped me with my own case after 9.2 years, so thanks

11.9 years ago
DG 7.3k

While you are enabling more distant matches with the use of the PSSM (for every iteration except the first), each iteration is adding sequences to your total number of cumulative hits (which you are then also adding to the PSSM). This is how PSI-BLAST should work. Once you stop running new iterations, it is all of the cumulative hits added together that count as your results.


@Dan Thank you, this is my understanding as well; doesn't the algorithm automatically append all newly detected members to the existing profile? In other words, shouldn't I be getting a few more hits every iteration until no more hits can be found, as opposed to going from e.g. 150, 145, 143 .. ?


The number of new hits per iteration shouldn't be going up; the cumulative total should. Ultimately the PSSM converges and you will get no new hits at all.
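One way to check this on your own results is to track, for each iteration, how many hits were reported, how many of those were never seen before, and the running cumulative total. The sketch below assumes you have already parsed each iteration's hit identifiers into a set; the identifiers and the `summarize_iterations` helper are hypothetical, and "converged" here just means an iteration added nothing new.

```python
def summarize_iterations(hits_per_iteration):
    """Return per-iteration counts of reported, new, and cumulative hits."""
    seen = set()
    summary = []
    for number, hits in enumerate(hits_per_iteration, start=1):
        current = set(hits)
        new_hits = current - seen       # hits not reported in any earlier iteration
        seen |= current                 # running union across iterations
        summary.append({
            "iteration": number,
            "reported": len(current),
            "new": len(new_hits),
            "cumulative": len(seen),
            "converged": number > 1 and not new_hits,
        })
    return summary


# Made-up hit identifiers standing in for parsed PSI-BLAST output.
iterations = [
    {"seqA", "seqB", "seqC", "seqD"},
    {"seqA", "seqB", "seqC", "seqE"},
    {"seqA", "seqB", "seqE"},
]

for row in summarize_iterations(iterations):
    print(row)
```

In this made-up run the reported count drops from 4 to 3 while the cumulative total still grows from 4 to 5, which is the distinction between the per-iteration list and the overall set of hits.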
