Hello everybody,
I am a student currently using Gblocks to remove poorly aligned regions from multiple sequence alignments of proteins as part of a phylogenetic project. However, I do not think I fully understand what flank positions are.
From the original paper (Castresana, 2000) as well as a whole bunch of other sources I got this information:
"In the remaining blocks, flanks are examined and positions are removed until blocks are surrounded by highly conserved positions at both flanks."
No matter what I search for, I always end up with this or a similar sentence. Unfortunately, I do not get it. To me this sounds like every block that remains in the final Gblocks output should have, at both ends, a position with at least a minimum number of identical amino acids. By default that minimum is 85% of the number of sequences, but the -b2 option lets you set a different threshold.
However, in my output I find blocks whose end positions do not contain as many identical amino acids as required.
So obviously, my understanding of flank positions must be somehow wrong :(
Can anybody please help me understand this?
If you need an example to follow me, have a look at the original paper. Fig. 1 shows an alignment of 17 sequences. Below it, it says that the chosen blocks are underlined and that default values were used. I do not understand why, in the block starting at position 64, the flank position (which I take to be position 64 itself) has only 12 identical amino acids and not 14 (the default according to Table 2).
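To make the check described above concrete, here is a minimal Python sketch of it. It is not Gblocks code. The function names and the example column are made up for illustration, the 85% is truncated to an integer (an assumption that reproduces the 14 quoted for 17 sequences), and treating any column with a gap as not conserved is also an assumption.

```python
from collections import Counter

def flank_threshold(n_sequences, percent=85):
    """Minimum number of identical residues for a flank position.

    Assumption: the percentage is truncated to an integer,
    which gives 14 for 17 sequences.
    """
    return (n_sequences * percent) // 100

def max_identical(column):
    """Count of the most frequent residue in one alignment column (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0
    return Counter(residues).most_common(1)[0][1]

def is_flank_conserved(column, percent=85):
    """True if the column reaches the flank threshold.

    Assumption: a column containing any gap never counts as conserved.
    """
    if "-" in column:
        return False
    return max_identical(column) >= flank_threshold(len(column), percent)

# Hypothetical column of 17 residues with 12 identical ones,
# mimicking the situation described for position 64.
column = "L" * 12 + "IIVVM"
print(flank_threshold(len(column)), max_identical(column), is_flank_conserved(column))
```

With 12 identical residues out of 17, such a column falls short of the threshold of 14, which is exactly the mismatch described above.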
Thanks in advance,
Rebecca
Thank you very much for your answer. I'm afraid I have to tell you, I'm still not totally clear on it. Or, actually, I think I am, but I'm not sure.
In Castresana, 2000 (caption of Table 2, as well as the section "The Gblocks method for..." on p. 542) it says that flank positions have to be highly conserved. A position counts as highly conserved if its number of identical residues is >= FS. From the caption: "FS = minimum number of identical sequences for a flanking position."
That was what originally confused me, and why I was even more confused when I first read your answer. But after looking into the paper a bit more carefully, I found that the author discusses Fig. 1 in the Results section. There he says that in this example two initial blocks were built that fulfilled all requirements (including the one for flanking positions, with at least 85% (here 14) identical residues). He goes on to say that one of these blocks contains a gap, which is removed together with the adjacent nonconserved positions, leading to two new blocks. These are kept because both are longer than the BL2 threshold, the minimum length of a block after gap cleaning (here 10).
So, to conclude: the final blocks in the output do not have to be flanked by highly conserved positions, because they may arise from splitting an initial block (which was flanked by highly conserved positions), and the flank condition is only evaluated for the initial blocks. Do I have it now?
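The reading in this conclusion can be sketched in a few lines of Python. This only illustrates the interpretation and is not how Gblocks is implemented. The per-position labels (H = highly conserved, C = conserved, N = nonconserved, G = contains a gap), the function name, and the example block are all hypothetical.

```python
import re

def split_at_gaps(block, min_len=10):
    """Remove gap positions plus adjacent nonconserved positions, then
    keep only the remaining pieces that are at least min_len long.

    block is a string of per-position labels: H, C, N or G (hypothetical encoding).
    min_len plays the role of BL2 (10 in the example discussed above).
    """
    # A run of N directly before a G, the G itself, and any N or G after it
    # are cut out together; N runs that do not touch a gap are left alone.
    pieces = re.split(r"N*G[NG]*", block)
    return [p for p in pieces if len(p) >= min_len]

# Hypothetical initial block: highly conserved (H) at both ends, one gap inside.
initial = "H" + "C" * 11 + "NGGN" + "C" * 11 + "H"
for piece in split_at_gaps(initial):
    print(piece, "starts with", piece[0], "and ends with", piece[-1])
```

In this toy case both pieces are long enough to be kept, but each has one end that is merely conserved (C) rather than highly conserved (H), because the flank condition was only applied to the initial block.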