In the paper “Improved criteria and comparative genomics tool provide new insights into grass paleogenomics”, they said:
To increase the significance of inter-specific sequence alignments for inferring evolutionary relationships between genomes, we defined two new parameters for BLAST analysis: CIP for Cumulative Identity Percentage and CALP for Cumulative Alignment Length Percentage. CIP = ∑nb ID by (HSP/AL) x 100 corresponds to the cumulative percent of sequence identity observed for all the HSPs divided by the cumulative aligned length (AL) which corresponds to the sum of all HSP lengths. CALP = AL/query length is the sum of the HSP lengths (AL) for all HSPs divided by the length of the query sequence. With these parameters, BLAST produces the highest cumulative percentage identity over the longest cumulative length thereby increasing stringency in defining conservation between two genome sequences.
In my opinion, CIP is simply the sum of num_identical (the number of identical residues, in BioPerl) for all HSPs, divided by AL (the sum of all HSP lengths). Am I right? But the formula CIP = ∑nb ID by (HSP/AL) x 100 puzzles me. Can I interpret it as
CIP=∑nb ID and ID=(HSP/AL) x 100,
but what is the meaning of 'HSP' here, and of 'nb'?
Can you help me?
Can you link to the paper so we can look at it?
Don't worry - full text link is http://bib.oxfordjournals.org/cgi/content/full/10/6/619.
AFAIK, HSP (or hsp) stands for high-scoring segment pair. You can obtain the HSPs from BLAST output. Ref. http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml. I'm not sure what nb is, as I don't have access to the paper. You could also ask the authors for the exact details.
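For what it's worth, here is a minimal Python sketch of the reading proposed in the question, i.e. CIP = (sum of identical residues over all HSPs / AL) x 100, with AL being the sum of all HSP lengths. It assumes "nb ID" means "number of identical residues" per HSP ("nb" may be short for the French "nombre", but that is a guess, not something the quoted text confirms). It also assumes CALP is meant as a percentage, so AL / query length is multiplied by 100. The `HSP` class and `cip_calp` function are made-up names for illustration, not part of BLAST or BioPerl.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class HSP:
    """Hypothetical container for the two per-HSP values needed here."""
    identical: int  # number of identical residues in this HSP (assumed meaning of "nb ID")
    length: int     # alignment length of this HSP


def cip_calp(hsps: Iterable[HSP], query_length: int) -> Tuple[float, float]:
    """Return (CIP, CALP) as percentages under the assumptions stated above."""
    hsps = list(hsps)
    al = sum(h.length for h in hsps)            # AL: cumulative aligned length
    if al == 0 or query_length <= 0:
        raise ValueError("need at least one non-empty HSP and a positive query length")
    total_identical = sum(h.identical for h in hsps)
    cip = 100.0 * total_identical / al          # cumulative identity percentage
    calp = 100.0 * al / query_length            # cumulative alignment length percentage
    return cip, calp


# Two HSPs for a query of length 200: 90/100 and 40/50 identical residues.
cip, calp = cip_calp([HSP(90, 100), HSP(40, 50)], query_length=200)
print(f"CIP = {cip:.2f}%, CALP = {calp:.2f}%")
```

Note that if HSPs overlap on the query, simply summing their lengths can push CALP above 100, so you may want to filter or merge overlapping HSPs first. Whether the authors did that is not stated in the quoted passage.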