Question in brief: Take the human SRC protein (P12931, length 536) as an example. At the top of the BlastP output we see SRC itself, with a score of 1119. Why?
What bothers me: the diagonal of the BLOSUM62 matrix contains 4, 5, 6, 9, ..., so in my mind the score of a sequence against itself should be at least 4*536 = 2144. But that is not the case. I thought the score is calculated as described here: "ChatGPT optimized for bioinformatics questions", i.e. in the case of a complete match we should sum BLOSUM62[symbol, symbol] over all symbols of the sequence, shouldn't we?
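To make the expectation concrete, here is a minimal sketch of the sum I have in mind. It assumes an ungapped alignment of the sequence with itself and uses only the standard BLOSUM62 diagonal values; the names `BLOSUM62_DIAGONAL` and `blosum62_self_score` are made up for this illustration and do not come from any library.

```python
# BLOSUM62 diagonal entries: the score of each standard residue aligned with itself.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9,
    "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7,
    "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def blosum62_self_score(sequence: str) -> int:
    """Raw score of an ungapped alignment of `sequence` with itself.

    Hypothetical helper for illustration: it simply sums
    BLOSUM62[symbol, symbol] over all residues. Non-standard
    residue codes (B, Z, X, ...) are not handled and raise KeyError.
    """
    return sum(BLOSUM62_DIAGONAL[residue] for residue in sequence.upper())


if __name__ == "__main__":
    fragment = "MGSNKSKPKD"  # first 10 residues of P12931
    score = blosum62_self_score(fragment)
    print(score)                       # prints 53
    print(score >= 4 * len(fragment))  # prints True: every diagonal entry is >= 4
```

Applied to the full 536-residue sequence, a sum of this kind cannot be smaller than 4*536 = 2144, which is why a reported score of 1119 surprises me.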
Note that BioPython (or Parasail) gives a score of 2834, while DIAMOND gives 1085 (almost 536*2 = 1072; why?). Example here: https://www.kaggle.com/code/alexandervc/protein-aligners-benchmark-parasail-diamond-etc?scriptVersionId=132991490&cellId=47
More details: human SRC protein (P12931) on UniProt: https://www.uniprot.org/uniprotkb/P12931/entry
Sequence:
MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
BlastP: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
Thanks in advance for any answers! We would be happy to see you in our Telegram group: @sberlogabio