Hi everyone,
I am currently using blast to align metagenomic reads to a reference database. I am interpreting my blast results in the format: -outfmt "6 qseqid sseqid qlen length sstart send evalue bitscore"
I expected that |sstart - send| would be the same as the alignment length, but it differs. For example:
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 620872 621507 0.0 730
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 802698 802063 0.0 730
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 869271 869906 0.0 730
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 1278239 1277604 0.0 730
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 1407425 1408060 0.0 730
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 1467885 1467250 0.0 730
ERR2906227.2 Staphylococcus_aureus_chromosome 1076 997 2170114 2169166 1.33e-128 460
ERR2906227.3 Lactobacillus_fermentum_complete_genome 1149 1165 976005 974864 0.0 1238
ERR2906227.5 BS.pilon.polished.v3.ST170922 1060 1051 2593827 2594850 0.0 1315
ERR2906227.6 Listeria_monocytogenes_complete_genome 1126 1112 1616191 1617286 0.0 1637
In the very first alignment:
ERR2906227.1 Lactobacillus_fermentum_complete_genome 652 645 620872 621507 0.0 730
The aligned portion of the subject sequence has a length of |(620872-621507)| = 635 but blast is saying the alignment length is 645. Can anyone explain why that is?
It is possible that simple stretches of A/T's are masked. Have you looked at the pair-wise alignment of query/subject?
After looking at the blast algorithm again, I believe this difference in length might be because of gaps in the subject sequence. I originally believed the algorithm gapped the query sequence but it seems like it gaps both. And the alignment length seems to include those gaps. However, I am not 100% sure of this.
Did you look at the pair-wise alignment?
Yes, I used the online BLAST available on the blast website rather than the linux interface and it turns out that there are gaps in the subject sequence. Thanks for the help!