I am fairly new to sequence alignment.
I am using vsearch --usearch_global (v2.21.1) to align short sequences (in a FASTA file) to a reference. When the run finishes, it reports:
Matching unique query sequences: 8016 of 11169 (71.77%)
I don't understand how it determines the number of unique query sequences, because no two sequences in my input file are strictly identical.
I couldn't find a straightforward answer to this in either the paper or the manual, perhaps because the procedure is so standard. Based on experiments with the --wordlength and --minwordmatches parameters, my guess is that vsearch encodes each sequence into words and, if two sequences have the same encoding, aligns only the first one. Is this true?
Also, is there a surefire way to force vsearch to align ALL the sequences?
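
To make the "no strictly identical sequences" statement checkable, here is a minimal sketch of such a check in Python. It is not part of vsearch; the function names and the small in-memory example are illustrative assumptions, and the comparison ignores case.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_duplicates(lines):
    """Return (total records, distinct sequences), ignoring case."""
    counts = Counter(seq.upper() for _, seq in read_fasta(lines))
    return sum(counts.values()), len(counts)


# Hypothetical toy input: records "a" and "b" differ only in case.
example = [">a", "ACGT", ">b", "acgt", ">c", "ACGG", "TT"]
total, distinct = count_duplicates(example)
print(total, distinct)
```

If the two numbers are equal, the file contains no exact duplicates. For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.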