Hi, I'm using bowtie2 to align very short sRNA sequences (18-45 nt) to reference files. I've been using the --very-sensitive-local preset to align to the human genome, and bowtie2 was allowing one, maybe two, mismatches. However, when testing against shorter references (my tRNAs), the sequences with mismatches and gaps don't align properly. I need granular control: first align without allowing any mismatches, and then align allowing only one.
This is the test fasta file I'm using, with fragments of a tRNA and some induced errors (mismatches, gaps, etc).
https://pastebin.com/VhcES6ay
Against an index made from the human tRNA reference:
http://gtrnadb.ucsc.edu/genomes/eukaryota/Hsapi38/hg38-mature-tRNAs.fa
The bowtie2 command
bowtie2 --local --very-sensitive-local -x human_trna -f sequences.fa -S out.sam
The -N parameter changes the number of mismatches allowed during seed alignment, but it doesn't affect the end result in this case. I believe I can achieve what I want by changing the scoring function, but I'd like some feedback on whether this is a good approach or whether another tool would suit me better. I need a reliable way to align with a maximum number of mismatches that works for 100 nt references (tRNA), 2000 nt references (rRNA) and the human genome, so I worry about reliability. I know I can check how many mismatches there are in a SAM file, but in this case my sequences are not aligning properly in the first place.
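For reference, checking mismatches in the SAM output could look roughly like the Python sketch below. It assumes the optional fields bowtie2 writes for aligned reads (XM:i for the number of mismatches, XO:i for gap opens) and reads soft clipping from the CIGAR string. The function names and the thresholds are made up for illustration, and this only filters alignments that bowtie2 already reported, so it does not help with reads that fail to align at all.

```python
import re


def soft_clipped_bases(cigar):
    """Total number of read bases soft-clipped (S operations) in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op == "S")


def filter_sam(lines, max_mismatches=1, allow_gaps=False, max_soft_clip=None):
    """Return the SAM alignment lines that pass the mismatch/gap/clipping limits.

    Header lines and unaligned reads are dropped. Records without an XM:i tag
    are dropped too, because their mismatch count cannot be verified.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 4:
            continue  # FLAG bit 0x4: read is unaligned
        # Optional fields look like TAG:TYPE:VALUE, e.g. XM:i:1
        tags = {}
        for field in fields[11:]:
            name, _type, value = field.split(":", 2)
            tags[name] = value
        if "XM" not in tags or int(tags["XM"]) > max_mismatches:
            continue
        if not allow_gaps and int(tags.get("XO", 0)) > 0:
            continue
        if max_soft_clip is not None and soft_clipped_bases(fields[5]) > max_soft_clip:
            continue
        kept.append(line)
    return kept
```

Calling `filter_sam(open("out.sam"), max_mismatches=0)` would keep only mismatch-free, ungapped alignments. Setting `max_soft_clip=0` additionally rejects alignments where local mode trimmed the read ends, which matters because XM:i only counts mismatches inside the aligned portion.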
I appreciate any feedback, thanks.
EDIT: I need to replicate the bowtie 1 -v option (allow X mismatches) by changing the scoring options in bowtie2, but I'm unsure how to proceed. I read the --score-min docs and tried changing it to L,-1,1, but it throws an error when running in local mode because "match score is set to 2 (default in local mode) and my score function can be negative". It shouldn't be negative unless my read length is 0, since my score equation is Threshold = -1 + len(read). I can force my match bonus to 0, but then I don't get any alignments.
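To make the arithmetic behind that threshold concrete, here is a small Python sketch. It assumes bowtie2's documented local-mode defaults (match bonus --ma 2, maximum mismatch penalty --mp 6) and that FASTA input is scored as if every base had the highest quality, so each mismatch costs the full 6. It only models full-length, ungapped alignments. Soft clipping is ignored, so a real local alignment may trim a mismatch near a read end instead of paying for it. The function names are made up for illustration.

```python
import math


def local_score(read_len, mismatches, ma=2, mp=6):
    """Score of a full-length, ungapped alignment: bonus per match, penalty per mismatch."""
    return ma * (read_len - mismatches) - mp * mismatches


def linear_min_score(read_len, const, coef):
    """Value of a --score-min function of the form L,const,coef: const + coef * read_len."""
    return const + coef * read_len


def max_full_length_mismatches(read_len, const, coef, ma=2, mp=6):
    """Largest mismatch count whose full-length score still reaches the threshold.

    Each mismatch lowers the score by (ma + mp) relative to a perfect match.
    A negative result means even a perfect alignment scores below the threshold.
    """
    threshold = linear_min_score(read_len, const, coef)
    return math.floor((ma * read_len - threshold) / (ma + mp))


if __name__ == "__main__":
    for n in (18, 30, 45):
        print(n, local_score(n, 0), local_score(n, 1),
              linear_min_score(n, -1, 1), max_full_length_mismatches(n, -1, 1))
```

Under these assumptions, a threshold of L,-1,1 would still let through up to 2 mismatches for an 18 nt read and up to 5 for a 45 nt read, so even if bowtie2 accepted it, it would not cap alignments at one mismatch. Capping at k mismatches with this model needs a threshold that sits between the scores for k and k+1 mismatches at every read length.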
Unless you need gapped alignments (which you probably don't with reads < 45 bp), using bowtie v.1.x may be a better choice.
You are right, bowtie 1 has -v, which lets me set a maximum number of mismatches, but I lose the reads that used to align thanks to soft clipping. I may have to run both if no other answer comes up. According to the bowtie2 docs, I should be able to replace the -v option by changing my scoring options, which is why I posted my question in the first place.