Hi, I use bowtie2-2.2.9 to align FASTA reads against some genes. I am not sure whether I understand end-to-end alignment in Bowtie2 correctly. As I understand it, if we have a read like the one below:
>Read1
TGCGGAATTTGATACACGTACATAAGTACGTGTTGGCTTATGCTTGCGTACGCTGAAACATGCTGACCTTTTTTTAAAACGCCCTTGTC
and we use end-to-end mode (which seems to be the default), the alignment should involve all the characters in the read. But in my results I get some alignments that look local, like the one below, which uses just 8 characters of the read:
Read1 16 Gene.1 19 1 8M * 0 0 TAAAAAAA IIIIIIII AS:i:0 X
I ran bowtie2 with these options:
bowtie2 -f -x RefGene -U merged.fasta -S output.txt -p 6 --no-hd --no-sq --no-unal
Also, I am sure that Read1 is longer than 8 bases; it is 88. Do I need to add any option when running bowtie2 to force it to align end-to-end?
I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.
Any specific reason to use bowtie2 with FASTA reads, instead of something like BLAT? You should also include the exact command line you used for the bowtie2 alignment to provide full context.
I always thought that bowtie2 aligned end to end by default and that you would need to pass extra parameters to make it work differently.
In your example note how you don't have clipping on the CIGAR string. This implies that your original sequence was just 8bp long. But then I don't think bowtie2 actually works for sequences that short. In addition the reported sequence cannot be found in the read that you wrote above. So ... lots of inconsistencies there...
Show both the command that you are running and the actual line that gets reported.
@WouterDeCoster Thanks for the suggestion and the tutorial.
@genomax I am processing metagenomics files and found Bowtie2 very fast. I have not tried BLAT. Do you think it is as fast as Bowtie2?
@Istvan Albert Thanks for your comment. Actually, I am sure that the length of the read is not 8. It is 88. I updated my question.
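One way to double-check the read length claim is to list the length of every record in the input FASTA, grouped by name. The snippet below is only a rough sketch: the function name fasta_lengths and the example records are made up for illustration, and it assumes the read name is the header text up to the first whitespace. A name that appears more than once, or with an unexpected length, would be worth a closer look.

```python
from collections import defaultdict

def fasta_lengths(lines):
    """Return {read_name: [length, ...]} for FASTA text given as an iterable of lines.

    A name maps to several lengths when it occurs in more than one record.
    """
    lengths = defaultdict(list)
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the read name is the header up to the first whitespace.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name].append(0)
        elif name is not None:
            # Sequence may be wrapped over several lines, so accumulate.
            lengths[name][-1] += len(line)
    return dict(lengths)

# Hypothetical records, not taken from merged.fasta.
example = [">Read1", "ACGTACGT", "ACGT", ">Read2", "TTTTTTTA", ">Read1", "TAAAAAAA"]
print(fasta_lengths(example))
```

For a real file, something like fasta_lengths(open("merged.fasta")) should work, since a file object yields lines.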
Your alignment shows a sequence TAAAAAAA that is not present in the read that you show. In addition, when an alignment takes place, aligners indicate how much of the read is clipped with the S or H letters in the CIGAR string. It is strange that your SAM does not do that. The alignment line that you report is also incomplete: note how it ends with X and does not show an MD tag.
You should show the complete SAM record and the complete input sequence. Right now it still looks like some sort of inconsistency regarding either the data or the alignment. Hence we cannot troubleshoot it.
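To make the clipping point concrete, here is a small sketch that reads a CIGAR string and reports how many read bases it accounts for and how many are clipped. The helper name cigar_summary is made up for this example, and the second CIGAR is a hypothetical one, shown only to illustrate what an 88-base read with clipping could look like. It relies on the SAM convention that the M, I, S, = and X operations consume read bases.

```python
import re

# One CIGAR operation: a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (SEQ) per the SAM format.
QUERY_CONSUMING = set("MIS=X")

def cigar_summary(cigar):
    """Summarise read length and clipping implied by a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    return {
        "seq_len": sum(n for n, op in ops if op in QUERY_CONSUMING),
        "soft_clipped": sum(n for n, op in ops if op == "S"),
        "hard_clipped": sum(n for n, op in ops if op == "H"),
    }

print(cigar_summary("8M"))        # the CIGAR from the reported record
print(cigar_summary("40S8M40S"))  # hypothetical clipped alignment
```

For 8M the summary gives a read length of 8 with no clipping, which is why the record looks like it came from an 8-base sequence.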
A flag of 16 in the second field of the SAM record means the read aligned to the reverse strand. The reverse complement of "TAAAAAAA" is "TTTTTTTA", which is present in the read.
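This can be checked quickly in Python. The sketch below uses the read and the SEQ field exactly as posted in the question; the helper names reverse_complement and is_reverse are just for illustration. Bit 0x10 of the SAM flag marks a read whose SEQ is reported reverse complemented.

```python
# Read1 as posted in the question.
READ1 = "TGCGGAATTTGATACACGTACATAAGTACGTGTTGGCTTATGCTTGCGTACGCTGAAACATGCTGACCTTTTTTTAAAACGCCCTTGTC"

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_reverse(flag):
    """True if the SAM flag has the 0x10 (reverse strand) bit set."""
    return bool(flag & 0x10)

reported = "TAAAAAAA"  # SEQ field of the SAM record
original = reverse_complement(reported)
print(is_reverse(16))     # True
print(original)           # TTTTTTTA
print(original in READ1)  # True
```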
Ah indeed, good point, the sequences are always reported on the forward strand. I missed that.