I need to map the short reads in a .fq file onto the human genome, and I am using Bowtie for that.
Here is one read from my input file:
@10_71499258_71499890_0_1_0_0_1:0:0_2:0:0_0/1
TATTAGATCGTGTGATTATATTTGACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGG
+
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
And Bowtie outputs:
10_71499258_71499890_0_1_0_0_1:0:0_2:0:0_0/1 + 10 50100 TATTAGATCGTGTGATTATATTTGACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGG 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 1
And
10_71499258_71499890_0_1_0_0_1:0:0_2:0:0_0/1 + 18 4514 TATTAGATCGTGTGATTATATTTGACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGG 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 1
Can anybody tell me what this output means? In particular, what do "+ 10 50100" and the final "1" mean?
Thanks.
But can you help me with one more thing?
I can't understand the meaning of the last column of the Bowtie output.
The manual describes it as:
"A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5') end of the read."
What does that mean? And how do I find that offset?
Sure thing. The 8th (final) column is a comma-separated list of mismatch descriptors. In the example you gave above, only 7 of the possible 8 columns appear, presumably because both alignments were perfect hits with no mismatches. If there were mismatches, they would appear in the 8th column. The 0-based offset just means you start counting from 0, like in a BED file. So the first base at the 5' end of the read is base 0, the second base is base 1, and so on.
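To make the column layout concrete, here is a minimal Python sketch that splits one line of the default Bowtie output into named fields. It assumes the columns are tab-separated and that the 8th column may be absent for a perfect hit, as in your example. The names BowtieHit and parse_hit are made up for illustration, and the field names are my own reading of the manual, not anything Bowtie defines.

```python
from typing import NamedTuple


class BowtieHit(NamedTuple):
    """One alignment line; field names are illustrative, not official."""
    read_name: str
    strand: str          # "+" or "-"
    ref_name: str        # reference sequence name, e.g. a chromosome
    ref_offset: int      # 0-based position on the reference
    sequence: str
    qualities: str
    column7: str         # 7th column, kept as raw text
    mismatches: str      # 8th column, empty string for a perfect hit


def parse_hit(line: str) -> BowtieHit:
    # Assumption: fields are separated by tabs.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7 or len(fields) > 8:
        raise ValueError(f"expected 7 or 8 columns, got {len(fields)}")
    if len(fields) == 7:
        # No mismatch column present: treat it as an empty list of mismatches.
        fields.append("")
    name, strand, ref, offset, seq, quals, col7, mm = fields
    return BowtieHit(name, strand, ref, int(offset), seq, quals, col7, mm)


if __name__ == "__main__":
    # Shortened, made-up line with the same shape as the output above.
    example = "read1\t+\t10\t50100\tTATTAG\t222222\t1"
    hit = parse_hit(example)
    print(hit.strand, hit.ref_name, hit.ref_offset, repr(hit.mismatches))
```

With a line shaped like yours, the "+" lands in strand, "10" in ref_name, and "50100" in ref_offset, and the mismatch field comes back empty.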
If, for example, there were a mismatch at the 3rd base (let's say a G in the reference but a T in the read) and another at the 10th base (let's say a C in the reference and a G in the read), the final column would look like this:
2:G>T,9:C>G
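If you want to pull those offsets out programmatically, something along these lines should work. It is only a sketch in Python: it assumes every descriptor follows the offset:reference-base>read-base pattern quoted from the manual, and parse_mismatches is a made-up helper name, not part of Bowtie.

```python
from typing import List, Tuple


def parse_mismatches(column: str) -> List[Tuple[int, str, str]]:
    """Split a mismatch column into (offset, reference_base, read_base) tuples.

    The offset is 0-based, counted from the 5' end of the read.
    """
    mismatches = []
    if not column:
        # Empty or missing column: the alignment had no mismatches.
        return mismatches
    for descriptor in column.split(","):
        offset_text, bases = descriptor.split(":")
        ref_base, read_base = bases.split(">")
        mismatches.append((int(offset_text), ref_base, read_base))
    return mismatches


if __name__ == "__main__":
    for offset, ref_base, read_base in parse_mismatches("2:G>T,9:C>G"):
        # offset + 1 converts the 0-based offset to the usual 1-based position.
        print(f"base {offset + 1} of the read: reference {ref_base}, read {read_base}")
```

For the example column this reports base 3 (G in the reference, T in the read) and base 10 (C in the reference, G in the read).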
Thanks a lot. It really helps me a lot.