Bowtie Output Meaning
11.8 years ago
Arpssss ▴ 40

I need to map the short reads in a FASTQ (.fq) file onto the human genome, and I am using Bowtie for that.

Here is one read of my input file:

@10_71499258_71499890_0_1_0_0_1:0:0_2:0:0_0/1
TATTAGATCGTGTGATTATATTTGACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGG
+
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222

Bowtie outputs:

10_71499258_71499890_0_1_0_0_1:0:0_2:0:0_0/1    +    10    50100    TATTAGATCGTGTGATTATATTTGACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGG    2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222    1

And

10_71499258_71499890_0_1_0_0_1:0:0_2:0:0_0/1    +    18    4514    TATTAGATCGTGTGATTATATTTGACAGGTCTTAATTGACGCGCTGTTCAGCCCTTTGAGTTCGGTTGAGTTTTGGGTTGGAGAATTTTCTTCCACAAGG    2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222    1

Can anybody tell me what this output means? In particular, what do the +, 10, 50100 and 1 fields mean?

11.8 years ago
Matt LaFave ▴ 310

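The + is the strand (orientation) to which the read aligned; 10 is the name of the reference sequence it aligned to, here chromosome 10; 50100 is the 0-based offset, on the forward reference strand, of the leftmost base of the alignment, so in 1-based coordinates the alignment starts at position 50101 (note that "leftmost" is with respect to the reference, and is not necessarily the 5' base of the read). The meaning of the 1 in the 7th column depends on your settings: if you used -M and its ceiling was exceeded, the column holds that ceiling; otherwise it is the number of other instances where the same sequence aligned against the same reference characters as in the reported alignment. It is not the number of other places where the read aligns with the same number of mismatches.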
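To make the column layout concrete, here is a minimal Python sketch that splits one line of Bowtie's default output into named fields. It assumes the fields are tab-separated (the post shows them with spaces), and the names `BowtieHit`, `parse_bowtie_line` and `pos1` are made up for illustration; they are not part of Bowtie.

```python
from typing import NamedTuple


class BowtieHit(NamedTuple):
    read_name: str
    strand: str        # '+' or '-'
    reference: str     # reference sequence name, e.g. '10' for chromosome 10
    offset0: int       # 0-based leftmost position on the forward reference strand
    sequence: str      # reverse-complemented when strand is '-'
    qualities: str
    other_count: int   # 7th column; its meaning depends on -M (see the manual)
    mismatches: str    # 8th column; empty string for a perfect hit

    @property
    def pos1(self) -> int:
        """1-based leftmost position, as used by SAM files or genome browsers."""
        return self.offset0 + 1


def parse_bowtie_line(line: str) -> BowtieHit:
    # Strip only the newline so that an empty trailing 8th field survives.
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 7:
        fields.append("")  # tolerate a line where the empty 8th field was dropped
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    name, strand, ref, off, seq, qual, other, mm = fields
    return BowtieHit(name, strand, ref, int(off), seq, qual, int(other), mm)


# Hypothetical, shortened line in the same layout as the post's output.
hit = parse_bowtie_line("r1\t+\t10\t50100\tACGT\tIIII\t1\t")
print(hit.reference, hit.pos1)  # 10 50101
```

Keeping the raw 0-based offset and exposing the 1-based position as a separate property avoids silently mixing the two coordinate conventions.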

You may want to refer to http://bowtie-bio.sourceforge.net/manual.shtml for more information. Hope that helps!


Thanks.

But can you help me with one more thing?

I don't understand the meaning of the last column of the Bowtie output.

The manual describes it as:

"A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5') end of the read."

What does that mean, and how do I find that offset?


Sure thing. The 8th (final) column is a comma-separated list of mismatches. In the example you gave above, that column is empty (so only 7 columns appear to be present) because both alignments were perfect hits with no mismatches. If there were mismatches, they would appear in the 8th column. The 0-based offset just means you start counting from 0, like in a BED file. So the first base at the 5' end of the read is base 0, the second base is base 1, and so on.
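If you also want to know where a mismatch falls on the reference, the offset can be combined with the 4th column. This is a hedged sketch: it assumes an ungapped alignment (Bowtie 1 does not report gaps) and my reading of the manual that, for a read aligned to the - strand, the read's 5' end sits at the rightmost aligned reference base. The function name `mismatch_ref_offset` is hypothetical.

```python
def mismatch_ref_offset(leftmost0: int, read_len: int, strand: str, offset: int) -> int:
    """Return the 0-based forward-strand reference offset of a mismatch.

    leftmost0: the 4th Bowtie column (0-based leftmost aligned position).
    offset:    the mismatch offset, counted from the read's 5' end starting at 0.
    Assumes an ungapped alignment, so read and reference advance together.
    """
    if not 0 <= offset < read_len:
        raise ValueError("offset must lie within the read")
    if strand == "+":
        # The read's 5' end is the leftmost aligned base.
        return leftmost0 + offset
    if strand == "-":
        # The read aligned as its reverse complement, so its 5' end is the
        # rightmost aligned base; count back from there.
        return leftmost0 + read_len - 1 - offset
    raise ValueError(f"unknown strand {strand!r}")


# A 100 bp read with leftmost offset 50100 and a mismatch at offset 2.
print(mismatch_ref_offset(50100, 100, "+", 2))  # 50102
print(mismatch_ref_offset(50100, 100, "-", 2))  # 50197
```

Add 1 to the result if you need a 1-based coordinate.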

If, for example, there were a mismatch at the 3rd base (say, a G in the reference but a T in the read) and at the 10th base (say, a C in the reference but a G in the read), the final column would look like this:

2:G>T,9:C>G
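A minimal sketch of splitting that column into its parts, using the example string above; `parse_mismatches` is a hypothetical helper written for illustration, not a Bowtie tool.

```python
from typing import List, Tuple


def parse_mismatches(field: str) -> List[Tuple[int, str, str]]:
    """Split Bowtie's 8th column into (offset, reference_base, read_base) tuples."""
    if not field:
        return []  # perfect hit: the column is empty
    result = []
    for descriptor in field.split(","):
        offset, change = descriptor.split(":")  # e.g. '2' and 'G>T'
        ref_base, read_base = change.split(">")
        result.append((int(offset), ref_base, read_base))
    return result


print(parse_mismatches("2:G>T,9:C>G"))  # [(2, 'G', 'T'), (9, 'C', 'G')]
```

Adding 1 to each offset gives the 1-based base number in the read (3rd and 10th here).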


Thanks a lot. It really helped.
