Question

Why does Bowtie2 report reads as unmapped even when their sequences exactly match the reference?

2

11.1 years ago

shl198 ▴ 440

Hi all,

I aligned my RNA-seq reads against the reference genome using tophat with its default aligner, bowtie2, and the default parameters:

tophat -p 8 -G $annotation -o out $database L1_1.fq.gz L1_2.fq.gz

After getting the results, I found that some reads in the unmapped.bam file have exactly the same sequence as the reference. The following is one line from the unmapped.sam file:

DGZN8DQ1:360:H9RN8ADXX:1:1101:4791:1895 69      *       0       255     *       *       0       0       TTTTGCTTTCTGACTCTGTGCTTGTGCCTTCAAGACTTTCACAACGATTTTCTGCTCCTCAATAAGGAAAGCCCGAGATCGGAAGAGCACACGTCTGAAC    CCCFFFFFHHHHHJJJJJJJIJJJHIJJJJJIJJJIJJJJIJJJJJIJJJJJJJJJJJJIJIJJJJJIJJJJJJHHFFDEDDDDDDDDDDDDDDDDDCCD
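For readers unfamiliar with the second column of that record, here is a minimal Python sketch that decodes a SAM FLAG value into its component bits. The bit meanings follow the SAM specification; the names `SAM_FLAGS` and `decode_flag` are made up for this illustration.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# Flag taken from the record shown above: 69 = 1 + 4 + 64.
print(decode_flag(69))
# → ['read paired', 'read unmapped', 'first in pair']
```

So the record is the first mate of a pair and is flagged as unmapped, which is consistent with it appearing in the unmapped file.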

Does anyone know why bowtie2 doesn't report these reads as mapped? Thanks.

bowtie2 quality tophat RNA-Seq • 3.8k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by shl198 ▴ 440

Ram · Accepted Answer · 2014-07-18

2

11.1 years ago

Devon Ryan 105k

Dirty little secret: bowtie2 doesn't always find exact matches. If you change the order of reads in a file you'll sometimes get different alignment results for them. I've never bothered to find the reason, since this ends up affecting very few reads.

ADD COMMENT • link 11.1 years ago by Devon Ryan 105k

0

Hi Devon, thank you very much. I just tried mapping with bowtie2 directly instead of tophat, and the number of mapped reads increased a little. I also BLASTed the unmapped reads, and most of them matched mouse ribosomal RNA.
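A rough sketch of how unmapped reads could be pulled out of a SAM file as FASTA records for a BLAST search like this. It assumes tab-delimited SAM text (for example, unmapped.bam converted with `samtools view`); the function name `unmapped_to_fasta` and the example records are hypothetical.

```python
def unmapped_to_fasta(sam_lines):
    """Yield FASTA records for SAM records whose 'read unmapped' bit (0x4) is set."""
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:
            # Tag the mate number so both reads of a pair get distinct IDs.
            mate = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
            yield f">{name}{mate}\n{seq}\n"


# Made-up records for illustration: one header, one unmapped read, one mapped read.
example = [
    "@HD\tVN:1.0\n",
    "read_a\t69\t*\t0\t255\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read_b\t0\tchr1\t100\t50\t8M\t*\t0\t0\tTTTTCCCC\tIIIIIIII\n",
]
for record in unmapped_to_fasta(example):
    print(record, end="")
```

With a real file you would pass an open file handle instead of the `example` list and write the records to a FASTA file.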

I didn't change the annotation file, and I made sure the rRNA reference is present in the GFF file. In that case the reads should map to the reference, but they didn't.

So my guess is that tophat filters rRNA reads automatically. Do you have any experience with this? Thank you very much.

ADD REPLY • link 11.1 years ago by shl198 ▴ 440

1

Perhaps, but it's more likely that the reads map so many times that they're discarded. There are enough copies of rRNA in the genome that this could be the case. I should add that I don't use tophat anymore; it's just too painfully slow. Give STAR a try if you have enough RAM.
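To illustrate the multi-mapping idea, here is a toy Python sketch that counts exact occurrences of a read on both strands of a reference string and flags it when the count exceeds a limit. This is not how tophat or bowtie2 work internally. The helper names and the `max_hits` value of 20 are assumptions for illustration; TopHat exposes a limit of this kind through its `-g/--max-multihits` option, so check the manual for the default in your version.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_exact_hits(read, reference):
    """Count exact occurrences of the read on either strand (overlaps allowed)."""
    hits = 0
    # A set avoids double counting reads that equal their own reverse complement.
    for query in {read, reverse_complement(read)}:
        start = reference.find(query)
        while start != -1:
            hits += 1
            start = reference.find(query, start + 1)
    return hits


def is_multimapper(read, reference, max_hits=20):
    """True if the read matches more places than the (assumed) limit allows."""
    return count_exact_hits(read, reference) > max_hits


# Made-up reference: a repeated unit stands in for multi-copy rRNA genes.
reference = "ACGAAGGTTC" * 30
print(count_exact_hits("ACGAAGG", reference))
print(is_multimapper("ACGAAGG", reference))
```

A read that matches a repeated locus exactly can still end up unreported if the aligner drops reads with too many valid placements.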

ADD REPLY • link 11.1 years ago by Devon Ryan 105k

0

Thank you very much. I will try STAR.

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by shl198 ▴ 440