Question

very low Bowtie2 mapping rate

6.2 years ago

afli ▴ 190

Dear all,

I downloaded a single-end FASTQ file of reads from an article published in 2012. When I mapped these reads to the reference genome with bowtie2, the mapping rate was less than 1%; almost none of the reads mapped. The same FASTQ file has been used in other papers, and the original paper reports an alignment rate above 95%. It seems that something is wrong with my alignment method. Could you help me find the problem?

My script is very simple:

bowtie2 -x bowtie2_np7_index -U SRR094109_1.fastq -S test.sam

Looking forward to hearing from you, thank you!

Aifu.

bowtie2 • 3.8k views

This question has been asked multiple times. Any time you have unexplained low alignment rates, you should take a sample of the reads and BLAST them at NCBI to make sure you have the right sample and that there is no random contamination.
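For anyone wanting to try this check, here is a minimal sketch of pulling a few reads out of a FASTQ file as FASTA so they can be pasted into the NCBI BLAST web form. The function name `fastq_head_to_fasta` and the demo file are made up for illustration, and it assumes the usual four-lines-per-record FASTQ layout. Note that it takes the first N reads rather than a truly random sample.

```shell
# Print the first N reads of a FASTQ file as FASTA, ready to paste into BLAST.
# Assumes the usual 4-line-per-record FASTQ layout.
# Usage: fastq_head_to_fasta <file.fastq> <number_of_reads>
fastq_head_to_fasta() {
    head -n "$(( $2 * 4 ))" "$1" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap "@" for ">"
             NR % 4 == 2 { print }'                    # sequence line, unchanged
}

# Tiny made-up demo file (not real data).
demo="$(mktemp)"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo"
fastq_head_to_fasta "$demo" 1
```

On the real data the call would look something like `fastq_head_to_fasta SRR094109_1.fastq 10 > sample.fa`.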

ADD REPLY • link 6.2 years ago by GenoMax 148k

Yes, I BLASTed 5 random reads at NCBI, but did not get any similar genome sequence. Each read is 36 nt. I feel a little upset about this situation.

Moreover, for an SRA file from another study, whose reads are 35 nt long, the alignment is quite normal. I wonder whether there is some other mistake in my script.
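When comparing two datasets like this, it can help to confirm the read lengths directly from the files. A rough sketch, assuming four-line FASTQ records; the function name `read_length_counts` and the demo data are hypothetical:

```shell
# Count how many reads there are of each length in a FASTQ file.
# Output: one "length count" pair per line, sorted by length.
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }          # sequence lines only
         END { for (l in n) print l, n[l] }' "$1" |
        sort -n
}

# Tiny made-up demo file: two 4-nt reads and one 6-nt read.
demo="$(mktemp)"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nACGTAC\n+\nIIIIII\n' > "$demo"
read_length_counts "$demo"
```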

ADD REPLY • link 6.2 years ago by afli ▴ 190

I had a look at three sets of 10-15 random reads from SRR094109, and nothing seems to be showing up at NCBI BLAST with megablast or discontiguous megablast.

But with plain blastn you do get results back, and they mostly go to plants. Even then, only about 21 of 36 bp seem to match for most reads, so this seems to be a particularly bad dataset, which may require explicit scanning and trimming before you align.

ADD REPLY • link 6.2 years ago by GenoMax 148k

Thanks for your help, genomax! I also aligned the reads to the rice reference genome, and only about 21 of 36 bp matched. I ran FastQC on the FASTQ file and it looks quite OK, with no adapters detected. The original paper says they used MAQ with a 1-bp mismatch allowed, which is strict.

I've sent an email to the author for help.

ADD REPLY • link 6.2 years ago by afli ▴ 190

score 2 · Accepted Answer · 2018-11-02

6.2 years ago

afli ▴ 190

My mistake was that I didn't trim the reads. After trimming 16 bp from the right (3') end of each read, the alignment rate is normal and everything is OK.
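The answer does not say which tool was used for trimming, so the following is only a sketch of one way to do it. It removes a fixed number of bases from the 3' end of both the sequence and quality lines with awk; the function name `trim3_fastq` and the demo read are made up. Trimming 16 nt off a 36-nt read leaves 20 nt, which is roughly in line with the ~21 of 36 bp that matched in the BLAST checks above. bowtie2 also has a `-3`/`--trim3` option that trims bases from the 3' end before alignment, shown here as a comment only.

```shell
# Trim a fixed number of bases from the 3' (right) end of every read.
# Assumes 4-line FASTQ records; lines 2 (sequence) and 4 (quality) are shortened.
# Usage: trim3_fastq <file.fastq> <bases_to_remove>
trim3_fastq() {
    awk -v n="$2" 'NR % 2 == 0 { print substr($0, 1, length($0) - n); next }
                   { print }' "$1"
}

# Made-up 36-nt demo read (all A, quality all I), not real data.
demo="$(mktemp)"
awk 'BEGIN {
    s = ""; q = ""
    for (i = 0; i < 36; i++) { s = s "A"; q = q "I" }
    print "@r1"; print s; print "+"; print q
}' > "$demo"

trim3_fastq "$demo" 16

# Alternative (not run here): let bowtie2 trim during alignment, e.g.
#   bowtie2 -3 16 -x bowtie2_np7_index -U SRR094109_1.fastq -S test.sam
```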
