Question

Bowtie Alignment And Low Percentage Of Mapped Reads

4

12.4 years ago

samsara ▴ 630

I mapped 101 bp paired-end reads from an Illumina machine against the cow genome using Bowtie, and I was surprised to see that only 2% of the reads mapped (Edit 1: 3.76%). I am not an experienced Bowtie user, and I did not set any extra alignment parameters.

Do I need extra parameters in order to get a higher percentage of mapped reads? Could there be any other problem?

I used the following command:

bowtie --chunkmbs 400 -S -p 12 bowtieGenomeIndex_cow -1 R1.fastq -2 R2.fastq

Edit 1: Bowtie output

# reads processed: 784559228
# reads with at least one reported alignment: 29469538 (3.76%)
# reads that failed to align: 755089690 (96.24%)
Reported 29469538 paired-end alignments to 1 output stream(s)

Edit 2: FASTQC quality graphs

Forward Read Quality Image

Reverse Read Quality Image

Edit 3: Alignment with maximum mismatch=3 and insert size=400

# reads processed: 157554639
# reads with at least one reported alignment: 104534049 (66.35%)
# reads that failed to align: 53020590 (33.65%)
Reported 104534049 paired-end alignments to 1 output stream(s)

So, it seems the issue is with the insert size.

Edit 4: Alignment with insert size=650

# reads processed: 157554639
# reads with at least one reported alignment: 132701326 (84.23%)
# reads that failed to align: 24853313 (15.77%)
Reported 132701326 paired-end alignments to 1 output stream(s)

P.S. The data are from cow genomic DNA.

bowtie alignment genome • 22k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 12.4 years ago by samsara ▴ 630

0

Could you post a FastQC report for the FASTQ files that you mapped with Bowtie?

ADD REPLY • link 12.4 years ago by Arun 2.4k

0

Looking at FastQC is definitely worthwhile. You also want to be sure the '-X' parameter is set correctly for your paired-end fragment sizes. The default in bowtie (-X 250) is stringent. See this question for more discussion: http://www.biostars.org/post/show/9090/bowtie-pair-end-broken/

ADD REPLY • link 12.4 years ago by Brad Chapman 9.7k

2

Good point. Another test the original poster could try is to map one of the files in single-end mode and evaluate that.

ADD REPLY • link 12.4 years ago by Istvan Albert 102k

0

What would be the correct -X and -I values for a 101 bp read length?

ADD REPLY • link 12.4 years ago by samsara ▴ 630

1

The -X setting is determined by the fragment size of your library. If you don't know the expected distribution, you can set it to something larger like '-X 1000' and then look at the distribution of read pairs that are mapped to estimate it. Or use BWA instead of bowtie and it will infer the size for you.
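One way to "look at the distribution" after a permissive run is to read the template length (TLEN, column 9) from the SAM output. The following is only a minimal sketch, assuming the aligner fills in TLEN and sets the proper-pair flag; the function names are made up for illustration.

```python
import statistics

def observed_insert_sizes(sam_lines):
    """Collect absolute TLEN values (SAM column 9) from properly paired first mates."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 = properly paired, 0x40 = first mate (so each pair is counted once)
        if flag & 0x2 and flag & 0x40 and tlen != 0:
            sizes.append(abs(tlen))
    return sizes

def summarize(sizes):
    """Return the pair count, median and an index-based 99th percentile."""
    if not sizes:
        raise ValueError("no properly paired alignments found")
    ordered = sorted(sizes)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {"pairs": len(ordered), "median": statistics.median(ordered), "p99": p99}
```

With a real file this could be called as `summarize(observed_insert_sizes(open("aligned.sam")))`, where `aligned.sam` is a placeholder name. A -X value a little above the reported 99th percentile would then be a reasonable starting point.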

ADD REPLY • link 12.4 years ago by Brad Chapman 9.7k

0

Is there a downside to setting a larger insert size? How can I calculate the insert size if the read length is 101 bp and the fragment size is 400-600 bp?

ADD REPLY • link 12.4 years ago by samsara ▴ 630

1

The main downside is speed: it'll be slower with larger insert sizes since there is a larger search space. For your sizes you'd want to set -X 600 or add some padding to that with -X 700.
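On the read length part of the question: in Bowtie's convention the -I/-X bounds apply to the whole fragment, both mates included, so the read length is not added on top. A small worked sketch of the arithmetic, where the helper names and the 100 bp padding are arbitrary choices for illustration:

```python
def inner_gap(fragment_len, read_len):
    """Unsequenced bases between the two mates (negative if the mates overlap)."""
    return fragment_len - 2 * read_len

def suggest_maxins(max_fragment_len, padding=100):
    """Candidate -X value: largest expected fragment plus a safety margin."""
    return max_fragment_len + padding

read_len = 101
for fragment in (400, 600):
    # Fragment length is what -X limits; the gap is only shown for orientation.
    print(f"fragment {fragment} bp -> inner gap {inner_gap(fragment, read_len)} bp")
print("suggested -X:", suggest_maxins(600))
```

For 101 bp reads and 400-600 bp fragments, the gap between mates is 198-398 bp, while the value passed to -X stays in the 600-700 range.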

ADD REPLY • link 12.4 years ago by Brad Chapman 9.7k

0

Thanks a lot. I used -X 650 and already got a better result: about 84% of the reads mapped.

ADD REPLY • link 12.4 years ago by samsara ▴ 630

score 5 · Answer 1 · 2012-07-09

5

12.4 years ago

Istvan Albert 102k

Most likely your sequencing run has failed in its entirety, either during library preparation or during the sequencing process.

Look at the duplication rates (fastx) and fastqc reports.

Or perhaps the samples have been mixed up and you are aligning against the wrong genome, though even in that case one usually ends up with more than 2% mapped reads.

ADD COMMENT • link 12.4 years ago by Istvan Albert 102k

0

I didn't see FastQC mentioned in your post.

ADD REPLY • link 12.4 years ago by Arun 2.4k

0

The genome I used is not wrong. I ran BLAST on one of the sequences from the FASTQ files and got 99% identity to the Bos taurus genome. Moreover, the Illumina machine reported a mean quality score of 36.86.

ADD REPLY • link 12.4 years ago by samsara ▴ 630

score 4 · Answer 2 · 2012-07-09

1) Eyeball some of your fastq. Do you see reads with good quality scores, or not?

2) If you see reads with good quality scores, BLAST some of them, see if BLAST can tell you what they are.

3) Find out what the adaptor sequences for your prep are; maybe your reads are all adaptor.
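Checks 1 and 3 can be roughed out in a few lines before reaching for a full QC tool. This is only a sketch: it assumes plain 4-line FASTQ records and a Phred+33 quality encoding (older Illumina pipelines used Phred+64, so adjust `offset` if needed), and the adaptor string has to come from your own library prep.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():            # tolerate blank lines between records
            continue
        seq = next(it).strip()
        next(it)                          # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score of one read; offset=33 is an assumption about the encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

def screen(lines, adapter, min_mean_q=20, offset=33):
    """Count reads with a low mean quality and reads containing the adaptor string."""
    total = low_quality = with_adapter = 0
    for _, seq, qual in read_fastq(lines):
        total += 1
        if mean_quality(qual, offset) < min_mean_q:
            low_quality += 1
        if adapter in seq:
            with_adapter += 1
    return {"reads": total, "low_quality": low_quality, "with_adapter": with_adapter}
```

For example, `screen(open("R1.fastq"), "AGATCGGAAGAGC")` would count reads containing the start of the common Illumina adaptor; treat that sequence as an assumption and confirm it against your kit. The exact substring match misses adaptors with sequencing errors, so it gives a lower bound only.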

Edit:

It's annoying, but try either realigning with Bowtie using a number of different possible insert sizes, or try BWA, which doesn't require you to state up front what insert size you are expecting. If your reads are of fine quality and they are from the right species, maybe Bowtie is throwing them out because you misinformed it about the true insert size.
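The insert-size sweep can be scripted so it runs on a subset rather than the full data set. A minimal sketch that only builds the command lines: the output file names are hypothetical, the -X values are examples, and `-u` limits Bowtie to the first N reads or pairs of the input, which is not a random sample.

```python
def bowtie_sweep(index, fq1, fq2, maxins_values,
                 subset_reads=1_000_000, threads=12, chunkmbs=400):
    """Build one bowtie command (as an argument list) per candidate -X value."""
    commands = []
    for maxins in maxins_values:
        out_sam = f"sweep_X{maxins}.sam"   # hypothetical output name
        commands.append([
            "bowtie", "--chunkmbs", str(chunkmbs), "-S", "-p", str(threads),
            "-u", str(subset_reads),       # align only the first N reads/pairs
            "-X", str(maxins),             # maximum fragment length for a valid pair
            index, "-1", fq1, "-2", fq2, out_sam,
        ])
    return commands

for cmd in bowtie_sweep("bowtieGenomeIndex_cow", "R1.fastq", "R2.fastq",
                        [250, 400, 650, 1000]):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. Comparing the "reads with at least one reported alignment" line across runs shows where the mapping rate stops improving.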

score 2 · Answer 3 · 2012-07-11

We had a similar issue with human tumor-normal samples. Apart from the steps suggested above, we also checked by downsampling the FASTQ files.

Downsample a million reads and align them with a few aligners at default parameters (Bowtie, BWA, novoalign, Mosaik, gmap, BLAT) to see whether the issue is with the data or with the aligner parameters.
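For paired-end data, the downsampling has to keep mates together. A minimal sketch using reservoir sampling, assuming both files list the pairs in the same order and use plain 4-line FASTQ records; the function names are illustrative only.

```python
import random
from itertools import islice

def fastq_records(handle):
    """Yield FASTQ records as tuples of 4 lines."""
    it = iter(handle)                     # works for open files and for lists of lines
    while True:
        record = tuple(islice(it, 4))
        if len(record) < 4:               # end of input (or a truncated last record)
            return
        yield record

def downsample_pairs(handle1, handle2, n, seed=0):
    """Reservoir-sample up to n read pairs, keeping mate 1 and mate 2 in sync."""
    rng = random.Random(seed)
    reservoir = []
    pairs = zip(fastq_records(handle1), fastq_records(handle2))
    for i, pair in enumerate(pairs):
        if i < n:
            reservoir.append(pair)        # fill the reservoir first
        else:
            j = rng.randint(0, i)         # inclusive bounds
            if j < n:
                reservoir[j] = pair       # replace with probability n / (i + 1)
    return reservoir
```

The sampled pairs are held in memory, so a million 101 bp pairs needs on the order of a gigabyte of RAM in Python. Writing them back out is a matter of `out1.writelines(rec1)` and `out2.writelines(rec2)` for each pair.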

In our case, all the aligners gave a poor mapping percentage, and when we explored further we traced it to an issue at the library preparation level.

Also check with the vendor whether other users have had issues with that batch of kits.

Ram · Answer 4 · 2015-01-15

0

9.9 years ago

Leandro de Mattos ▴ 90

Hi,

I have mapped 100 bp paired-end data from an Illumina machine. I used Tophat for mapping, but only 5% of the reads mapped. Is there any parameter in Tophat to get a higher percentage of mapped reads? Could there be any other problem too?

Mattos.

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Leandro de Mattos ▴ 90