Question

Extremely low mapping rates with bowtie2

1

7.8 years ago

Sachin ▴ 10

Hi, we have sequenced some Drosophila DNA. I ran FastQC and the results looked good except for the duplicate sequences section. I did not trim the reads before alignment. The two datasets give very different mapping rates with bowtie2. The first is very low:

54378379 reads; of these:
  54378379 (100.00%) were unpaired; of these:
    51703307 (95.08%) aligned 0 times
    1724019 (3.17%) aligned exactly 1 time
    951053 (1.75%) aligned >1 times
4.92% overall alignment rate

Better with the other:

64029342 reads; of these:
  64029342 (100.00%) were unpaired; of these:
    16392556 (25.60%) aligned 0 times
    40232444 (62.83%) aligned exactly 1 time
    7404342 (11.56%) aligned >1 times
74.40% overall alignment rate
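
For anyone comparing several samples: the overall rate in these summaries is simply (aligned exactly 1 time + aligned >1 times) divided by the number of unpaired reads. A small sketch that recomputes it from a saved summary is below. The function name `alignment_rate` is made up for illustration, and it assumes the single-end summary layout shown above, saved to a plain text file.

```shell
# alignment_rate LOG: recompute the overall alignment rate from a saved
# bowtie2 summary for unpaired reads.
# Hypothetical helper; assumes the single-end summary layout shown above.
alignment_rate() {
    awk '
        /were unpaired/          { total = $1 }   # reads given to the aligner
        /aligned exactly 1 time/ { once  = $1 }   # uniquely aligned reads
        /aligned >1 times/       { multi = $1 }   # multi-mapping reads
        END { if (total) printf "%.2f%%\n", 100 * (once + multi) / total }
    ' "$1"
}
```

bowtie2 prints the summary to stderr, so something like `2> File1.bowtie2.log` on the alignment command (the log name is only an example) would capture it, after which `alignment_rate File1.bowtie2.log` prints the rate.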

What could the reason for this be?

alignment next-gen • 9.6k views

ADD COMMENT • link updated 7.8 years ago by Biostar 20 • written 7.8 years ago by Sachin ▴ 10

4

One other trick you can try:

samtools view -f 4 mybam.bam | cut -f 10 | sort | uniq -c | sort -nr | head

This will show you the 10 most common unmapped read sequences, each with its count. The command will take some time to finish, but those sequences might be more useful for BLAST than randomly chosen unmapped reads.
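
If samtools is not at hand, roughly the same tally can be made straight from an uncompressed SAM file with awk, and written out as FASTA so it can be pasted into BLAST. This is only a sketch: the function name `top_unmapped_fasta` and the FASTA header format are invented here, and it assumes single-end reads in a plain-text SAM file.

```shell
# top_unmapped_fasta SAM [N]: print the N most frequent unmapped read
# sequences in a SAM file as FASTA, with the count in each header.
# Hypothetical helper; assumes an uncompressed, single-end SAM file.
top_unmapped_fasta() {
    sam=$1
    n=${2:-10}
    # Skip header lines (@...), keep records whose FLAG (column 2) has bit 4
    # set (read unmapped), and print the read sequence (column 10).
    awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 1 { print $10 }' "$sam" |
        sort | uniq -c | sort -k1,1nr -k2,2 | head -n "$n" |
        awk '{ printf ">unmapped_%d_count_%d\n%s\n", NR, $1, $2 }'
}
```

For example, `top_unmapped_fasta File1.sam 10 > top_unmapped.fa` (the output name is arbitrary). Like the samtools version, it sorts every unmapped sequence, so expect it to be slow on tens of millions of reads.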

ADD REPLY • link 7.8 years ago by swbarnes2 15k

0

Did you use the correct reference genome for the first dataset? Could you please share the command you used for the analysis?

ADD REPLY • link 7.8 years ago by Renesh ★ 2.2k

0

Yes, I used the same genome for both datasets. Here is the command I used:

bowtie2 -p 12 -x /bowtie2index/dm6 -U File1.fq -S File1.sam

ADD REPLY • link 7.8 years ago by Sachin ▴ 10

3

Any time the mapping rate is unexpectedly low, take a small random selection of reads and BLAST them at NCBI. If there is contamination of some sort, it will quickly become apparent.
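
One way to pull such a random selection, sketched with plain awk: reservoir sampling over the FASTQ, printed as FASTA for pasting into the BLAST web form. The function name `sample_fastq_to_fasta` and the defaults (20 reads, seed 1) are invented for this example, and it assumes an uncompressed FASTQ with exactly four lines per record.

```shell
# sample_fastq_to_fasta FASTQ [N] [SEED]: pick N reads uniformly at random
# (reservoir sampling) and print them as FASTA.
# Hypothetical helper; assumes an uncompressed FASTQ with 4 lines per record.
sample_fastq_to_fasta() {
    fastq=$1
    n=${2:-20}
    seed=${3:-1}
    awk -v n="$n" -v seed="$seed" '
        BEGIN { srand(seed) }
        NR % 4 == 1 { name = substr($0, 2) }      # header line, drop the "@"
        NR % 4 == 2 {
            seen++
            if (seen <= n) {
                slot = seen                       # fill the reservoir first
            } else {
                slot = int(rand() * seen) + 1     # uniform in 1..seen
                if (slot > n) next                # read is not kept
            }
            names[slot] = name
            seqs[slot] = $0
        }
        END {
            kept = (seen < n) ? seen : n
            for (i = 1; i <= kept; i++) printf ">%s\n%s\n", names[i], seqs[i]
        }
    ' "$fastq"
}
```

For example, `sample_fastq_to_fasta File1.fq 20 > sample.fa`. Sampling from the raw FASTQ mostly returns unmapped reads when 95% fail to align, which is the population of interest here.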

ADD REPLY • link 7.8 years ago by GenoMax 152k

1

I did this recently with a mapping rate of 40%. The data were supposed to be mouse, but BLAST matched a subset of the reads to mouse and also to human. After we raised the issue with the sequencing company, they confirmed the sample was contaminated with human DNA (don't even get me started on why they didn't check for this before sending us results!)

ADD REPLY • link 7.8 years ago by BioinfGuru ★ 2.1k

0

Great suggestion genomax!

ADD REPLY • link 7.8 years ago by Kevin Blighe 89k

0

The alignment rate for the first sample is shockingly poor. It's as if the DNA came from a different genus. In fact, I have aligned human DNA to a mouse genome in the past and achieved a better alignment rate.

Are you using the correct genome version?
Did you index the genome with the same version of Bowtie that you are using for re-alignment?
What are your read lengths?
Which library preparation protocol did you use?
Are your FASTQ files formatted correctly?
What is the average base quality in your reads (use FastQC)?
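
The read-length and FASTQ-format questions above can be answered quickly with a few lines of awk. This is a rough sketch rather than a validator: `fastq_summary` is a made-up name, it assumes an uncompressed FASTQ with four lines per record, and it only checks the "@" and "+" marker lines and that the sequence and quality strings have equal length.

```shell
# fastq_summary FASTQ: report read count, min/mean/max read length, and the
# number of malformed records (bad "@"/"+" lines, or sequence and quality
# strings of different lengths).
# Hypothetical helper; assumes uncompressed, 4-line-per-record FASTQ.
fastq_summary() {
    awk '
        NR % 4 == 1 { if (substr($0, 1, 1) != "@") bad_here = 1 }
        NR % 4 == 2 { len = length($0) }
        NR % 4 == 3 { if (substr($0, 1, 1) != "+") bad_here = 1 }
        NR % 4 == 0 {
            if (length($0) != len) bad_here = 1
            reads++; total += len
            if (reads == 1 || len < min) min = len
            if (len > max) max = len
            bad += bad_here; bad_here = 0
        }
        END {
            if (NR % 4 != 0) bad++               # truncated final record
            mean = reads ? total / reads : 0
            printf "reads=%d min=%d mean=%.1f max=%d malformed=%d\n",
                   reads, min, mean, max, bad
        }
    ' "$1"
}
```

Running `fastq_summary File1.fq` on both datasets and comparing the two lines should show whether the poorly mapping sample has unusually short reads or a damaged file. Base quality is still best judged from the FastQC report.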

ADD REPLY • link 7.8 years ago by Kevin Blighe 89k

0

I'm getting similar mapping results against dm6 too: some samples are below 5% and some are above 70%. Did you solve your mapping problem? Any suggestions? Thank you!

ADD REPLY • link 6.3 years ago by Jingyue ▴ 70

0

What was sequenced: genome or transcriptome?

Also, what are you mapping against: genome or transcriptome?

For mapping RNA-seq data to a genome, a splice-aware aligner such as HISAT, TopHat or STAR is recommended.

ADD REPLY • link 7.8 years ago by lakhujanivijay 5.9k

0

I am getting poor alignment rates too (less than 30% against a congeneric species!). Is this expected, considering that my data are GBS short reads with a maximum length of 90 bp (mostly shorter)?

Thanks for the help

ADD REPLY • link 4.4 years ago by giulia.trauzzi ▴ 30

0

Probably not. Have you run some of the non-mapping reads through BLAST to see what they are?

ADD REPLY • link 4.4 years ago by GenoMax 152k