Question

Very high alignment rate with bowtie2

Asked 7.2 years ago by TrentGenomics

Hello,

My bowtie2 alignment rate is ~95% when mapping paired-end reads to a de novo Trinity assembly. In the literature I typically see bowtie2 alignment rates of anywhere from 70% to 85%, which is considered good.

What is the reason for such high rates? I know it is because most of my reads are represented in the final set of contigs; I'm just trying to justify why my rates are high compared to examples from the literature.

Any info would be great! Thanks.
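When comparing against published numbers, it may help to know what the ~95% actually counts. As far as I understand the bowtie2 summary for paired-end input, the "overall alignment rate" is counted per mate, so it includes concordant pairs, discordant pairs, and mates that aligned on their own. The sketch below recomputes the rate from those counts; the function name and the counts in the example are hypothetical, not taken from this data set.

```python
def paired_overall_rate(total_pairs: int, concordant_pairs: int,
                        discordant_pairs: int, single_mates_aligned: int) -> float:
    """Return the overall alignment rate (%) for paired-end reads, counted per mate.

    Assumption: each concordant or discordant pair contributes two aligned
    mates, and single_mates_aligned counts mates from otherwise-unaligned
    pairs that aligned on their own.
    """
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    total_mates = 2 * total_pairs
    aligned_mates = 2 * (concordant_pairs + discordant_pairs) + single_mates_aligned
    return 100.0 * aligned_mates / total_mates


# Hypothetical counts: 10,000 pairs, 9,350 aligned concordantly, 34 discordantly,
# and 572 single mates aligned on their own.
rate = paired_overall_rate(10_000, 9_350, 34, 572)
print(f"{rate:.2f}% overall alignment rate")  # prints "96.70% overall alignment rate"
```

Because singly aligned mates count toward the total, a high overall rate can sit alongside a noticeably lower concordant-pair rate, so it may be worth reporting both when comparing with values from the literature.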

Tags: rna-seq, alignment

I assume the Trinity assembly was not assembled from the reads you are mapping?

Reply by Damian Kao, 7.2 years ago

The reads that are mapping back at high rates are the reads that I used to generate my Trinity assembly.

Reply by TrentGenomics, 7.2 years ago

Then it's no surprise that they map well to a merged version of themselves.

Reply by Matteo Schiavinato, 7.2 years ago

I'm not sure that is the full answer, though. If the assembly is done poorly, you may get re-alignment rates of only around 60-70%. It is actually difficult to do a de novo assembly from Illumina reads and get high re-alignment rates, so kudos to michbrown!

I typically get around 90%.

Reply by Kevin Blighe, 7.2 years ago

I do agree, Macspider, but I've seen many examples where assembly quality checks based on read representation do not exceed the 70-85% range.

Reply by TrentGenomics, 7.2 years ago

It is possible to achieve high alignment rates by applying very strict QC thresholds during read trimming and base-quality checks, i.e., prior to alignment. For example, if you require all reads to be >70 bp long and to have base qualities >30 at the read ends, you can be fairly sure of achieving upwards of 99% alignment, provided the reference genome is good.
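A minimal sketch of the kind of strict filter described above, assuming Phred+33 quality encoding and treating "the read ends" as the first and last five bases (`end_bases` is a hypothetical parameter; the >70 bp and >30 thresholds are the ones quoted). It discards whole reads rather than trimming them, and it is not how any particular QC tool implements this.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    clean = [line.rstrip("\n") for line in lines if line.strip()]
    if len(clean) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(clean), 4):
        header, seq, _plus, qual = clean[i:i + 4]
        yield header, seq, qual


def passes_strict_qc(seq: str, qual: str, min_len: int = 70, min_end_q: int = 30,
                     end_bases: int = 5, offset: int = 33) -> bool:
    """True if the read is longer than min_len and every base in its first and
    last end_bases positions has a Phred score above min_end_q."""
    if len(seq) <= min_len:
        return False
    scores = [ord(c) - offset for c in qual]  # Phred+33 decoding (assumed)
    ends = scores[:end_bases] + scores[-end_bases:]
    return all(q > min_end_q for q in ends)


def filter_fastq(lines: Iterable[str]) -> List[Record]:
    """Keep only the records that pass the strict QC check."""
    return [rec for rec in read_fastq(lines) if passes_strict_qc(rec[1], rec[2])]


# Hypothetical reads: one passes, one is too short, one ends on a Q30 base ('?').
reads = (["@r1", "A" * 71, "+", "I" * 71]
         + ["@r2", "A" * 70, "+", "I" * 70]
         + ["@r3", "A" * 71, "+", "I" * 70 + "?"])
print([header for header, _, _ in filter_fastq(reads)])  # ['@r1']
```

Note that a base of exactly Q30 fails here because the quoted threshold is strictly greater than 30.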

So my question would be: did you perform rigorous QC checks on your reads prior to assembly?

Reply by Kevin Blighe, 7.2 years ago

Yes, that makes sense. I did use a Phred quality cutoff of 30 with Trim Galore! Thanks for the explanation!

Reply by TrentGenomics, 7.2 years ago

No problem. I also use Trim Galore! I think that 30 is a reasonable cut-off, but some people go as low as 20 for this parameter. Illumina reads have known quality issues at the read ends owing to how the fragments are sequenced in the instrument.
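To show why the choice between 20 and 30 matters, here is a sketch of BWA-style 3' quality trimming. Trim Galore's help text describes its `-q`/`--quality` trimming (default Phred 20, carried out via Cutadapt) as this BWA algorithm; the Python below is an illustrative re-implementation assuming Phred+33 encoding, not the tools' actual code, and `trim_3prime` is a hypothetical name.

```python
def trim_3prime(qual: str, cutoff: int, offset: int = 33) -> int:
    """Return how many 5' bases to keep after BWA-style 3' quality trimming.

    Walks from the 3' end accumulating (cutoff - quality); the read is cut
    where that running sum peaks, and the walk stops once it goes negative.
    """
    keep = len(qual)
    running = 0
    best = 0
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


# Hypothetical read: five Q40 bases ('I') followed by five Q25 bases (':').
qual = "IIIII:::::"
for cutoff in (20, 30):
    print(cutoff, trim_3prime(qual, cutoff))  # "20 10", then "30 5"
```

With a cutoff of 20 the Q25 tail is kept; with 30 it is trimmed, leaving a shorter read made up only of high-quality bases.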

Another way to 'manufacture' a high alignment rate is to include only matched mate pairs prior to alignment, throwing out any lone mates that have no match.
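A sketch of that mate-matching step, working on read IDs only. It assumes R1 and R2 IDs differ just by a trailing `/1` or `/2`, or by a mate field placed after a space (as in newer Illumina headers), which the whitespace split also handles. `mate_key` and `split_pairs` are hypothetical helpers; in practice the same filtering would be applied to the FASTQ records themselves.

```python
from typing import List, Sequence, Tuple


def mate_key(read_id: str) -> str:
    """Reduce a read ID to the name shared by both mates (assumed convention)."""
    name = read_id.lstrip("@").split()[0]  # drop '@' and any comment after whitespace
    if name.endswith(("/1", "/2")):
        name = name[:-2]  # drop the old-style mate suffix
    return name


def split_pairs(r1_ids: Sequence[str],
                r2_ids: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (R1 IDs whose mate is present in R2, orphan IDs from either file)."""
    keys1 = {mate_key(r) for r in r1_ids}
    keys2 = {mate_key(r) for r in r2_ids}
    paired = [r for r in r1_ids if mate_key(r) in keys2]
    orphans = [r for r in r1_ids if mate_key(r) not in keys2]
    orphans += [r for r in r2_ids if mate_key(r) not in keys1]
    return paired, orphans


paired, orphans = split_pairs(["@r1/1", "@r2/1", "@r3/1"], ["@r1/2", "@r3/2", "@r4/2"])
print(paired)   # ['@r1/1', '@r3/1']
print(orphans)  # ['@r2/1', '@r4/2']
```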

I tend to run a 'raw' alignment with the raw FASTQ and then a second alignment with the QC'd FASTQ, compute resources and time permitting, of course.
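A small sketch for that raw-versus-QC comparison, assuming the bowtie2 summary of each run was saved to a file (for example by redirecting stderr with `2> raw.log`). It pulls out the final "overall alignment rate" line from each log; the function names and the log excerpts below are hypothetical.

```python
import re

# bowtie2 ends its alignment summary with a line such as
# "96.70% overall alignment rate".
RATE_RE = re.compile(r"([\d.]+)% overall alignment rate")


def overall_rate(log_text: str) -> float:
    """Extract the overall alignment rate (%) from a bowtie2 summary."""
    match = RATE_RE.search(log_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def rate_change(raw_log: str, qc_log: str) -> float:
    """Percentage-point change from the raw-FASTQ run to the QC'd-FASTQ run."""
    return overall_rate(qc_log) - overall_rate(raw_log)


# Hypothetical log excerpts; in practice read these from each run's saved stderr.
raw_log = "...\n88.10% overall alignment rate\n"
qc_log = "...\n95.40% overall alignment rate\n"
print(f"{rate_change(raw_log, qc_log):+.2f} percentage points")
```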

Reply by Kevin Blighe, 7.2 years ago

Answer (score 1, posted 2017-09-14)

Answered 7.2 years ago by TrentGenomics

These are just the tips I was looking for. I'm going to play around with the raw and trimmed reads with bowtie2 and have a look at the alignment stats.

Thanks again, Kevin! I appreciate it.
