Hello,
My alignment rates with bowtie2 are ~95%, using paired-end reads on a de novo Trinity assembly. What I typically see in the literature are bowtie2 alignment rates of 70-85%, which are considered good.
What is the reason for the rates being so high? I know that it is because most of my reads are represented in the final set of contigs. I'm just trying to justify why my rates are high compared to examples from the literature.
Any info would be great! Thanks.
I assume the Trinity assembly was not assembled from the reads you are mapping?
The reads that are mapping back at high rates are the reads that I used to generate my Trinity assembly.
Then no surprise they map well on a merged version of themselves.
I'm not sure that that is the full answer, though. If you do the assembly incorrectly, you may only get rates of around 60%-70% re-alignment. It is actually difficult to do de novo assembly and get high re-alignment rates with Illumina reads, so, kudos to michbrown!
I typically get around 90%
I do agree, Macspider, but I've seen many examples where assembly quality checks by way of read representation do not exceed the range of 70-85%.
It is possible to achieve high alignment rates by using very high QC thresholds during read trimming and base-quality checks, i.e., prior to alignment. For example, if you specify that all reads must be >70bp in length and have base qualities >30 at the read ends, you can be pretty sure that you'll achieve upward of 99% alignment if the reference genome is good.
So, my question would be whether or not you did some rigorous QC checks on your reads prior to assembly?
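To make the thresholds mentioned above concrete, here is a minimal Python sketch of that kind of filter. It is only an illustration and not what Trim Galore does internally: the function names are made up for this example, Phred+33 encoding is assumed, and bases are simply stripped from each end until one reaches the quality cut-off.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_ends(seq, qual, min_quality=30):
    """Strip bases from both ends while their Phred score is below min_quality.

    Simplified illustration only; real trimmers use more refined algorithms.
    """
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return seq[start:end], qual[start:end]


def passes_qc(seq, qual, min_quality=30, min_length=70):
    """Return the trimmed (seq, qual) if it is still long enough, else None."""
    trimmed_seq, trimmed_qual = trim_low_quality_ends(seq, qual, min_quality)
    if len(trimmed_seq) < min_length:
        return None
    return trimmed_seq, trimmed_qual
```

Reads that return None here would be dropped before assembly and alignment, which is why strict settings tend to push the alignment rate up: the reads least likely to align never reach the aligner.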
Yes, that makes sense. I did use a >30 phred quality score when I used Trim Galore! Thanks for the explanation!
No problem - I also use Trim Galore! I think that 30 is a reasonable cut-off to use, but some people go as low as 20 for this parameter. Illumina reads have known quality issues at the read ends based on how the fragments are sequenced in the instrument.
Another way to 'manufacture' a high alignment rate is to only include matched read pairs prior to alignment, and to throw out any lone mates that have no match.
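As a rough sketch of what that pairing step amounts to, assuming reads are already loaded as (header, sequence, quality) tuples and that mates share an ID apart from an optional /1 or /2 suffix (both function names are hypothetical, for illustration only):

```python
def normalise_read_id(header):
    """Drop the leading '@', any comment after whitespace, and a trailing /1 or /2."""
    read_id = header.lstrip("@").split()[0]
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return read_id


def keep_matched_pairs(r1_records, r2_records):
    """Split reads into matched pairs and orphaned mates.

    Each record is a (header, sequence, quality) tuple.
    """
    r2_by_id = {normalise_read_id(rec[0]): rec for rec in r2_records}
    pairs, orphans = [], []
    seen = set()
    for rec in r1_records:
        read_id = normalise_read_id(rec[0])
        mate = r2_by_id.get(read_id)
        if mate is None:
            orphans.append(rec)  # R1 read whose mate was discarded
        else:
            pairs.append((rec, mate))
            seen.add(read_id)
    # R2 reads that never found an R1 partner are orphans too
    orphans.extend(rec for rid, rec in r2_by_id.items() if rid not in seen)
    return pairs, orphans
```

Only the `pairs` would go on to alignment. In practice you would not normally write this yourself, since running the trimmer in its paired-end mode keeps the two files in sync.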
I tend to conduct a 'raw' alignment with the raw FASTQ and then a secondary alignment with the QC'd FASTQ, compute resources and time permitting, of course.
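If you keep the summary that bowtie2 prints at the end of each run, comparing the two alignments is easy to script. A minimal sketch, assuming each summary has been saved as text and ends with the usual "NN.NN% overall alignment rate" line (the function names are made up for this example):

```python
import re

# Matches the final summary line, e.g. "NN.NN% overall alignment rate"
OVERALL_RATE = re.compile(r"([\d.]+)% overall alignment rate")


def overall_alignment_rate(summary_text):
    """Extract the overall alignment rate (in percent) from a bowtie2 summary."""
    match = OVERALL_RATE.search(summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def compare_rates(raw_summary, qc_summary):
    """Report the raw and QC'd rates and the gain in percentage points."""
    raw = overall_alignment_rate(raw_summary)
    qc = overall_alignment_rate(qc_summary)
    return {"raw": raw, "qc": qc, "gain": round(qc - raw, 2)}
```

The gain gives a rough idea of how much of the final alignment rate comes from read filtering rather than from the assembly itself.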