Question

Need Recommendations For Aligning Longer (2x250bp) Reads

12.9 years ago

Richard ▴ 600

Hi all, I have some MiSeq 2x250 genome reads that I would like to align.

I'm thinking bwasw is a good fit for this sort of analysis, based on comments that it can be more sensitive for reads longer than 150bp.

Am I on the right track? Are there any specific bwasw parameters I should be looking at?

score 5 · Answer 1 · 2012-10-19

Hi again. Since I didn't find any useful results out there on the interweb, I did a little investigation of my own.

I got my hands on 5 million 250bp MiSeq read pairs generated from an HL60 cell line. That amounts to 10 million 250bp reads.

I ran two aligners:
- bwasw from bwa 0.6.2
- bowtie2 release beta7

Both were used with default parameters (except for telling bwasw to report only 1 alignment per read) to align the reads to hg19.

aligner              bwasw       bowtie2
version              0.6.2       2 beta 7
reference            hg19        hg19
#reads               10M         10M
%ChastityFailed      6.63        6.63
%duplicates          0.021       0.229
%aligned             97.393      92.811
%paired              92.79       96.817
%uniqueAligned       96.24       98.268
mean_insert_size     1005        349
coverage_estimate    0.797583    0.758483

Some things of note:

- BWASW placed a lower fraction of its aligned reads in proper pairs. Also, some of the "proper pairs" it identified were very far apart, which is what caused the mean insert size to spike like it did. The expected insert size is on the order of 350bp, so bowtie2 is much closer to the target.
- BWASW had the unusual trait of changing the base qualities in some reads. In many reads, mismatched bases in the alignment had higher base qualities than were reported in the FASTQ file. My understanding is that an aligner should not alter the base qualities in its SAM output.
- I have to look deeper to figure out how bowtie2 managed to find so many more duplicates. I suspect it has something to do with how the ends of the alignments are trimmed in the two aligners, but I need to confirm that.
- BWASW aligned more reads overall.
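
The insert-size point above can be illustrated with a small sketch. It assumes the template lengths (the TLEN column, field 9 of a SAM record) of the properly paired reads have already been collected into a list. The function name and the values are invented for illustration and are not taken from the HL60 data. It shows how a few very distant "proper pairs" can inflate the mean while the median stays near the expected library size.

```python
from statistics import mean, median

def insert_size_summary(tlens):
    """Return (mean, median) of absolute template lengths, ignoring zeros."""
    sizes = [abs(t) for t in tlens if t != 0]
    return mean(sizes), median(sizes)

# Hypothetical TLEN values: most pairs near 350 bp, two placed very far apart.
example_tlens = [340, 352, 348, 361, 355, 345, 350, 349, 40000, 25000]

avg, med = insert_size_summary(example_tlens)
print(f"mean={avg:.0f} median={med:.0f}")
```

With these made-up numbers the mean lands in the thousands while the median stays close to 350, so reporting the median alongside the mean may give a fairer comparison between aligners.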

Although it doesn't make a whole lot of sense given the low-coverage data, I tried running some SNP calling on the alignments for comparison. I used mpileup from samtools 0.1.16 on each BAM, followed by vcfutils, and threw out variants with qualities less than 20.
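
The quality cutoff described above could be sketched roughly as follows. This is a minimal illustration on plain-text VCF lines. The function name and the example records are hypothetical, and the actual filtering in the post was done with samtools and vcfutils rather than with a script like this.

```python
def filter_by_qual(vcf_lines, min_qual=20.0):
    """Yield header lines unchanged and variant lines with QUAL >= min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # the sixth VCF column is QUAL
        if qual != "." and float(qual) >= min_qual:
            yield line

# Hypothetical records, invented for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t35.2\t.\tDP=3\n",
    "chr1\t2000\t.\tC\tT\t12.0\t.\tDP=1\n",
]

kept = list(filter_by_qual(example))
```

Here the two header lines and the first variant are kept, and the record with QUAL 12.0 is dropped.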

aligner    SNPs called    fraction concordant with dbSNP132
bowtie2    355438         0.8172986569
bwasw      389817         0.794103387

So it looks like SNP calls from the bowtie2 alignments are more specific with respect to dbSNP132. However, the number of true-positive SNPs identified was higher with bwasw (not reported directly above).
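
The trade-off can be checked with simple arithmetic on the table above. This sketch assumes that the concordance column is the fraction of called SNPs that are present in dbSNP132, and it treats those concordant calls as a stand-in for true positives. Both are interpretations rather than something the post states explicitly.

```python
# Values copied from the SNP-calling table above: (SNPs called, concordant fraction).
calls = {
    "bowtie2": (355438, 0.8172986569),
    "bwasw": (389817, 0.794103387),
}

# Assumption: concordant count = SNPs called * fraction concordant with dbSNP132.
concordant = {name: round(n * frac) for name, (n, frac) in calls.items()}

for name, count in concordant.items():
    print(name, count)
```

Under that assumption, bwasw ends up with roughly 19,000 more dbSNP-concordant calls (about 309,555 versus 290,499) despite its lower concordance fraction, which is consistent with the remark about true positives.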

Moving forward, I'll need to get my hands on some deeper data to help with the SNP calling test.

Any other thoughts are appreciated.

score 3 · Answer 2 · 2012-10-04

12.9 years ago

Sean Davis 27k

Bowtie2 is specifically designed for longer reads. You might give that a try, also.

score 2 · Answer 3 · 2013-02-22

12.5 years ago

Nicolas Rosewick 11k

Try STAR (http://code.google.com/p/rna-star/)

Very accurate and very fast!

Yup. I'm trying STAR now.

Reply • 12.5 years ago by Richard ▴ 600

score 1 · Answer 4 · 2013-02-22

12.5 years ago

Leszek 4.2k

Give GEM mapper a try. It's extremely fast (especially for longer reads) and the best in terms of sensitivity.
I found it worked best on my 2x250bp data.

Yup, we're also testing GEM2 now

Reply • 12.5 years ago by Richard ▴ 600

They have gem2 now? Great! Is there a link?

Reply • 12.5 years ago by lh3 33k

Dang, I thought it was GEM2. It's not. It's just GEM. I'm referring to the release around Nov/Dec 2012.

Reply • 12.5 years ago by Richard ▴ 600

They have an updated release (the one from the Nature Methods publication). Try it!

Reply • 12.5 years ago by Leszek 4.2k

I have already tried it. See my post on outputting multiple hits. It is version 1.3beta. GEM implements the best algorithm so far, better than bwa-backtrack for somewhat longer reads. One of the concerns, as you said, is gem-2-sam.

Reply • 12.5 years ago by lh3 33k

I think it's the fastest and the most accurate aligner. They improved their gem-2-sam converter, but it's still not perfect...

Reply • 12.5 years ago by Leszek 4.2k

score 0 · Answer 5 · 2013-02-21

12.5 years ago

dli ▴ 250

Hi there, did bwasw work for paired-end reads? From its manual, it seems not:

For both algorithms, the database file in the FASTA format must be first indexed with the ‘index’ command, which typically takes a few hours. The first algorithm is implemented via the ‘aln’ command, which finds the suffix array (SA) coordinates of good hits of each individual read, and the ‘samse/sampe’ command, which converts SA coordinates to chromosomal coordinate and pairs reads (for ‘sampe’). The second algorithm is invoked by the ‘bwasw’ command. It works for single-end reads only.

I think bowtie2 may work well with long paired-end reads? Does anyone know whether novoalign works as well?

Thanks.

Hi. Yes, bwasw works with paired-end reads. I think that was added around version 0.6.1. Bowtie2 works well for longer reads (see above). Novoalign may work as well, but it is much slower than bowtie2.

Reply • 12.5 years ago by Richard ▴ 600

Thanks Richard. What does the command line look like for bwa-sw with paired-end input?

Reply • 12.5 years ago by dli ▴ 250

A command for bwasw paired alignment:

bwa-0.6.2/bwa bwasw ../ref/phix174.fa -M -f bwasw.sam -t 2 ../fastqs/Phix_S1_L001_R1_001.fastq ../fastqs/Phix_S1_L001_R2_001.fastq

Reply • 12.5 years ago by Richard ▴ 600

Thank you. I didn't realize bwasw could do paired-end alignment.

Reply • 12.4 years ago by dli ▴ 250

In my experience, bowtie2 is slow with longer reads

Reply • 12.5 years ago by Leszek 4.2k

Not in my hands. Bowtie2 has some important features GEM lacks.

Reply • 12.5 years ago by lh3 33k

score 0 · Answer 6 · 2013-02-27

12.5 years ago

kanwarjag ★ 1.2k

SHRiMP2 has worked well for MiSeq data. You may have to tweak the parameters depending on insert size.
