Need Recommendations For Aligning Longer (2X250Bp) Reads
6
4
Entering edit mode
12.2 years ago
Richard ▴ 590

Hi all, I have some miSeq 2x250 genome reads that I would like to align.

I'm thinking bwasw is good for this sort of analysis just based on comments that it can be more sensitive for reads longer than 150bp.

Am I on the right track? Are there any specific bwasw parameters I should be looking at?

aligner • 6.0k views
ADD COMMENT
5
Entering edit mode
12.1 years ago
Richard ▴ 590

Hi again, Since I didn't find any useful results out there on the interweb I did a little investigation of my own.

I got my hands on 5 million 250bp Miseq reads pairs generated from an HL60 cell line. That amounts to 10 million 250bp reads.

I ran two aligners: bwasw from bwa 0.6.2 bowtie2 release beta7

Both were used with default parameters (except for telling bwasw to report only 1 alignment per read) to align the reads to hg19.

aligner    bwasw     bowtie2
version    0.6.2     2 beta 7
Bam_path_reference    hg19    hg19
#Reads    10M    10M
%ChastityFailed    6.63    6.63
%duplicates    0.021    0.229
%aligned    97.393    92.811
%paired    92.79    96.817
%uniqueAligned    96.24    98.268
mean_insert_size    1005    349
coverage_estimate    0.797583    0.758483

Some things of note:

  • BWASW did have a lower fraction of the reads that aligned aligning in proper pairs. Also, of the "proper pairs" that were identified some were very far apart, which is what caused the mean insert size to spike like it did. The expected insert size is on the order of 350, so bowtie2 is much closer to the target.
  • BWASW had the unusual trait of changing the base qualities in some reads. I observed this in many reads where the qualities of mismatched bases in the alignment had higher base qualities than were reported in the fastq file. My understanding is that an aligner should not alter the qualities of the bases in the SAM report.
  • I have to look deeper to figure out how bowtie2 manage to find such an increase in dups. I suspect it has something to do with how the ends of the alignments are trimmed in the two aligners, but I need to confirm that. -BWASW aligned more reads overall

Although it doesn't make a whole lot of sense given the low coverage data, I tried running some snp calling on the alignments for comparison. I used mpileup from samtools 0.1.16 on each bam, followed by vcfutils, and throwing out variants with qualities less that 20.

aligner     number snps called     concordant re. dbsnp132
bowtie2     355438    0.8172986569
bwasw     389817    0.794103387

So, it looks like the bowtie2 alignments can identify snp calls that are more specific with respect to dbsnp132. However, the number of true positive snps identified was higher in bwa (not reported directly above).

Moving forward I'll need to get my hands on some deeper data that would help with the snp calling test.

Any other thoughts are appreciated.

ADD COMMENT
2
Entering edit mode

Bowtie2 requires full-length match by default. If you use "--local", bowtie2 will align more reads. At my hand for 100bp reads, bowtie2 gives more SNP calls but slightly lower ts/tv - the contrary of your last table. I do not know if this is caused by not using "--local". Overall, the two mappers are comparable. I admit bwa-sw has some bugs in the SAM output. I am writing a new algorithm that is supposed to be faster and more accurate than these two.

ADD REPLY
0
Entering edit mode

Yuck. How does one post tables in this forum?

ADD REPLY
3
Entering edit mode
12.2 years ago

Bowtie2 is specifically designed for longer reads. You might give that a try, also.

ADD COMMENT
2
Entering edit mode
11.8 years ago

Try STAR (http://code.google.com/p/rna-star/)

very accurate and very fast !!!

ADD COMMENT
0
Entering edit mode

Yup. I'm trying star now.

ADD REPLY
1
Entering edit mode
11.8 years ago
Leszek 4.2k

Give a try to GEM mapper. It's extremely fast (especially for longer reads) and the best in terms of sensitivity.
I found it working the best on my 2x250bp data.

ADD COMMENT
0
Entering edit mode

Yup, we're also testing GEM2 now

ADD REPLY
0
Entering edit mode

They have gem2 now? Great! Is there a link?

ADD REPLY
0
Entering edit mode

Dang, I thought it was Gem2. Its not. Its just GEM. I'm referring to the release around Nov/Dec 2012.

ADD REPLY
0
Entering edit mode

they have updated release (the one from Nature Methods publication). try it!

ADD REPLY
0
Entering edit mode

I have already tried. See my post on outputting multiple hits. It is version 1.3beta. Gem implements the best algorithm so far, better than bwa-backtrack for a bit longer reads. One of the concerns, as you said, is gem-2-sam.

ADD REPLY
0
Entering edit mode

I think it's the fastest and the most accurate aligner. They improved their gem-2-sam converter, but it's still not perfect...

ADD REPLY
0
Entering edit mode
11.8 years ago
dli ▴ 250

Hi there, did bwasw work for paired end reads? From its manual it seems not?

For both algorithms, the database file in the FASTA format must be first indexed with the ‘index’ command, which typically takes a few hours. The first algorithm is implemented via the ‘aln’ command, which finds the suffix array (SA) coordinates of good hits of each individual read, and the ‘samse/sampe’ command, which converts SA coordinates to chromosomal coordinate and pairs reads (for ‘sampe’). The second algorithm is invoked by the ‘bwasw’ command. It works for single-end reads only.

I think bowtie2 may work well with long paired-end reads? Have anyone also know does novoalign works?

Thanks.

ADD COMMENT
0
Entering edit mode

Hi. Yes bwasw works with paired end reads. I think it was added around version 0.6.1. Bowtie2 works well for longer reads (see above). Novoalign may work as well, but it is much slower than bowtie2.

ADD REPLY
0
Entering edit mode

Thanks Richard. How does the command line looks like, for bwa-sw with paired-end input?

ADD REPLY
0
Entering edit mode

A command for bwasw paired alignment:

bwa-0.6.2/bwa bwasw ../ref/phix174.fa -M -f bwasw.sam -t 2 ../fastqs/Phix_S1_L001_R1_001.fastq ../fastqs/Phix_S1_L001_R2_001.fastq
ADD REPLY
0
Entering edit mode

thank you. Didn't realize bwasw can do paired-end alignment....

ADD REPLY
0
Entering edit mode

In my experience, bowtie2 is slow with longer reads

ADD REPLY
0
Entering edit mode

Not at my hand. Bowtie2 has some important features gem lacks.

ADD REPLY
0
Entering edit mode
11.8 years ago
kanwarjag ★ 1.2k

SHRIMP2 for miseq data has worked well you may have to tweak in parameters. depending on insert

ADD COMMENT

Login before adding your answer.

Traffic: 1726 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6