What are the best parameters for aligning PacBio reads with bwasw (or any other aligner)? Since PacBio reads have a significantly higher error rate, dominated by indels, I aligned with a larger number of allowed gaps and a lower gap-open penalty. Initially it appeared that I got a good number of alignments, but on closer inspection I found that in almost all cases only a small fraction of the read had been aligned (say 10-20 bases of a 2 kb read).
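One way to quantify the problem described here is to compute, for each SAM record, how much of the read the alignment actually covers. The following is a minimal sketch, assuming you have the CIGAR string of each alignment; the function name `aligned_fraction` and the example CIGAR are made up for illustration. It counts M, I, = and X operations as aligned read bases and treats soft and hard clips as unaligned parts of the read.

```python
import re

# One CIGAR element is a length followed by an operation code.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume read bases and belong to the alignment itself.
_ALIGNED_OPS = {"M", "I", "=", "X"}
# Operations that count towards the full read length (aligned or clipped).
_READ_OPS = _ALIGNED_OPS | {"S", "H"}


def aligned_fraction(cigar):
    """Return the fraction of the read covered by the alignment."""
    aligned = 0
    read_len = 0
    for length, op in _CIGAR_RE.findall(cigar):
        n = int(length)
        if op in _ALIGNED_OPS:
            aligned += n
        if op in _READ_OPS:
            read_len += n
    # An unmapped record has CIGAR "*", which yields no elements.
    return aligned / read_len if read_len else 0.0


# Hypothetical record: 20 aligned bases in a 2 kb read.
print(aligned_fraction("990S20M990S"))  # 0.01
```

Filtering on this value (for example, keeping only alignments above some chosen threshold) makes it easy to see how many of the reported hits are short local matches rather than real placements of the read.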
Also note that, because the DNA fragments are circularized, you should not expect the whole read to align, only a portion of it (a subread). That is another good reason to use a specialized aligner for PacBio reads.
You can download BLASR from GitHub: https://github.com/PacificBiosciences/blasr . You need HDF5 installed to compile it. Use the default alignment parameters, but add the flag "-sam" if you want the output in SAM format.
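If you drive the aligner from a script, the invocation can be sketched as below. This is only an illustration under assumptions: the file names are placeholders, and the "-out" option is assumed to be the output-file option of the same single-dash interface as "-sam" (check `blasr -h` for your build, since option spelling may differ between versions).

```python
import subprocess


def blasr_command(reads, reference, out_sam):
    """Build a BLASR command line that writes SAM output."""
    # "-sam" is the flag mentioned above; "-out" is an assumption about
    # the output-file option and should be checked against your version.
    return ["blasr", reads, reference, "-sam", "-out", out_sam]


# Placeholder file names, not taken from the thread.
cmd = blasr_command("reads.fasta", "reference.fasta", "aligned.sam")
print(" ".join(cmd))

# To actually run it (requires blasr on PATH):
# subprocess.run(cmd, check=True)
```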
You might start by looking at PacBio's own software. They have alignment software written specifically for PacBio data, called BLASR. Beyond that, though, I am not sure what the best parameters are. I'm sure they vary from run to run and between organisms.
After error correction, you can use GMAP/GSNAP or BBMap to align PacBio reads. Please note that you need FASTA files as input and that the output will be in SAM format.
That's true: the upper limit is currently 6 kbp, but BBMap will break longer reads into 6 kbp pieces and map those. The majority of current PacBio reads that I've seen are still under 6 kbp. Also, BBMap handles the raw reads fine; they don't need correction first.
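To picture what breaking a long read into 6 kbp pieces means, here is a small sketch. It is not BBMap's actual implementation, just an illustration of fixed-size chunking; the function name `split_read` and the example read are made up.

```python
def split_read(seq, max_len=6000):
    """Split a read into consecutive pieces of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


# Hypothetical 14 kbp read: two full 6 kbp pieces plus a 2 kbp remainder.
pieces = split_read("A" * 14000)
print([len(p) for p in pieces])  # [6000, 6000, 2000]
```

Each piece is then mapped on its own, so a read longer than the limit can show up as several separate alignments.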
The latest chemistry, P6-C4, reportedly produces reads averaging more than 10 kb. With experimental size selection, even P5-C3 reads are around 10 kb long. PacBio reads of ~10 kb are common these days.
updated 2.8 years ago by Ram • written 10.0 years ago by lh3