After error-correcting PacBio long reads using Illumina short reads, which aligners are well suited to aligning the corrected PacBio long reads against the genome?
Should I think of the corrected PacBio reads as just "long Illumina reads" (in terms of error and indel rates, etc.)?
I am tempted to use BLASR, but the PacBio-specific errors and indels are presumably "corrected out", so is it still appropriate to use BLASR?
Is it appropriate to use BWA for the corrected PacBio long-reads?
Are some aligners more appropriate than others?
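One way to check whether the corrected reads really behave like "long Illumina reads" is to align a sample of them and tally the mismatch and indel rates from the resulting SAM records. The following is a minimal sketch, not a definitive method. It assumes the aligner writes a standard CIGAR string and the SAM NM tag (edit distance), and the function name `indel_and_mismatch_rates` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_and_mismatch_rates(cigar, nm):
    """Return (mismatch_rate, insertion_rate, deletion_rate) per alignment column.

    cigar: SAM CIGAR string, e.g. "50M2I30M1D17M"
    nm:    value of the SAM NM tag (mismatches + inserted + deleted bases);
           assumed to be present in the aligner's output
    """
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)

    inserted = counts.get("I", 0)
    deleted = counts.get("D", 0)
    # M, = and X all consume one read base and one reference base.
    aligned = counts.get("M", 0) + counts.get("=", 0) + counts.get("X", 0)

    # NM counts mismatches plus indel bases, so subtract the indel bases.
    mismatches = nm - inserted - deleted
    columns = aligned + inserted + deleted
    if columns == 0:
        return (0.0, 0.0, 0.0)
    return (mismatches / columns, inserted / columns, deleted / columns)


# Example: 97 aligned bases, 2 inserted, 1 deleted, edit distance 5.
rates = indel_and_mismatch_rates("50M2I30M1D17M", 5)
```

If the indel rates for the corrected reads come out close to the mismatch rate and both are low, treating the reads like long Illumina reads is more defensible; if indels still dominate, a PacBio-aware scoring scheme may still matter.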
CMO, your best bet for working with hybrid data (short read + long read) is to use a hybrid-aware package like one of the two options below:
The details of how hybrid data are correctly combined and then processed downstream are more complicated than most would expect. The long and short (pun intended) of it is that any tool written to process long-read data alone (BLASR) or short-read data alone (BWA, etc.) is suboptimal.
Thank you, but I am more interested in how to align the PacBio long reads after they have been corrected. I am not necessarily interested in a de novo assembly. The correction step should be taken as given; I am not interested in correction methods.
Hi CMO,
I'm not an SME on the RS II, so I asked Jason and he was kind enough to respond. I hope it was helpful...