I need to align NGS data precisely. The problem is that with global alignment I lose a lot of reads (20%), while local alignment fails to align only 5% of reads. I don't care much about the 3'-end, but the 5'-end has to be aligned with 1bp precision.
Iterative hard-clipping of 3'-ends helps, but it would be ideal to do it in one run. Do you know of any program that can precisely align the 5'-end while allowing soft-clipping of the 3'-end?
I don't know an optimal solution for this. I once had a similar case where reads needed trimming to different lengths. I used Blat to align to the reference and then some Perl scripts to convert the PSLX output to valid SAM, inserting soft-clipped ends and indel operations in the CIGAR where required. The mapping worked very well, but it was extremely slow. Do you want to take a look? http://github.com/caballero/RNAseq-Pi/ (the files megablat.pl and pslx2sam.pl under bin/)
Thanks JC, I was thinking about BLAT. I think it could handle the current data (4 x 6 million reads), but we expect a lot more in the coming months, so I would prefer to find an efficient solution.
I see BWA does soft clipping of the 3' end based on quality (-q option). MOSAIK v2.0 and Bowtie 2 also support soft clipping.
bowtie2 does global or local alignment. In local mode both ends may be clipped, and I need the entire 5'-end. As for BWA, it does indeed soft-clip the 3'-end based on quality, but only down to 35bp ("-q INT quality threshold for read trimming down to 35bp [0]"), and I have noticed that some reads can be aligned after trimming to 25, 21 or even 16bp. I have never used MOSAIK, but will give it a try :)
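For what it's worth, here is a minimal sketch of how local-mode output could be checked for clipping at the 5'-end. It assumes only the standard SAM FLAG and CIGAR fields; the function name `five_prime_softclip` is made up for illustration. For reads on the reverse strand (FLAG bit 0x10) the CIGAR is written in reference orientation, so the read's 5'-end is the last CIGAR operation rather than the first.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_softclip(flag: int, cigar: str) -> int:
    """Return the number of bases soft-clipped at the read's own 5'-end.

    flag and cigar are the FLAG and CIGAR columns of one SAM record.
    Hard clips (H) are ignored; an unmapped read ("*") gives 0.
    """
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0
    reverse = bool(flag & 0x10)  # read aligned to the reverse strand
    length, op = ops[-1] if reverse else ops[0]
    return length if op == "S" else 0


# Forward-strand read clipped on both sides: the 5'-end is the left side.
print(five_prime_softclip(0, "3S30M8S"))
# Reverse-strand read: the 5'-end is the right side of the CIGAR.
print(five_prime_softclip(16, "3S30M8S"))
```

Alignments where this returns anything other than 0 would be the ones to discard or re-align if the 5'-end must be placed exactly.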
BTW, have you tried pre-trimming the reads to a specific length and then aligning them? I generally pre-trim the reads after looking at the FastQC report.
http://wiki.bioinformatics.ucdavis.edu/index.php/Subsequence.pl
I did, but the point is that some reads align uniquely at 41bp, others at 31bp, and others at 21bp. So I would need to do iterative 3'-trimming of unaligned reads. Novoalign does it for me automatically :)
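To make the iterative 3'-trimming idea concrete, here is a rough sketch of the loop. This is not how Novoalign or any other aligner does it internally. The `aligns_uniquely` callback is a hypothetical stand-in for a call to a real aligner, and `toy_aligner` with its tiny `REFERENCE` exists only so the example runs on its own. The 5'-end of every read is left untouched; only the 3'-end is shortened.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def trim_3prime(read: Read, length: int) -> Read:
    """Keep the first `length` bases, i.e. hard-clip the 3'-end only."""
    name, seq, qual = read
    return name, seq[:length], qual[:length]


def iterative_3prime_trim(
    reads: Iterable[Read],
    lengths: Iterable[int],
    aligns_uniquely: Callable[[Read], bool],
) -> Tuple[Dict[str, int], List[Read]]:
    """Try each length from longest to shortest on the reads still unaligned.

    Returns (read name -> length at which it aligned, reads never aligned).
    """
    aligned: Dict[str, int] = {}
    pending = list(reads)
    for length in sorted(set(lengths), reverse=True):
        still_unaligned = []
        for read in pending:
            if aligns_uniquely(trim_3prime(read, length)):
                aligned[read[0]] = length
            else:
                still_unaligned.append(read)
        pending = still_unaligned
    return aligned, pending


# Toy stand-in for an aligner: exact, unique match against a short reference.
REFERENCE = "ACGTTGCAAGGCTTAACCGGATATCGCG"


def toy_aligner(read: Read) -> bool:
    return REFERENCE.count(read[1]) == 1


reads = [
    ("r1", "GTTGCAAGGC", "I" * 10),  # matches at full length
    ("r2", "GCAAGGTTTT", "I" * 10),  # 3'-end does not match the reference
    ("r3", "TTTTTTTTTT", "I" * 10),  # never matches
]
aligned, unaligned = iterative_3prime_trim(reads, [10, 8, 6], toy_aligner)
print(aligned)
print([name for name, _, _ in unaligned])
```

In practice each round would mean writing the pending reads to FASTQ, running the aligner in global mode, and collecting the unaligned reads for the next shorter length, which is why doing it in a single run is attractive.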