I usually filter the FASTQ reads first, tossing out those with an average Phred score below some threshold, say 20. But recently I realized that long reads (100 bp) may suffer from decreasing quality towards the end of the read, even if the average passes the criterion. For example:
@HWI-ST150_0130:3:64:18989:54871#0/2
AGACTCCCGGGTAGCAAGTACCTGGGACCACAGGTTTGTGCGACCATGCCTGACTAATTTTTGTATTTTTAGTAGTGATGGGGTTTCACTAGGTTGGCGAG
+HWI-ST150_0130:3:64:18989:54871#0/2
ed\`dffffffbff^eeeabffffedafffcddbbe`\cea``cYddadbcdcYbRR][Yccc[_dddddP[ZMaBBBBBBBBBBBBBBBBBBBBBBBBBB
(The format here is Illumina 1.5+.) Is there any package to trim off these low-quality ends? Also, after trimming (say, removing all bases encoded with "B", which represents the lowest quality in Illumina 1.5+), the reads will have variable lengths. Is this OK? How should I determine the edit distance parameter, which depends on the read length? (Should I calculate the average read length?)
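To make the idea concrete, here is a minimal Python sketch of the trimming described above: it strips the run of "B" quality characters from the 3' end of a read and recomputes the average score. The function names `trim_trailing_b` and `mean_phred` are made up for illustration, and the offset of 64 assumes Illumina 1.3+/1.5+ encoding; this is not taken from any particular package.

```python
def trim_trailing_b(seq, qual, marker="B"):
    """Remove the trailing run of `marker` quality characters from a read.

    Returns the trimmed (sequence, quality) pair. Only the 3' end is
    touched; a "B" in the middle of the read is left alone.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    keep = len(qual.rstrip(marker))  # bases left after dropping the B tail
    return seq[:keep], qual[:keep]


def mean_phred(qual, offset=64):
    """Average Phred score of a quality string (offset 64 is an assumption
    matching Illumina 1.3+/1.5+ encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


if __name__ == "__main__":
    seq, qual = trim_trailing_b("ACGTACGTAC", "ffffdcBBBB")
    print(seq, qual, len(seq))  # ACGTAC ffffdc 6
```

Reads trimmed this way end up with different lengths, so any downstream length-dependent setting has to cope with that.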
Also, does BWA have such options?
THANKS
Thanks, I learned a lot. Since this "B" has a special meaning, do you think it's appropriate to use the BWA aln -q option, as suggested by Chris Penkett?
Not entirely sure, but I think this should do the trick as well.
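For reference, the BWA manual describes -q as trimming a read down to argmax_x of the sum over i = x+1..l of (INT - q_i), where l is the original read length. Below is a rough Python sketch of that rule, not BWA's actual code. The function name `bwa_style_trim_length` is hypothetical, and the `min_len=35` default reflects my recollection of a minimum read length in the BWA source, so treat it as an assumption. Input is a list of integer Phred scores, so for Illumina 1.5+ data the characters need converting with offset 64 first (a "B" then becomes Q2).

```python
def bwa_style_trim_length(quals, threshold, min_len=35):
    """Return how many bases to keep after BWA-style 3' quality trimming.

    quals     -- list of integer Phred scores, 5' to 3'
    threshold -- the INT given to -q
    min_len   -- never trim below this length (assumed default)
    """
    running = 0          # running sum of (threshold - q_i) from the 3' end
    best = 0             # largest running sum seen so far
    keep = len(quals)    # default: keep the whole read
    for pos in range(len(quals) - 1, min_len - 1, -1):
        running += threshold - quals[pos]
        if running < 0:
            break        # good bases now outweigh the bad tail
        if running > best:
            best = running
            keep = pos   # cutting here removes the worst-scoring tail
    return keep


if __name__ == "__main__":
    # Ten good bases (Q40) followed by five "B"-like bases (Q2).
    quals = [40] * 10 + [2] * 5
    print(bwa_style_trim_length(quals, threshold=20, min_len=0))  # 10
```

With a threshold above 2, a tail of "B" bases always adds to the running sum, so it would be cut off in this sketch, along with any other low-quality bases next to it.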