Hi all,
I just had to analyze a really bad lane. It is an old Illumina GAIIx lane with reads of 152 cycles. The last 30-40 cycles are of really poor quality on average, so I decided to use the -q 20 switch in bwa aln to trim the 3' ends of reads based on quality prior to mapping (something I usually would not do).
To see what this trimming parameter left in the BAM file, I plotted the distribution of the soft-clipped length in the reads. To do so, I took the CIGAR strings of 20 million alignments that BWA declared unique (XT:A:U tag). Here is what I got:
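For anyone who wants to reproduce this kind of distribution, here is a minimal sketch of how the soft-clip lengths could be tallied from SAM text. It is not the script used for the plot above; the function names are made up for illustration, and it assumes that the XT:A:U tag appears among the optional fields and that there are no hard clips (H) in the CIGAR strings.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def three_prime_soft_clip(cigar, flag):
    """Length of the soft clip at the read's 3' end (0 if there is none)."""
    ops = CIGAR_OP.findall(cigar)
    if not ops:  # e.g. "*" for an unmapped read
        return 0
    # Reverse-strand alignments (flag bit 0x10) are stored reverse-complemented,
    # so the read's 3' end is the FIRST CIGAR operation, not the last.
    length, op = ops[0] if flag & 0x10 else ops[-1]
    return int(length) if op == "S" else 0

def soft_clip_histogram(sam_lines):
    """Count 3' soft-clip lengths over alignments tagged XT:A:U."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if "XT:A:U" not in fields[11:]:  # keep only alignments BWA calls unique
            continue
        counts[three_prime_soft_clip(fields[5], int(fields[1]))] += 1
    return counts
```

The lines can come from any source of SAM text, for example the output of samtools view read from standard input.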
We can see a periodic pattern after the main peak (which represents reads with no soft-clipped bases). Note also the slightly higher bar at position 117: 152 - 117 = 35, and by default BWA will not trim reads to fewer than 35 bp.
Have you already noticed such a pattern when using bwa aln -q, and what in the algorithm produces it?
Looking at the -q definition in the doc, I cannot see any reason why the trimming would produce such a periodic pattern.
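For reference, the rule given in the manual (trim the read down to the length x that maximizes the sum of INT - q_i over the removed bases) can be sketched as below. This is an illustration of the documented rule, not bwa's actual source: the function name is hypothetical, the early stop once the running sum goes negative is an assumption of this sketch, and the 35 bp floor is the default mentioned above. Qualities are assumed to be already decoded to integer Phred scores.

```python
def bwa_style_trim_length(quals, trim_qual, min_len=35):
    """Return the read length kept after 3' quality trimming.

    quals     : list of integer Phred scores, 5' to 3'
    trim_qual : the INT given to -q
    min_len   : never trim below this length (assumed default of 35)
    """
    best_sum, best_len = 0, len(quals)
    running = 0
    # Walk from the 3' end towards the 5' end; l is the candidate kept length.
    for l in range(len(quals) - 1, min_len - 1, -1):
        running += trim_qual - quals[l]
        if running < 0:  # assumption: stop once the tail looks good overall
            break
        if running > best_sum:
            best_sum, best_len = running, l
    return best_len

# A 152-cycle read whose whole tail is Q2 is cut back to the 35 bp floor,
# i.e. 152 - 35 = 117 clipped bases.
print(bwa_style_trim_length([2] * 152, 20))
```

Under this rule the kept length depends only on the quality values of the read, so any structure in the clipped-length distribution would have to come from the qualities themselves.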
Thanks guys.
T.
I have noticed this before. It comes from the Illumina sequencing pipeline; see the thread "The Meaning Of B In Illumina 1.5 Pipeline Data?"
Thank you for this information.
awesome text plots! ;-) could you please add this as an answer as well?
Is it possible that the qualities in the file have some sort of periodicity?
I do not think so, but now that you raised the point, I will probably do a few checks on this :)
Looking at the link given in the comment below, it seems that you were right!
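One way to run the check discussed in these comments is to tally, for each read, how many consecutive 'B' quality characters it ends with, and then see whether those run lengths show the same periodicity as the soft-clip lengths. This is a minimal sketch with hypothetical function names; it assumes a plain four-line-per-record FASTQ file in the Illumina 1.5-style encoding, where 'B' is the low-quality tail marker.

```python
from collections import Counter

def trailing_b_run(qual):
    """Number of consecutive 'B' characters at the 3' end of a quality string."""
    return len(qual) - len(qual.rstrip("B"))

def trailing_b_histogram(fastq_lines):
    """Count trailing 'B' run lengths, assuming 4 lines per FASTQ record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the quality line is the 4th line of each record
            counts[trailing_b_run(line.rstrip("\n"))] += 1
    return counts
```

Passing an open file handle as fastq_lines is enough, since the function only iterates over lines.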