Hi All,
I was performing alignment of viral vector sample sequenced on MiSeq PE 250. I trimmed the raw reads using trim galore command as follows:
trim_galore -q 30 --length 50 -a AGATCGGAAGAGC --no_report_file --paired read1 read2 2> trim_galore.err
This should trim all bases with Phred Quality less than 30 and read length should be above 50 after trimming.
I aligned the files using Bowtie2 and performed variant analysis using Freebayes.
With freebayes I observed some spurious and some real variants (observed in previous project of the same kind).
516 CCCGGG>GGCGAC 1344X 27%
When I observe the alignment in IGV for spurious variants it shows some bases with quality less than 30. This is the alignment description from SAM file in which the false variant is observed.
M01610:114:000000000-C6GY3:1:1111:18615:7999 99 F8CDNAB-AAV 488 44 2S247M = 594 358 CTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCTGCGGCCTAAGCTTGGAACCATTGCCACCTTCAGGGGGAGGCTGCTGGTGAATATTAACCAAGATCACCCCAGTTACCGGAGGAGCAAACAGGGACTAAGTTCACACGCGTGGTACCG CD#&CFCCCCCCGGGGGGGGGGHHHG8(&!!!!!!!&8==!!!!!!!!!!!!!!!!!!$==8&!!!!!!$&(8GGHHHHHGHHGGGGGGGGGGGGGGAGGGGGGGAFGGGGGGGGFFFFFFFFFFFF<@FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFE:BFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFF.EFFFFFEFFFFFFFFFFFFFFFCADCFEFFFD AS:i:375 XS:i:194 XN:i:0 XM:i:17 XO:i:0 XG:i:0 NM:i:17 MD:Z:28C0C2G0G5C0C2G0G2T2G0G2A0C5G0T2C0C180 YS:i:500 YT:Z:CP ZQ:Z:@@b]@@@@@@@@@@@@@@@@@@@@@@O^afffffffaOKKgeggffffffffffffffbJKPbgggfffca_O@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
As you can see some of the bases at the beginning of the read have a quality score of !
which is interpreted as Phred quality 0
I can filter these variants using freebayes -q 30
but they will still appear in IGV as these bases are present in BAM files.
Why does trim galore not remove these low quality bases when I have specified a min phred quality 30 in my trim_galore command. How can I make these trimming more stringent? How can obtain reads with all bases having quality greater than 30?
I wish to avoid downstream processing of BAM files and VCF files to remove these low quality calls and spurious variants.
Thanks!!