8.0 years ago
dr.dimo
Greetings and salutations. I am analysing some human Illumina WXS data and am seeing an unfamiliar k-mer over-representation. Usually trimming with Trimmomatic removes over-represented k-mers quite well, but with this latest data I can't seem to get rid of the k-mers at the 3' end of the reads. My question is in two parts.
1) Should I worry about this, or just continue with alignment/variant calling?
2) If I should worry about it, how can I trim these k-mers successfully?
My trimmomatic parameters and pre/post trimming images are below, but I am probably missing something super obvious.
Thanks in advance.
java -jar trimmomatic-0.36.jar PE -phred33 1_ATGCCTAA_L001_R1.fastq.gz 1_ATGCCTAA_L001_R2.fastq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:75
I have no hands-on experience with sequencing, but I have seen this pattern in our own data, where it was attributed to the use of hexamers in the fragmentation or shearing step. You might not have to worry about it for variant calling if you have good enough coverage and are looking at major allele fractions.
Great, appreciate the response.
Fragmentation bias tends to show up at the left (5') end of the read rather than the right (3') end, so it would not by itself explain k-mers enriched at the 3' end.
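One way to check which end of the reads the enrichment actually sits on is to tally k-mers separately near the start and near the end of each read. The following is only a minimal Python sketch, not part of Trimmomatic or FastQC: the function names `read_fastq_sequences` and `kmer_end_counts`, the choice of k=7 and the 15-position window are all assumptions made for illustration.

```python
import gzip
from collections import Counter


def read_fastq_sequences(path, limit=100000):
    """Yield up to `limit` sequence lines from a (optionally gzipped) FASTQ file.

    Assumes the usual 4-line-per-record FASTQ layout.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no // 4 >= limit:
                break
            if line_no % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def kmer_end_counts(reads, k=7, window=15):
    """Count k-mers starting in the first / last `window` start positions of each read.

    Returns two Counters: (left_end_counts, right_end_counts).
    """
    left, right = Counter(), Counter()
    for seq in reads:
        n_starts = len(seq) - k + 1  # number of k-mer start positions
        if n_starts <= 0:
            continue  # read shorter than k
        for i in range(n_starts):
            kmer = seq[i:i + k]
            if i < window:
                left[kmer] += 1
            if i >= n_starts - window:
                right[kmer] += 1
    return left, right


# Tiny in-memory demonstration with made-up reads (not real data).
demo_left, demo_right = kmer_end_counts(
    ["ACGTACGTTTTTTT", "GGGGACGTTTTTTT"], k=4, window=3
)
print(demo_right.most_common(1))  # [('TTTT', 6)]
```

On real data, something like `kmer_end_counts(read_fastq_sequences("output_forward_paired.fq.gz"))` would give the two tallies for the trimmed forward reads. A k-mer that dominates the right-hand counter but is rare in the left-hand one points to a 3'-end effect rather than the 5' fragmentation bias described above.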