Hi, from my understanding, adapter sequences can only occur at the 3' end of sequencing reads when there is read-through. Does this mean that when you open a Fastq file (either R1 or R2) to inspect it, you should see most of them on the right side of the reads? In my Fastq files, most of the adapter sequences I see are on the left side: they make up exactly half of the read, are followed by six or seven A's, and the rest of the read seems to have lower quality based on the quality values. Trimmomatic seems to be removing them, but I am curious about their source. Thanks for your help.
Thank you. These are WGS samples, and the overrepresented sequence is identified as "TruSeq Adapter, Index 18" by FastQC, which reports 101,779 of them (i.e. 0.16%) in the file. From rough visual inspection, 90% of these have the adapter sequence at the beginning (i.e. left side) of the read. Trimmomatic removes all 151 bases of these reads.
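To check the left/right impression more systematically than by eye, a Python sketch like the one below could tally where an adapter first appears in each read. It assumes uncompressed Fastq with four lines per record and uses AGATCGGAAGAGC, the 13-nt prefix shared by Illumina TruSeq adapters, as a stand-in for the full adapter. `adapter_side` and `tally_sides` are hypothetical helper names, and exact matching will miss adapter copies that contain sequencing errors.

```python
# Assumption: the 13-nt prefix common to Illumina TruSeq adapters stands in for the full adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_side(seq, adapter=ADAPTER_PREFIX):
    """Return 'left' if the adapter starts in the first half of the read,
    'right' if it starts in the second half, or None if it is absent."""
    pos = seq.find(adapter)  # exact match only; no mismatches tolerated
    if pos == -1:
        return None
    return "left" if pos < len(seq) / 2 else "right"


def tally_sides(fastq_lines):
    """Count adapter positions over an iterable of Fastq lines (4 lines per record)."""
    counts = {"left": 0, "right": 0, "none": 0}
    lines = iter(fastq_lines)
    for _header in lines:
        seq = next(lines).strip()
        next(lines)  # '+' separator line
        next(lines)  # quality string (not used here)
        side = adapter_side(seq)
        counts[side or "none"] += 1
    return counts
```

For a real file, something like `with open("sample_R1.fastq") as fh: print(tally_sides(fh))` would work (the file name is hypothetical; use `gzip.open(path, "rt")` for .fastq.gz input).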
Are your reads reverse complemented by any chance? My hunch is that they are, if 90% of reads have adapters at the beginning. I suggest you reverse complement them (use reformat.sh from the BBMap suite) and check. You can also use bbduk.sh with the ktrim=l option on the data as is. This should remove only the part of the read that is adapter at the left end. A guide for bbduk.sh is available here.
Thanks, these are good points. I will look into the scripts you mentioned. We received these files from collaborators, so I may need to check with them, but in general, is it recommended to check whether the reads are reverse complemented in any fastq file received from a sequencing centre? And if they are, would current aligners (e.g. bwa-mem) know how to handle them?
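The reverse-complement check and left-end trimming suggested in the answer can be illustrated with a short Python sketch. This is not what reformat.sh or bbduk.sh do internally: `reverse_complement_record` and `left_trim_at_adapter` are hypothetical helpers, and the trimming here uses exact k-mer matches only, whereas BBDuk also offers options such as hdist (mismatches) and mink (shorter k-mers at read ends).

```python
# Complement table for the bases normally found in Fastq (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_record(header, seq, qual):
    """Reverse-complement one Fastq record; the quality string is only
    reversed so each score stays paired with its base."""
    return header, reverse_complement(seq), qual[::-1]


def left_trim_at_adapter(seq, qual, adapter, k=13):
    """Simplified analogue of left-end adapter trimming: find the rightmost
    exact match of any k-mer from `adapter`, then drop that match and
    everything to its left (bases and qualities alike)."""
    kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    cut = 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in kmers:
            cut = i + k  # keep updating so the rightmost match wins
    return seq[cut:], qual[cut:]
```

If most adapters sit at the left end before reverse complementing and at the right end afterwards, that would support the idea that the reads were delivered reverse complemented.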
Aligners should handle the reads fine. You should not even need to trim them if you are aligning to a reference.
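Aligners cope with reverse-complemented reads because they look for matches on both strands of the reference. A toy exact-match "aligner" in Python makes the idea concrete; `toy_map` is a hypothetical, purely illustrative function that assumes error-free reads and bears no resemblance to how bwa-mem actually seeds and scores alignments.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def toy_map(read, reference):
    """Return (0-based position, strand) of an exact match, or None.
    The read is tried as given ('+') and then reverse complemented ('-')."""
    pos = reference.find(read)
    if pos != -1:
        return pos, "+"
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return pos, "-"
    return None
```

A read and its reverse complement map to the same reference position, differing only in the reported strand, which is why an RC'd file still aligns.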
Not sure why they would have been released as RC. Sequencing centers generally do not do that, though bcl2fastq (Illumina's software for pre-processing of data) does have an option to generate RC versions of reads.
Great, thanks again for the good points!