Dear all,
I am new to the field. I am trying to analyze single-end 100 bp FASTQ files with ~70 million reads per sample. I want to determine whether adapter sequences are present and, if so, how to handle them. I ran FastQC on the files, and the reports show that each one has an "overrepresented sequence" flagged as an "Illumina index adapter".
I have the following questions:
1. Does sample1 look like an already trimmed file, or does it require adapter trimming?
2. If further trimming is recommended, what would be the best sequence/adapter option to use with cutadapt/Trim Galore? [See below for my thoughts so far]
3. Based on the FastQC report, do I need to worry about the presence of any adapter sequences other than the index adapter?
My thoughts on question 2: the Illumina index adapter appears to have the following format:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
These are the adapter sequences found in my FastQC report for sample 1:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG
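As a quick sanity check, the two FastQC sequences can be compared against the template above. The following is only a minimal Python sketch: it assumes that N stands for any base, that matching is exact everywhere else, and that the second sequence may be shifted by one extra leading A. The function names are made up for illustration.

```python
# Index adapter template from the post; N marks the 6-base index.
TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"

# Overrepresented sequences reported by FastQC for sample 1.
OBSERVED = [
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC",
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG",
]


def index_if_match(observed, template):
    """Return the bases aligned to the N positions if `observed` matches the
    start of `template` at every non-N position; otherwise return None."""
    if len(observed) > len(template):
        return None
    index_bases = []
    for obs_base, tpl_base in zip(observed, template):
        if tpl_base == "N":
            index_bases.append(obs_base)
        elif obs_base != tpl_base:
            return None
    return "".join(index_bases)


def find_index(observed):
    """Try the template as written, then with one extra leading A (assumption)."""
    for template in (TEMPLATE, "A" + TEMPLATE):
        index = index_if_match(observed, template)
        if index is not None:
            return index
    return None


indexes = [find_index(seq) for seq in OBSERVED]
print(indexes)
```

Both sequences fit the template and give the same index, CATGGC, so they look like the same adapter read at two offsets rather than two different adapters.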
I am thinking of using the options below with cutadapt/Trim Galore to remove the adapter(s):
trim_galore sample1.fastq.gz -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -q 20 --length 20 --fastqc
However, it seems that Trimmomatic, for instance, only uses the initial part of the index adapter sequence (up to the Ns and nothing after them): https://github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-SE.fa
Many thanks in advance for your time and replies.
For your reference, most trimming programs should trim all sequence to the right once they find the core sequence that is common to the adapters. Finding the core sequence indicates that the read ran out of insert and continued into the adapter at the 3' end (i.e., the insert is shorter than the read length).
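To illustrate the idea, here is a rough Python sketch of 3' adapter trimming. It is not how cutadapt, Trim Galore or Trimmomatic are implemented: it assumes exact matching only (real tools tolerate mismatches), uses "AGATCGGAAGAGC" as the core sequence, and the function name and the `min_overlap` parameter are invented for this example.

```python
def trim_3prime_adapter(read, core="AGATCGGAAGAGC", min_overlap=3):
    """Remove the adapter core and everything to its right from a read.

    Simplified sketch: exact matches only, no error tolerance.
    """
    # Case 1: the full core sequence occurs somewhere in the read.
    pos = read.find(core)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends partway into the adapter, so only a prefix
    # of the core is visible at the 3' end. Try the longest prefix first.
    longest = min(len(core) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(core[:k]):
            return read[:len(read) - k]

    # No adapter found: the insert is at least as long as the read.
    return read


insert = "TTGACCGTAAGCTTGCA"  # toy insert, shorter than the read length
full = trim_3prime_adapter(insert + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC")
partial = trim_3prime_adapter(insert + "AGATC")
untouched = trim_3prime_adapter("ACGTACGTACGTTTTT")
print(full, partial, untouched)
```

In the first toy read the index adapter bases after the core are dropped as well, simply because they lie to the right of the match.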
Thanks a lot for pointing me in the right direction. Since the Illumina universal adapter sequence ("AGATCGGAAGAGC") already sits at the left (5') end of the index adapter sequence, trimming that sequence alone also removes the rest of the index adapter to its right. That would also explain why Trimmomatic uses only the left end of the index adapter sequence for trimming. I tried trimming with the universal sequence, and it appears to have removed the index adapter sequences as expected.
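One rough way to double-check this kind of result, besides rerunning FastQC, is to count how many reads still contain the universal sequence after trimming. The Python sketch below assumes standard 4-line FASTQ records and exact matching. The function names and the file name in the comment are placeholders, not real outputs from my run.

```python
import gzip


def count_adapter_reads(lines, core="AGATCGGAAGAGC"):
    """Count reads whose sequence contains `core`.

    `lines` is any iterable of FASTQ lines (4 lines per record assumed).
    Returns (reads_with_core, total_reads).
    """
    hits = 0
    total = 0
    for line_no, line in enumerate(lines):
        if line_no % 4 == 1:  # second line of each record is the sequence
            total += 1
            if core in line:
                hits += 1
    return hits, total


def count_in_file(path, core="AGATCGGAAGAGC"):
    """Same count for a gzipped FASTQ file, e.g. a hypothetical
    'sample1_trimmed.fq.gz'."""
    with gzip.open(path, "rt") as handle:
        return count_adapter_reads(handle, core)


# Tiny in-memory example: three reads, one still carrying the adapter.
demo = [
    "@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@r2\n", "ACGTAGATCGGAAGAGCAC\n", "+\n", "IIIIIIIIIIIIIIIIIII\n",
    "@r3\n", "TTTTGGGG\n", "+\n", "IIIIIIII\n",
]
demo_counts = count_adapter_reads(demo)
print(demo_counts)
```

A count near zero on the trimmed file would be consistent with the adapter having been removed, although partial adapters shorter than the full core at the read ends would not be caught by this check.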
Also, based on your reply, I found the page below, which was helpful: https://support.illumina.com/bulletins/2016/04/adapter-trimming-why-are-adapter-sequences-trimmed-from-only-the--ends-of-reads.html
Best