I am trying to trim adapters from raw NovaSeq sequences. Here is the FastQC report for the raw sequences:
Raw sequences checked by FastQC. Reverse only is shown.
Problems
I need to trim adapters. Does the rest of the report offer any clues as to what's wrong with the run? Is it bad overall?
The sequencing facility told me the following:
Platform: TruSeq
Kit: Swift Accel-NGS 2S DNA library prep kit
Adapters:
- P5: 5' AATGATACGGCGACCACCGAGATCTACAC[i5index]ACACTCTTTCCCTACACGACGCTCTTCCGATCT
- P7: 5' CAAGCAGAAGACGGCATACGAGAT[i7index]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
My understanding is that the i5 and i7 indices are hexamers, i.e. NNNNNN.
I tried two approaches with AdapterRemoval (neither worked well).
First, I used the facility-supplied adapters, inserting an NNNNNN hexamer in place of the [i5index] and [i7index] placeholders:
AdapterRemoval --threads 40 --file1 sample_R1.fastq.gz --file2 sample_R2.fastq.gz --adapter1 AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT --adapter2 CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
After trimming using facility-supplied P5 and P7. Reverse only is shown.
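One way to check which orientation of the adapter is actually present in the reads is to count reads whose sequence contains a short piece of each. Both facility-listed oligos end in GCTCTTCCGATCT; its reverse complement, AGATCGGAAGAGC, is what a read running past the end of its insert would be expected to contain. The sketch below assumes standard four-line FASTQ records; `count_reads_with` is a hypothetical helper name, and the example calls use the file name from the post, so the counts depend entirely on your data.

```shell
#!/bin/sh
# Count reads in a gzipped FASTQ whose sequence line contains a literal string.
# Assumes standard 4-line FASTQ records (header, sequence, '+', quality).
count_reads_with() {
  # $1 = path to .fastq.gz, $2 = literal sequence to look for
  gzip -dc "$1" | awk -v p="$2" 'NR % 4 == 2 && index($0, p) > 0 { n++ } END { print n + 0 }'
}

# Example with the file name from the post:
#   count_reads_with sample_R2.fastq.gz GCTCTTCCGATCT   # 3' end of both oligos, as listed
#   count_reads_with sample_R2.fastq.gz AGATCGGAAGAGC   # its reverse complement
```

If the second count is much larger than the first, that would suggest the trimmer needs the adapter in reverse-complement orientation.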
Second, I used the --identify-adapters mode of AdapterRemoval, which gave me a P5 that partially matched what the facility said (above) but a totally different P7 sequence. Using the automatically detected adapters, I also tried trimming as follows:
AdapterRemoval --threads 40 --file1 sample_R1.fastq.gz --file2 sample_R2.fastq.gz --adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG --adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
After trimming using automatically detected adapters. Reverse only is shown.
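The detected adapters may be less different from the facility's than they look: they appear to be the facility oligos reverse-complemented, i.e. in the orientation in which adapter sequence shows up at the 3' end of a read. The sketch below checks this with a small `revcomp` helper (a hypothetical name), using the P7 oligo with NNNNNN in the index position, and compares only the first 33 bases of the reverse-complemented P5 (the part before the index) with the detected adapter2.

```shell
#!/bin/sh
# Reverse-complement a DNA string: reverse it with awk, then swap A<->T and C<->G (N stays N).
revcomp() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
    print out
  }' | tr 'ACGTN' 'TGCAN'
}

# Facility oligos (NNNNNN in the index position) and the adapters AdapterRemoval detected.
p5='AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT'
p7='CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
detected1='AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG'
detected2='AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'

rc_p5=$(revcomp "$p5")
rc_p7=$(revcomp "$p7")

# P7: the whole reverse complement should equal the detected adapter1.
[ "$rc_p7" = "$detected1" ] && echo "adapter1 = reverse complement of P7"

# P5: compare only the first 33 bases, i.e. the part before the index.
first33() { printf '%s\n' "$1" | cut -c1-33; }
[ "$(first33 "$rc_p5")" = "$(first33 "$detected2")" ] && echo "adapter2 starts like reverse complement of P5"
```

Past those 33 bases, the detected adapter2 has no index positions, which is consistent with the "partial match" described above.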
What am I doing wrong? What should I be doing?
Use the instructions here to add images: How to add images to a Biostars post
I suggest that you try bbduk.sh (GUIDE) for adapter removal. There is a core sequence common to all adapters (before the index), so as long as you find it and trim everything to the right of it, you will remove the adapter sequences.
Thanks, genomax! Are the images not showing up? I just clicked the add image button and linked these from my public Google Drive. I will read the instructions.
I can only see the images of the report categories in the left-hand column of the FastQC report.
Oh, that was intentional. I wanted to limit the post size (i.e. avoid overwhelming readers).
Without seeing the actual plots we can't really help you. A red "X" in FastQC only means that a value falls outside a preset range (the defaults are set for genomic sequencing). These "failures" have to be interpreted in the context of the type of data being analyzed.
Raw sequences before trimming. This analysis was run on a concatenated set of all reverse reads, which is why Basic Statistics shows so many sequences.
Trimmed sequences using the facility-supplied adapters with an NNNNNN hexamer inserted for the index (first approach in the original post). This analysis was run on the reverse reads of just one sample, which is why Basic Statistics shows fewer sequences.
Hi GenoMax, for adapter removal and filtering, is it sufficient to provide only the core sequence rather than the whole adapter sequence? Are the sequences below the core sequence common to all adapters that you are referring to? Illumina describes them for its TruSeq kits, and they are also found in the adapter file of Trimmomatic. Thanks.
Read 1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Read 2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
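One way to see what the Read 1 and Read 2 sequences share is to take their longest common prefix. This is only an illustration of the "common core" idea; BBDuk itself matches k-mers from a reference adapter file rather than doing anything like this, and `common_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the longest common prefix of two strings.
common_prefix() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = 0
    # Advance while both strings still have characters and they agree.
    while (n < length(a) && n < length(b) && substr(a, n + 1, 1) == substr(b, n + 1, 1))
      n++
    print substr(a, 1, n)
  }'
}

read1_adapter='AGATCGGAAGAGCACACGTCTGAACTCCAGTCA'
read2_adapter='AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT'
common_prefix "$read1_adapter" "$read2_adapter"   # prints AGATCGGAAGAGC
```

The two sequences agree for their first 13 bases and diverge at the 14th (A vs G).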
Providing the core sequence is adequate. Once a trimming program matches this sequence, it removes the match and all sequence on its 3' side.
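A toy version of "match the core, then remove everything from there to the 3' end" is sketched below: for each FASTQ record it finds the first exact occurrence of the core and truncates both the sequence and the quality string there, so they stay the same length. This is an assumption-laden illustration (`trim_at_core` is a hypothetical name): real trimmers also tolerate mismatches, catch partial adapters shorter than the core at the very end of a read, and drop reads that become too short.

```shell
#!/bin/sh
# Read FASTQ on stdin; cut each read at the first exact match of the core ($1),
# keeping only the bases 5' of the match. Quality is cut to the same length.
trim_at_core() {
  awk -v core="$1" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      pos = index(seq, core)
      if (pos > 0) {
        seq = substr(seq, 1, pos - 1)
        qual = substr(qual, 1, pos - 1)
      }
      print header; print seq; print plus; print qual
    }'
}

# Example with the file name from the post:
#   gzip -dc sample_R1.fastq.gz | trim_at_core AGATCGGAAGAGC > sample_R1.trimmed.fastq
```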
Thanks for confirming. Is it necessary to provide the reverse complement of the core adapter sequence as well?