Hi Guys,
I am evaluating the quality of multi-lane paired-end FASTQ files using FastQC. The Adapter Content graph reports Illumina Universal Adapter (screenshot attached), but I could not find any adapter sequence in the reads themselves (length 150).
What should I do in this case? I would highly appreciate your suggestions.
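Adapter read-through usually only shows up at the 3' end of reads whose insert is shorter than the read length, so it is easy to miss when eyeballing a few records. A rough way to check is to count the reads that contain the start of the adapter. The sketch below assumes a gzipped four-line-per-record FASTQ and uses AGATCGGAAGAG, the first 12 bases of the standard Illumina adapter (to my understanding this is also what FastQC searches for under "Illumina Universal Adapter"). The function name and the file name in the example are made up for illustration.

```shell
# First 12 bases of the standard Illumina adapter (assumption: this is the
# sequence behind FastQC's "Illumina Universal Adapter" track).
ADAPTER_PREFIX="AGATCGGAAGAG"

# Hypothetical helper: count reads that contain the adapter prefix anywhere.
count_adapter_reads() {
    # $1: gzipped FASTQ file; sequence lines are lines 2, 6, 10, ...
    gzip -dc "$1" | awk -v a="$ADAPTER_PREFIX" '
        NR % 4 == 2 { n++; if (index($0, a) > 0) hits++ }
        END { printf "%d of %d reads contain %s\n", hits, n, a }'
}

# Example call (hypothetical file name):
# count_adapter_reads reads_R1.fastq.gz
```

If the count is non-zero, the FastQC graph and the reads agree and trimming is the usual next step.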
Hi Guys,
Thank you so much for the nice answers. I trimmed the adapters with cutadapt using the command below:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o cut_Trueseq.fastq.gz input_L001_R1_001.fastq.gz
The FASTQ file was generated successfully and I evaluated it with FastQC again. Before trimming, all reads had a length of 151; after trimming, the length varies from read to read depending on the position at which the adapter was found and removed. This shows up in the Sequence Length Distribution graph (trim.fastq; attached), which no longer matches the original graph (original.fastq; attached). I am therefore wondering about the quality of my FASTQ file, since FastQC now shows a red X for Sequence Length Distribution.
FastQC's Sequence Length Distribution module assumes that all of your sequences should have the same length. You can ignore the red X, since you know you trimmed the sequences and the read lengths are therefore no longer uniform.
Agreed. The FastQC report now looks perfectly fine and you can proceed with downstream analysis.
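If you want to confirm that the varying lengths are just the result of trimming, you can tabulate the read lengths yourself. This is a minimal sketch assuming a gzipped four-line-per-record FASTQ. The function name is made up; the file name in the example is the output of the cutadapt command above.

```shell
# Hypothetical helper: print "length<TAB>count" for every read length, sorted.
length_distribution() {
    # $1: gzipped FASTQ file; sequence lines are lines 2, 6, 10, ...
    gzip -dc "$1" | awk '
        NR % 4 == 2 { count[length($0)]++ }
        END { for (l in count) print l "\t" count[l] }' | sort -n
}

# Example call:
# length_distribution cut_Trueseq.fastq.gz
```

Most reads should still sit at or near the full length, with a tail of shorter reads where adapter was removed.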
Hi Guys,
Thank you so much for the nice suggestions.
I also noticed that for some of my FASTQ files, FastQC reports overrepresented sequences. How should I handle this? Do they need to be removed? If yes, how can I remove these sequences from my FASTQ files?
Thank You!!
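Poly-G reads represent clusters producing no signal on two-color chemistry Illumina sequencers such as NovaSeq and NextSeq. These reads can be handled at trimming time, e.g. with the trimpolyg option of bbduk.sh. Note that trimpolyg=0 is the default and leaves poly-G trimming disabled, so set it to a minimum run length such as trimpolyg=10.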
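Before trimming, it can help to see how many reads are actually affected. The sketch below counts reads that end in a run of G's, assuming a gzipped four-line-per-record FASTQ. The function name and the default threshold of 10 are arbitrary choices for illustration, not values taken from any tool.

```shell
# Hypothetical helper: count reads whose 3' end is a run of at least N G's.
count_polyg_tails() {
    # $1: gzipped FASTQ file; $2: minimum trailing G run (assumed default: 10)
    gzip -dc "$1" | awk -v min="${2:-10}" '
        NR % 4 == 2 {
            n++
            s = $0
            sub(/G+$/, "", s)                 # strip the trailing G run
            if (length($0) - length(s) >= min) hits++
        }
        END { printf "%d of %d reads end in >=%d G\n", hits, n, min }'
}

# Example call (file name from the cutadapt command earlier in the thread):
# count_polyg_tails input_L001_R1_001.fastq.gz 10
```

If only a tiny fraction of reads is affected, the overrepresented-sequence flag is often harmless for downstream analysis.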