Hi,
I have some NovaSeq DNA sequencing data (100 bp PE). I ran FastQC on the raw FASTQ files, then trimmed the adapters with Trimmomatic, and then used fastp to remove the overrepresented poly-G sequences. After that, I ran FastQC again. My first question is whether this is a good trimming workflow, or should I just stick to one trimmer?
Trimmomatic parameters are:
...ILLUMINACLIP:${adaptors}:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
and I followed the simple usage for fastp:
fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz
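For context on what the poly-G step targets: on two-colour instruments such as the NovaSeq, the absence of signal is called as G, so reads can end in long runs of G. The sketch below is a simplified illustration of tail trimming in Python, not fastp's actual algorithm (which, among other things, tolerates mismatches). The function name `trim_poly_g_tail` and the 10-base threshold are assumptions chosen for illustration.

```python
from typing import Tuple


def trim_poly_g_tail(seq: str, qual: str, min_len: int = 10) -> Tuple[str, str]:
    """Drop a trailing run of G if it is at least min_len bases long.

    Simplified illustration only: exact matches, no mismatch tolerance.
    """
    stripped = seq.rstrip("G")          # sequence without the trailing G run
    run_length = len(seq) - len(stripped)
    if run_length >= min_len:
        # Keep the quality string in sync with the shortened sequence.
        return stripped, qual[:len(stripped)]
    return seq, qual


if __name__ == "__main__":
    read = "ACGTTACGAT" + "G" * 15
    print(trim_poly_g_tail(read, "I" * len(read)))
```

Trimming of this kind shortens only the affected reads, which is one reason read lengths are no longer uniform after processing.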
Q2) FastQC showed that all of my R1 reads have lengths of 10-100 while R3 is 100. Is that normal? Also, all R1 files fail the per base sequence content module in FastQC, where A% drops at the end, and trimming did not remove this, probably because the scores are good (>= 35)? Do I need to fix this? If so, what should I do?
[Figures: R1 has good quality scores at bases 96-100 but A% drops; R1 also failed per base sequence content and Kmer content.]
danish o
Q3) There is still ~2% (down from ~6-9%) adaptor content in some samples after trimming. Do I need to remove it entirely even though the samples passed FastQC? Are the parameters I used in Trimmomatic not strict enough?
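To check the remaining adaptor content independently of FastQC, one could count the reads that contain an adaptor prefix. This is a minimal sketch, assuming plain or gzipped FASTQ input and using the first 13 bases of the common Illumina adaptor (`AGATCGGAAGAGC`) as the probe. The function names are hypothetical, and the adaptor in your kit may differ.

```python
import gzip
from typing import Iterable, Iterator

# Assumed probe: start of the common Illumina adaptor; adjust for your kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of every 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:              # second line of each record
                yield line.strip()


def adapter_fraction(seqs: Iterable[str], probe: str = ADAPTER_PREFIX) -> float:
    """Return the fraction of reads containing the probe as an exact substring."""
    total = hits = 0
    for seq in seqs:
        total += 1
        if probe in seq:
            hits += 1
    return hits / total if total else 0.0
```

For example, `adapter_fraction(read_sequences("out.R1.fq.gz"))` would give the fraction for the trimmed R1 file. Exact matching misses adaptors with sequencing errors or very short adaptor remnants at the read end, so the number is only a rough cross-check.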
This is all fine, you can proceed with your downstream processing. Don't bother yourself with these low-level metrics too much. You have basically no adapters and good base quality, and that is mainly all that matters.
Just a side question: do you (@ATpoint) know if the bias close to the 3' end in the per base sequence content is related to the library or to the sequencing technology?
I've been observing this quite often and I'm just curious about it. I know that 5' bias in RNA-seq is common and related to the sequencing library, but for the 3' bias I did not find any good documentation or blog post explaining it. In the experiments I'm working on, I preferred to remove those bases.
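To look at the 3' composition directly rather than through the FastQC plot, per-position base percentages can be tabulated. This is a minimal sketch in Python; the function name `per_base_content` is hypothetical, and it accepts any iterable of read sequences. Keep in mind that after trimming, reads have different lengths, so the last positions are computed only from the reads that still reach them.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_base_content(seqs: Iterable[str]) -> List[Dict[str, float]]:
    """Return, for each read position, the percentage of A, C, G, T and N."""
    counts: List[Counter] = []
    for seq in seqs:
        for pos, base in enumerate(seq.upper()):
            if pos == len(counts):      # first read reaching this position
                counts.append(Counter())
            counts[pos][base] += 1

    profile = []
    for position_counts in counts:
        depth = sum(position_counts.values())   # reads covering this position
        profile.append(
            {base: 100.0 * position_counts[base] / depth for base in "ACGTN"}
        )
    return profile


if __name__ == "__main__":
    for pos, row in enumerate(per_base_content(["ACGT", "ACGA", "AC"]), start=1):
        print(pos, row)
```

Comparing the last few rows before and after trimming shows whether the drop in A% is still present in the reads that keep their full length.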