Hi everyone,
I am currently trying to analyse my ITS1 samples. AFAIK, ITS works differently from other amplicons because the region length is variable. Therefore, when I trim off the primers using bbduk, I use the parameters below:
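# ktrim=l removes the matched primer k-mer and everything to its left;
# mink=20 also allows shorter matches at read tips; hdist=1 tolerates one
# mismatch; rcomp=t also matches reverse complements; tbo trims by pair
# overlap and tpe trims both reads of a pair to the same length.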
bbduk.sh in1=MSA_S21_L001_R1_001.fastq.gz in2=MSA_S21_L001_R2_001.fastq.gz \
out1=MSA_S21_L001_R1.fastq out2=MSA_S21_L001_R2.fastq \
ktrim=l k=22 mink=20 hdist=1 copyundefined=t ordered=t rcomp=t \
literal="GGAAGTAAAAGTCGTAACAAGG,GCTGCGTTCTTCATCGATGC" tpe tbo
I then run DADA2 and noticed that most of the input is filtered out, which I think is due to the primer trimming.
Has anyone else done the same thing as me?

What is the length of the sequences? What filterAndTrim() parameters are you using? Input is the number of reads after trimming the primers, correct? The filtering has nothing to do with bbduk trimming the primers; your parameters are probably too stringent, or the reads are of bad quality.
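For reference, here is a minimal sketch of the kind of filterAndTrim() call I would start from for ITS reads - the filtered-file paths and cutoffs are placeholders to adjust, and truncLen is left at 0 because the ITS region length is variable:

library(dada2)

# Primer-trimmed reads (the bbduk output) and hypothetical filtered outputs
fnFs <- "MSA_S21_L001_R1.fastq"
fnRs <- "MSA_S21_L001_R2.fastq"
filtFs <- "filtered/MSA_S21_L001_R1.fastq.gz"
filtRs <- "filtered/MSA_S21_L001_R2.fastq.gz"

# truncLen = 0: no fixed-length truncation, since ITS length varies.
# maxEE/truncQ/minLen are starting points to tune against the quality plots.
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = 0, maxEE = c(2, 2), truncQ = 2,
                     minLen = 50, maxN = 0, rm.phix = TRUE,
                     compress = TRUE, multithread = TRUE)
out  # reads.in vs reads.out per sample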

I set 0 for my trim and trunc parameters (I am using QIIME2, btw) to get all the data into DADA2. Yes, input is the number of reads after primer trimming. I am worried that because it was trimmed, DADA2 is recognizing some of the reads as bad quality. I tried again with max_ee_r = 6 and the number of filtered reads went up to 85811. Because it seems like too many reads are being filtered out, I am trying to get as many reads into DADA2 as possible. Or is it normal for ITS data? Even with the rcomp=t parameter in bbduk, I don't see any difference.

It seems like your data is moderately bad (or moderately good, if you are an optimist). You have to evaluate the quality of the reads to set optimal maxEE, truncLen, etc. parameters. Did you examine the DADA2 quality profile plots, or FastQC quality plots? They will help you decide on the best parameters.
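For example, something like this minimal sketch (the file names are placeholders for your primer-trimmed reads):

library(dada2)

# One aggregated quality profile across both primer-trimmed files;
# per-file plots (aggregate = FALSE) work the same way.
plotQualityProfile(c("MSA_S21_L001_R1.fastq", "MSA_S21_L001_R2.fastq"),
                   aggregate = TRUE)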

Can I ask how you determine that the data is 'moderately bad'? I did run FastQC both before and after trimming, and it looks okay to me.

It is just a guess, based on the fact that you are discarding ~43% of the reads at the filterAndTrim() step - in my experience, for good datasets one discards ~5-20% of the reads. But it may not be related to quality; it may instead be related to the truncLen parameter.
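If you keep the matrix that filterAndTrim() returns, you can compute the discarded fraction per sample directly - a quick sketch, assuming out is that matrix:

# filterAndTrim() returns a matrix with reads.in and reads.out columns
discarded <- 1 - out[, "reads.out"] / out[, "reads.in"]
round(100 * discarded, 1)  # percent discarded; ~43% is on the high side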