Hi all,
After performing adapter trimming with bbduk.sh, I found that the total number of bases in the read1 file differs from that in the read2 file according to the FastQC quality check. Below is the command I used for adapter trimming and the summary from FastQC:
bbduk.sh in1=$raw/$read1 in2=$raw/$read2 out1=$raw/trim/$(basename $raw)_trim_1.fastq.gz out2=$raw/trim/$(basename $raw)_trim_2.fastq.gz ref=$adapters_file ktrim=r hdist=1 k=23 mink=11 tpe tbo qtrim=r trimq=10 minlen=30
Read1 file (Total Bases) = 5.3 Gbp
Read2 file (Total Bases) = 5.1 Gbp
Read1 file (Total sequences) = 55633928
Read2 file (Total sequences) = 55633928
I thought that by including the flags tpe and tbo, the trimmed read1 and read2 files were supposed to have the same number of bases and sequences. However, in this case, the two files had the same number of sequences, but the total bases differed between the files. Does anyone know why that is the case? Can I leave it as it is for subsequent STAR mapping?
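For reference, the FastQC totals can be cross-checked directly from the files. This is only a minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines); the function name fastq_stats is hypothetical and not part of BBTools or FastQC.

```sh
# Print "reads<TAB>total_bases<TAB>min_length" for a FASTQ stream on stdin.
# Assumption: every record is exactly 4 lines, so the sequence is line 2 of each record.
fastq_stats() {
    awk 'NR % 4 == 2 {
             n++
             len = length($0)
             total += len
             if (n == 1 || len < min) min = len
         }
         END { printf "%d\t%.0f\t%d\n", n, total, min }'
}
```

It would be run once per trimmed file, e.g. `gzip -cd "$raw/trim/$(basename $raw)_trim_1.fastq.gz" | fastq_stats`, and the two outputs compared.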
Brian Bushnell may comment on this, but it may be the order of operations that is causing this. After tpe is applied, the trimq option may be causing some reads to get further shortened.
Thanks GenoMax for the reply. That being said, does this difference in total bases between read1 and read2 have an effect on subsequent read mapping? I'm guessing that the impact is minimal, but I'm not sure whether this discrepancy leads to errors in subsequent processing algorithms like STAR.
No, there should not be much impact unless one of the reads has become substantially shorter than its mate. Strictly speaking, you don't need to trim the data if you are aligning to a good reference. Aligners like STAR and BBMap will automatically soft-clip parts of reads that do not map.
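If you want to check whether any read has become much shorter than its mate, something along these lines should work. It is a rough sketch, not a BBTools feature: the function name mate_length_diffs is made up here, and it assumes gzipped 4-line FASTQ files with the same number of reads in the same order.

```sh
# Print "pairs_with_different_lengths<TAB>largest_length_difference"
# for two gzipped FASTQ files given as $1 (read1) and $2 (read2).
# The body runs in a subshell so the temporary variable does not leak.
mate_length_diffs() (
    tmp_lengths=$(mktemp) || exit 1
    # Sequence length of every read1 record, one per line.
    gzip -cd "$1" | awk 'NR % 4 == 2 { print length($0) }' > "$tmp_lengths"
    # Pair each read1 length with the matching read2 length and compare.
    gzip -cd "$2" | awk 'NR % 4 == 2 { print length($0) }' |
        paste "$tmp_lengths" - |
        awk '{
                 d = $1 - $2
                 if (d < 0) d = -d
                 if (d > 0) n++
                 if (d > max) max = d
             }
             END { printf "%d\t%d\n", n, max }'
    rm -f "$tmp_lengths"
)
```

Called as `mate_length_diffs sample_trim_1.fastq.gz sample_trim_2.fastq.gz` (placeholder file names), a small largest difference would suggest the mismatch in total bases comes from a few bases of quality trimming per pair.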
The number of bases is still different even if tpe and tbo are placed after the adapter and quality trimming options on the command line. So far, for the RNA-seq projects I've done, there doesn't seem to be an issue with using bbduk-trimmed reads for mapping, so I'll probably leave it as it is. Thanks!
The priority/order of operations is coded in the program. Brian will be the best person to comment on that. Simply moving options around on the command line will not change it.
That makes sense. Thanks for the reminder.