Discrepancy in total number of bases in trimmed read1 and read2 files after BBDuk
12 months ago
CTLong ▴ 120

Hi all,

After adapter trimming with bbduk.sh, FastQC reports a different total number of bases for the read1 file than for the read2 file. Below are the command I used for adapter trimming and the FastQC summary:

bbduk.sh in1=$raw/$read1 in2=$raw/$read2 out1=$raw/trim/$(basename $raw)_trim_1.fastq.gz out2=$raw/trim/$(basename $raw)_trim_2.fastq.gz ref=$adapters_file ktrim=r hdist=1 k=23 mink=11 tpe tbo qtrim=r trimq=10 minlen=30

Read1 file (Total Bases) = 5.3 Gbp
Read2 file (Total Bases) = 5.1 Gbp
Read1 file (Total sequences) = 55633928
Read2 file (Total sequences) = 55633928

I thought that including the tpe and tbo flags would give the trimmed read1 and read2 files the same number of bases and sequences. In this case, however, the two files have the same number of sequences but different total bases. Does anyone know why? Can I leave the files as they are for subsequent STAR mapping?

bbduk • 868 views

Brian Bushnell may comment on this, but the order of operations may be what is causing it. After tpe is applied, the trimq option may shorten some reads further.
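One way to see where the difference comes from is to compare mate lengths pair by pair rather than relying on the FastQC totals. The following is only a minimal sketch, not part of BBTools or FastQC: it assumes plain 4-line FASTQ records (optionally gzipped), and the function names `read_lengths` and `compare_mates` are made up for illustration.

```python
import gzip
from itertools import zip_longest


def read_lengths(path):
    """Yield the sequence length of every record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the second line of each record is the sequence
                yield len(line.rstrip("\r\n"))


def compare_mates(path_r1, path_r2):
    """Return pair count, total bases per file, and pairs whose mates differ in length."""
    total_r1 = total_r2 = pairs = unequal = 0
    for len_r1, len_r2 in zip_longest(read_lengths(path_r1), read_lengths(path_r2)):
        if len_r1 is None or len_r2 is None:
            raise ValueError("files contain different numbers of reads")
        pairs += 1
        total_r1 += len_r1
        total_r2 += len_r2
        if len_r1 != len_r2:
            unequal += 1
    return {
        "pairs": pairs,
        "bases_r1": total_r1,
        "bases_r2": total_r2,
        "unequal_pairs": unequal,
    }
```

Calling `compare_mates` on the two trimmed files (the `_trim_1.fastq.gz` and `_trim_2.fastq.gz` outputs of the command in the question) would show how many pairs ended up with mates of different lengths. For scale, the rounded FastQC totals in the question work out to roughly 95 bases per read in read1 and roughly 92 in read2, so the gap is only a few bases per read on average.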


Thanks, GenoMax, for the reply. Does this difference in total bases between read1 and read2 affect subsequent read mapping? I'm guessing the impact is minimal, but I'm not sure whether the discrepancy could cause errors in downstream tools such as STAR.


No, there should not be much impact unless one of the reads has become substantially shorter than its mate. Strictly speaking, you don't need to trim the data if you are aligning to a good reference. Aligners like STAR and BBMap will automatically soft-clip parts of a read that do not map.
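To check whether any mates have become "substantially short", one could count the pairs in which one read is much shorter than the other. This is a rough sketch only: the 0.5 cut-off is an arbitrary assumption (no threshold is given in this thread), and `short_mate_fraction` is a hypothetical helper that takes two equal-length sequences of read lengths, however you choose to obtain them.

```python
def short_mate_fraction(lengths_r1, lengths_r2, min_ratio=0.5):
    """Fraction of pairs in which the shorter mate is below min_ratio of the longer mate.

    min_ratio=0.5 is an arbitrary illustrative threshold, not a recommended value.
    """
    pairs = flagged = 0
    for len_r1, len_r2 in zip(lengths_r1, lengths_r2):
        pairs += 1
        longer = max(len_r1, len_r2)
        shorter = min(len_r1, len_r2)
        if longer > 0 and shorter < min_ratio * longer:
            flagged += 1
    return flagged / pairs if pairs else 0.0


# Toy example: four pairs, two of which have one mate under half the other's length.
example_fraction = short_mate_fraction([100, 100, 40, 100], [100, 95, 100, 30])
```

If the fraction is tiny, the difference in total bases mostly reflects a few bases trimmed here and there, which matches the expectation above that mapping should not be much affected.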


The number of bases is still different even if tpe and tbo are trailing after adapter and quality trimming. So far for the RNA-seq projects I've done, there doesn't seem to be an issue with using bbduk trimmed reads for mapping, so I'll probably leave it as it is. Thanks!


even if tpe and tbo are trailing after adapter and quality trimming

The priority/order of operations is coded into the program. Brian would be the best person to comment on that. Simply moving options around on the command line will not change it.


That makes sense. Thanks for the reminder.
