Low number of both surviving reads after trimming
4 weeks ago by Jay

Hi,

Trimming using Trimmomatic gave me these results:

code

trimmomatic PE -phred33 R1.fastq.gz R2.fastq.gz R1.paired.fastq.gz R1.unpaired.fastq.gz R2.paired.fastq.gz R2.unpaired.fastq.gz ILLUMINACLIP:adapter.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:25

output

Input Read Pairs: 429726444 Both Surviving: 160577915 (37.37%) Forward Only Surviving: 268947096 (62.59%) Reverse Only Surviving: 91710 (0.02%) Dropped: 109723 (0.03%)

The percentage of both-surviving reads was low, so I trimmed R1 and R2 separately, and the percentage of surviving reads for each was over 99.9%. How should I interpret this trimming result?

Additionally, although the both-surviving rate is approximately 38%, the actual number of surviving reads is approximately 160M. I guess this is enough to continue with the downstream steps, am I right?

Thanks!

trimmomatic trimming

Please edit your post and add the code parts again; this time use the 101010 (code) button, NOT the double-quote button. The latter is used to cite a source verbatim and mangles monospaced code formatting.

code_formatting


Thanks, I've modified it.


I see that the code content was not re-added; only the formatting was modified. The output is still mangled. Compare it to the output on your screen and you'll notice that the post does not represent what you actually see.


Do you have a FastQC adapter-contamination plot for this data? If you have a lot of adapter dimers and short inserts (i.e., a poor-quality library), this would be an expected result.
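In case it helps, FastQC can be run on the raw files directly; a minimal sketch, assuming `fastqc` is on your PATH and the same filenames as in your command:

```shell
# Generate per-file QC reports, including the "Adapter Content" module
mkdir -p fastqc_raw
fastqc R1.fastq.gz R2.fastq.gz --outdir fastqc_raw
```

The HTML reports in `fastqc_raw/` include the adapter-content graph asked about above.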


Here is my FastQC output, specifically the adapter content graph, before trimming (R1 and R2):

[adapter content plots for R1 and R2 before trimming]

And here it is after trimming:

[adapter content plot after trimming]


A significant fraction of your reads appear to contain the Nextera adapter sequence. That, combined with the LEADING:20 directive (do you have a reason for using it?), is likely producing very short reads, which are then dropped once they hit MINLEN:25.

If you are willing, try bbduk.sh from the BBTools suite. A guide is available here: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/
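A minimal adapter-trimming invocation might look like this; the `adapters.fa` reference file and the parameter values are assumptions to adjust for your data:

```shell
# Right-trim adapters by k-mer matching (k=23, shrinking to mink=11 at read
# ends, allowing 1 mismatch), then quality-trim both ends to Q15 and drop
# reads shorter than 25 bp. tbo also trims adapters detected via pair
# overlap; tpe trims both reads of a pair to the same length.
bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz \
    out1=R1.trimmed.fastq.gz out2=R2.trimmed.fastq.gz \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe \
    qtrim=rl trimq=15 minlen=25
```

bbduk always keeps both reads of a surviving pair, so you would not see the forward-only/reverse-only split that Trimmomatic reports.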


Thank you for your response.

There was no particular reason for LEADING:20 and TRAILING:20; I reused the parameters from previous projects, where I preferred to retain only high-quality reads.

In fact, this data has a very high 'total sequences' count because replicates were concatenated, so I wanted to extract highly reliable sequences for downstream analysis. Therefore, even though the both-surviving rate is approximately 38%, the actual number of surviving reads is approximately 160 million. What do you think about continuing the rest of the pipeline with these trimmed results?


It is your data, so if you want to do that it is up to you. You may, however, be throwing away essentially good data (you should remove the Nextera sequences for sure). My guess is that LEADING:20 may be compensating for a particular pattern one sees in tagmentation/random-priming-based sequencing datasets.

Read more about that here (if you did not know): https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/


Thank you for your opinion.

I will try lower LEADING and TRAILING thresholds. By the way, after some searching I tried setting keepBothReads to true and got the following result:

Input Read Pairs: 429726444 Both Surviving: 428299887 (99.67%) Forward Only Surviving: 1225124 (0.29%) Reverse Only Surviving: 173243 (0.04%) Dropped: 28190 (0.01%)
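For context, keepBothReads is the sixth field of the ILLUMINACLIP step, so an invocation along these lines (same filenames as in the original post; the `2` is the minAdapterLength field, which must be supplied to reach the keepBothReads position) would produce this behavior:

```shell
# ILLUMINACLIP fields: adapters : seed mismatches : palindrome threshold :
# simple threshold : minAdapterLength : keepBothReads
trimmomatic PE -phred33 R1.fastq.gz R2.fastq.gz \
    R1.paired.fastq.gz R1.unpaired.fastq.gz \
    R2.paired.fastq.gz R2.unpaired.fastq.gz \
    ILLUMINACLIP:adapter.fa:2:30:10:2:true \
    LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:25
```

With keepBothReads:true, palindrome-mode clipping retains the reverse read of an adapter-read-through pair instead of discarding it as redundant, which is why the both-surviving percentage jumps.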

There are explanations of those parameters, but I still don't understand them clearly. Could you please explain a little more?

