Hi,
I have been using cutadapt for the first time, on paired-end sequencing data, and I am getting an output file that is much smaller than the two input read files I used. Why is so much of Read 1 being filtered out compared to Read 2? Is the error rate too low, or is there something wrong with the Read 1 quality?
Here is the report:
This is cutadapt 3.5 with Python 3.9.7
Command line parameters: -j=1 -a 3P_ADAPTER=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -u -9 --info-file=/data/jwd/main/037/600/37600803/outputs/galaxy_dataset_8a8a5c4c-b70a-46cd-aea6-76bc7b36ee6b.dat --output=out1.fq.gz --paired-output=out2.fq.gz --error-rate=0.1 --times=1 --overlap=5 --action=trim --pair-filter=any Cutadapt on MR11_1 _Paired_ Read 1 Output.fq.gz Cutadapt on MR11_2_ Read 2 _Paired_ Output.fq.gz
Processing reads on 1 core in paired-end mode ...
Finished in 7761.40 s (24 µs/read; 2.48 M reads/minute).
=== Summary ===
Total read pairs processed: 321,319,110
Read 1 with adapter: 874,324 (0.3%)
Pairs written (passing filters): 321,319,110 (100.0%)
Total basepairs processed: 62,007,059,162 bp
  Read 1: 13,809,192,662 bp
  Read 2: 48,197,866,500 bp
Total written (filtered): 59,108,626,566 bp (95.3%)
  Read 1: 10,910,760,066 bp
  Read 2: 48,197,866,500 bp
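As a quick sanity check, the summary numbers themselves show where the Read 1 bases went. This is just arithmetic on the figures reported above, not cutadapt output:

```python
# Estimate the average number of bases removed per Read 1,
# using the totals from the cutadapt summary above.
pairs = 321_319_110       # Total read pairs processed
r1_in = 13_809_192_662    # Read 1 bp processed
r1_out = 10_910_760_066   # Read 1 bp written

removed = r1_in - r1_out
avg_per_read = removed / pairs
print(f"{removed:,} bp removed from Read 1, ~{avg_per_read:.2f} bp per read")
# Roughly 9 bp per read, which lines up with the -u -9 parameter
# (remove 9 bases from every Read 1); the small remainder would come
# from the 874,324 reads with an adapter match.
```

Note that no read pairs were discarded at all (Pairs written: 100.0%); only bases were trimmed, and only from Read 1.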
To get a bit of clarity on what my trimmer was actually doing, I made a little trimming visualization tool: https://github.com/MonashBioinformaticsPlatform/trimviz
It might help you see exactly what is going on if you feed your before- and after-trimming FASTQs into it.