Hi,
I have been using cutadapt for the first time, on paired-end sequencing data, and I am getting an output file that is much smaller than the two input read files I used. Why is so much of Read 1 being filtered out compared to Read 2? Is the error rate too low, or is there something wrong with the Read 1 quality?
Here is the report:
This is cutadapt 3.5 with Python 3.9.7
Command line parameters: -j=1 -a 3P_ADAPTER=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -u -9 --info-file=/data/jwd/main/037/600/37600803/outputs/galaxy_dataset_8a8a5c4c-b70a-46cd-aea6-76bc7b36ee6b.dat --output=out1.fq.gz --paired-output=out2.fq.gz --error-rate=0.1 --times=1 --overlap=5 --action=trim --pair-filter=any Cutadapt on MR11_1 _Paired_ Read 1 Output.fq.gz Cutadapt on MR11_2_ Read 2 _Paired_ Output.fq.gz
Processing reads on 1 core in paired-end mode ...
Finished in 7761.40 s (24 µs/read; 2.48 M reads/minute).
=== Summary ===
Total read pairs processed: 321,319,110
Read 1 with adapter: 874,324 (0.3%)
Pairs written (passing filters): 321,319,110 (100.0%)
Total basepairs processed: 62,007,059,162 bp
  Read 1: 13,809,192,662 bp
  Read 2: 48,197,866,500 bp
Total written (filtered): 59,108,626,566 bp (95.3%)
  Read 1: 10,910,760,066 bp
  Read 2: 48,197,866,500 bp
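As a quick sanity check, the summary numbers themselves show where the Read 1 bases went. This is just arithmetic on the figures reported above, not cutadapt output:

```python
# Estimate the average number of bases removed per Read 1,
# using the totals from the cutadapt summary above.
pairs = 321_319_110       # Total read pairs processed
r1_in = 13_809_192_662    # Read 1 bp processed
r1_out = 10_910_760_066   # Read 1 bp written

removed = r1_in - r1_out
avg_per_read = removed / pairs
print(f"{removed:,} bp removed from Read 1, ~{avg_per_read:.2f} bp per read")
# Roughly 9 bp per read, which lines up with the -u -9 parameter
# (remove 9 bases from every Read 1); the small remainder would come
# from the 874,324 reads with an adapter match.
```

Note that no read pairs were discarded at all (Pairs written: 100.0%); only bases were trimmed, and only from Read 1.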
To get a bit of clarity on what my trimmer was actually doing, I made a little trimming visualization tool: https://github.com/MonashBioinformaticsPlatform/trimviz
It might help you see exactly what is going on if you feed your before- and after-trimming FASTQs into it.