I am using usegalaxy.org to work with paired-end RNA-seq data. I am using Cutadapt to trim adapter sequences, and the Cutadapt output files are larger than the files I am inputting. For example, for my first sample, SRR6467550, the forward read input (fastqsanger.gz) is 2.1 GB. After running Cutadapt, the output (fastqsanger.gz) is 8.1 GB. This is filling my disk quota much faster and making it difficult to work with the amount of data I have (226 samples; I will have to work in batches as it is). Is this problem avoidable in any way? Is there a way to obtain a smaller output?
My full input for reference:
Paired-end collection: My Data
Read 1 (3'): AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Read 2 (3'): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
Minimum length (R1): 20
Quality cutoff: 20
Outputs Selector: Report: Cutadapt's per-adapter statistics. You can use this file with MultiQC.
The only way the output files can be larger than the inputs is if the input files were gzip-compressed while the output files are not.
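To illustrate the gzip point, here is a minimal Python sketch, assuming you can download a dataset and inspect it locally (it does not run inside Galaxy). It checks whether a file really is gzip-compressed by looking at its first two bytes, and writes a small synthetic FASTQ both ways to compare sizes. The helper names `is_gzipped` and `write_synthetic_fastq` are made up for this example, and the reads are artificial, so they compress far better than real data would.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def write_synthetic_fastq(path, n_reads=10000, compress=False):
    """Write n_reads made-up FASTQ records and return the file size in bytes."""
    opener = gzip.open if compress else open
    with opener(path, "wt") as handle:
        for i in range(n_reads):
            # Hypothetical record: same sequence and quality string every time.
            handle.write(
                f"@read{i}\nAGATCGGAAGAGCACACGTCTGAACTCCAGTCA\n+\n{'I' * 33}\n"
            )
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "reads.fastq")
        packed = os.path.join(tmp, "reads.fastq.gz")
        plain_size = write_synthetic_fastq(plain)
        packed_size = write_synthetic_fastq(packed, compress=True)
        print("plain :", is_gzipped(plain), plain_size, "bytes")
        print("gzip  :", is_gzipped(packed), packed_size, "bytes")
```

In the question, 8.1 GB / 2.1 GB is roughly a 3.9-fold difference, which seems consistent with a gzipped input and an uncompressed output, even though both are labelled as .gz in the history.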
The Galaxy community has a dedicated help channel:
https://help.galaxyproject.org/
As GenoMax pointed out, maybe there is a "compressed output" option to select somewhere?