Hi all!
I'm painfully inexperienced when it comes to coding. I know it's possible to use cutadapt for trimming without writing a separate line of code for each sample, but I'm not sure how. I have 24 paired-end samples, all with variations on the following names:
A1_S12_R1_001.fastq
A1_S12_R2_001.fastq
A3_S13_R1_001.fastq
A3_S13_R2_001.fastq
B1_S14_R1_001.fastq
B1_S14_R2_001.fastq
B3_S15_R1_001.fastq
B3_S15_R2_001.fastq
...
So I've got separate cutadapt lines to trim each:
cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o A1_S12_R1_001_trimmed.fastq -p A1_S12_R2_001_trimmed.fastq A1_S12_R1_001.fastq A1_S12_R2_001.fastq > A1_S12_cutadapt.txt
cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o A3_S13_R1_001_trimmed.fastq -p A3_S13_R2_001_trimmed.fastq A3_S13_R1_001.fastq A3_S13_R2_001.fastq > A3_S13_cutadapt.txt
I know there is a way to list my fastqs and drop the root of the file name into a loop command, something like this:
for i in $(ls *fastq | sed 's/_R[12]_001.fastq//' | sort -u); do cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o ${i}_R1_001_trimmed.fastq -p ${i}_R2_001_trimmed.fastq ${i}_R1_001.fastq ${i}_R2_001.fastq > ${i}_cutadapt.txt; done
Actually, I'd ideally run it using GNU Parallel but I know the syntax is slightly different. In fact, I've used something like this for non-paired end samples before, but don't know how to adapt it for paired end reads:
ls | time parallel -j+0 --eta 'fastx_clipper -a TGGAATTCTCGGG -c -v -i {} -o ../processing/{.}.clip'
Any suggestions or further reading would be appreciated. I'd love to understand these variables better.
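For the paired-end case with GNU Parallel, one approach is to feed parallel the sample roots rather than the file names, so that {} stands for something like A1_S12 and both mates can be built from it. The following is a rough sketch, not a tested pipeline: it assumes every sample follows the _R1_001.fastq / _R2_001.fastq naming shown above and that each R1 file has its R2 mate. The function names sample_roots and trim_all_parallel are made up for illustration.

```shell
#!/usr/bin/env bash

# Print each sample root once, e.g. A1_S12 for A1_S12_R1_001.fastq.
# ${f%_R1_001.fastq} strips that suffix from the end of the file name.
sample_roots() {
    local f
    for f in *_R1_001.fastq; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        printf '%s\n' "${f%_R1_001.fastq}"
    done
}

# Run one cutadapt job per sample root. parallel replaces every {} with the
# root it read from stdin; the redirect is inside the quotes, so each job
# writes its own report file.
trim_all_parallel() {
    sample_roots | parallel -j+0 --eta \
        'cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o {}_R1_001_trimmed.fastq -p {}_R2_001_trimmed.fastq {}_R1_001.fastq {}_R2_001.fastq > {}_cutadapt.txt'
}
```

After sourcing this in the directory that holds the fastq files, running sample_roots on its own shows exactly what parallel will receive, which is a cheap check before calling trim_all_parallel.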
This worked perfectly! Thanks Dave!
This script can give problems if the number of R1 and R2 files matches but their names do not pair up. A simple example is below:
input files:
with OP code:
test2_R2_001.fastq is not a match for test_R1_001.fastq, yet it is picked up by the function. The function would run, but the result is incorrect.
The following function is safer IMHO. It only looks for matching R1 and R2 files.
Now the test2 sample is not picked up, and the function fails because there is no test_R2_001.fastq.
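A minimal sketch of that kind of check, for anyone who wants to see the idea spelled out. This is not the function from the answer above; the names paired_roots and trim_pairs are hypothetical, and it assumes the _R1_001.fastq / _R2_001.fastq naming from the question. It starts from each R1 file and requires an R2 file with the same root, so an R2 file with no R1 (such as test2_R2_001.fastq) is simply never used.

```shell
#!/usr/bin/env bash

# Print the root of every sample that has both mates.
# An R1 file without its R2 is reported on stderr and makes the function return 1.
paired_roots() {
    local r1 r2 root missing=0
    for r1 in *_R1_001.fastq; do
        [ -e "$r1" ] || continue                 # no R1 files at all
        root=${r1%_R1_001.fastq}
        r2=${root}_R2_001.fastq
        if [ -e "$r2" ]; then
            printf '%s\n' "$root"
        else
            echo "no mate for $r1 (expected $r2)" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Trim every complete pair, but refuse to start if any R1 lacks its R2.
trim_pairs() {
    local roots root
    roots=$(paired_roots) || return 1
    printf '%s\n' "$roots" | while IFS= read -r root; do
        [ -n "$root" ] || continue               # nothing to do for an empty list
        cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG \
            -o "${root}_R1_001_trimmed.fastq" -p "${root}_R2_001_trimmed.fastq" \
            "${root}_R1_001.fastq" "${root}_R2_001.fastq" > "${root}_cutadapt.txt"
    done
}
```

With test_R1_001.fastq and test2_R2_001.fastq in the directory, paired_roots prints nothing for either of them, complains about the missing test_R2_001.fastq, and trim_pairs stops before running cutadapt.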