Hi everyone,
I have 2 questions:
1) I found this script online to run Kraken2 in a loop on paired-end reads. I know it works, because I compared its results with those from another loop I have, but I don't really understand what it is doing.
for FILE in $(ls *_R1.fastq | sed 's/_R1.fastq//'); do kraken2 --db path/plant_db --memory-mapping --threads 8 --use-mpa-style --confidence 0.1 --report path/${FILE}_report.txt --paired ${FILE}_R1.fastq ${FILE}_R2.fastq --report-zero-counts --output path/${FILE}_taxa.txt; done
Can someone please explain how to read 's/_R1.fastq//'? How does the loop know to use the _R2.fastq files too?
2) I have also written this command using GNU parallel, which works well for single-end reads, but I am not sure how to modify it for paired-end reads. Can someone help me out, please?
time parallel -j2 "kraken2 {} --threads 2 --db path/plant_db --gzip-compressed --confidence 0.1 --report {}report.txt --report-zero-counts --output {}taxa.txt" ::: path/*.fastq
Thanks a lot!
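On question 1: 's/_R1.fastq//' is a sed substitution of the form s/PATTERN/REPLACEMENT/. The replacement is empty, so the first match of _R1.fastq on each line is simply deleted and only the sample prefix is left. The loop never looks for the _R2 files itself; it rebuilds both names from that prefix. A minimal sketch, using made-up file names (sampleA, sampleB) and echo in place of kraken2:

```shell
# Hypothetical file names, for illustration only (stand-in for `ls *_R1.fastq`).
names="sampleA_R1.fastq
sampleB_R1.fastq"

# s/PATTERN/REPLACEMENT/ with an empty replacement deletes the matched text.
prefixes=$(printf '%s\n' "$names" | sed 's/_R1.fastq//')

for FILE in $prefixes; do
    # Both mate files are reconstructed from the same prefix.
    echo "${FILE}_R1.fastq ${FILE}_R2.fastq"
done
```

This prints "sampleA_R1.fastq sampleA_R2.fastq" and then "sampleB_R1.fastq sampleB_R2.fastq". Note that the unescaped dot in the pattern matches any character; 's/_R1\.fastq$//' would be a stricter way to write the same thing.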
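Shorter, with GNU parallel (--rpl defines {R} as the input with _R1.fastq removed; drop --dry-run to actually run the commands instead of just printing them):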
parallel --dry-run --rpl '{R} s/_R1.fastq//' 'kraken2 --db path/plant_db --memory-mapping --threads 8 --use-mpa-style --confidence 0.1 --report path/{R}_report.txt --paired {} {R}_R2.fastq --report-zero-counts --output path/{R}_taxa.txt' ::: *_R1.fastq
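To make the replacement strings concrete, here is roughly what {} and {R} turn into for each input, sketched in plain bash. The function name pair_names and the sample file names are hypothetical, and it only prints the names rather than calling kraken2:

```shell
# Print the R1 file, the derived R2 file and the report name for each input.
pair_names() {
    for r1 in "$@"; do
        # For names ending in _R1.fastq this gives the same result as {R}.
        r="${r1%_R1.fastq}"
        printf '%s %s_R2.fastq %s_report.txt\n' "$r1" "$r" "$r"
    done
}

# Hypothetical inputs standing in for the *_R1.fastq glob.
pair_names sampleA_R1.fastq sampleB_R1.fastq
```

So {} stays the full _R1 file name, while {R} is the shared prefix from which the _R2 file and the output names are built.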