Question

Gnu Parallel To Perform Illumina Fastq Filter

4

13.5 years ago

jvijai ★ 1.2k

I love GNU parallel, but I find it quite difficult to get the syntax right.

Here is what I want to do: filter a bunch of CASAVA-1.8 fastq.gz files using the FASTQ filter developed by the Hannon lab at CSHL.

My typical command would be:

 gunzip -c SampleName_Barcode_Lane_R1_001.fastq.gz |  \
fastq_illumina_filter-Linux-x86_64 -vvN |gzip -9   \
>SampleName_Barcode_Lane_R1_001_filtered.fastq.gz

Now, using GNU Parallel, here is what I thought would work:

parallel gunzip -c SampleName_Barcode_Lane_R1_00{1}.fastq.gz | \  
fastq_illumina_filter-Linux-x86_64 -vvN |gzip -9 \   
>SampleName_Barcode_Lane_R1_00{1}_filtered.fastq.gz   \
::: {1..9}

But this doesn't work.
I have about 20 such files and would like to know how to process them all. I would also like to know how to handle the file naming convention, where the numbering goes from 001 to 020.

On a similar note, with the new CASAVA-1.8 fastq.gz files, what is the preferred method for mapping when you don't have a cluster? Merge the fastq files and run BWA, or run BWA on each file and then merge?

Thanks

parallel fastq illumina • 5.6k views

ADD COMMENT • link updated 13.5 years ago by tange ▴ 190 • written 13.5 years ago by jvijai ★ 1.2k

0

I assume your typical command is meant to be without 'parallel'.

ADD REPLY • link 13.5 years ago by tange ▴ 190

0

Changed the typical command to start without parallel. Thanks for pointing out the error.

ADD REPLY • link 13.5 years ago by jvijai ★ 1.2k

Ram · Answer 1 · 2011-11-08

6

13.5 years ago

tange ▴ 190

You are very close. You only need to add UNIX quoting; otherwise the | and > are interpreted by your shell before parallel ever sees them. Read these sections:
https://www.gnu.org/s/parallel/man.html#example__substitution_and_redirection
https://www.gnu.org/s/parallel/man.html#quoting

Did you watch the intro videos?
https://www.youtube.com/watch?v=OpaiGYxkSuQ
https://www.youtube.com/watch?v=P40akGWJ_gY
https://www.youtube.com/watch?v=1ntxT-47VPA
https://www.youtube.com/watch?v=fOX1EyHkQwc

This should work:

parallel "gunzip -c SampleName_Barcode_Lane_R1_00{1}.fastq.gz |  
fastq_illumina_filter-Linux-x86_64 -vvN |
gzip -9 >SampleName_Barcode_Lane_R1_00{1}_filtered.fastq.gz" \
::: {1..9}

Or:

parallel gunzip -c SampleName_Barcode_Lane_R1_00{1}.fastq.gz \| \
fastq_illumina_filter-Linux-x86_64 -vvN \| gzip -9 \
\>SampleName_Barcode_Lane_R1_00{1}_filtered.fastq.gz \
::: {1..9}
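
To see why both forms work, it can help to look at the arguments a command actually receives. The following is only an illustrative sketch: show_args is a hypothetical helper (not part of GNU parallel), and the in_/out_ file names are made up. It prints each argument in brackets, so you can see that the quoted pipeline arrives as one argument and that the escaped | and > arrive as plain text rather than being acted on by the shell.

```shell
#!/bin/bash
# show_args is a hypothetical helper: it prints every argument it receives
# on its own line, wrapped in brackets.
show_args() {
  printf '[%s]\n' "$@"
}

# Quoted: the whole pipeline reaches the command as a single argument.
show_args "gunzip -c in_{1}.fastq.gz | gzip -9 >out_{1}.fastq.gz"

# Escaped: | and > are passed on as literal arguments instead of being
# interpreted by the calling shell.
show_args gunzip -c in_{1}.fastq.gz \| gzip -9 \>out_{1}.fastq.gz
```

Replacing show_args with parallel (and appending ::: {1..9}) gives the two command forms shown above.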

If you have a hard time getting the quoting right, you should consider making a small shell script that does the work for one file:

#!/bin/bash

VAR=$1
gunzip -c SampleName_Barcode_Lane_R1_00${VAR}.fastq.gz |  
fastq_illumina_filter-Linux-x86_64 -vvN |
gzip -9 >SampleName_Barcode_Lane_R1_00${VAR}_filtered.fastq.gz

Then make the script executable and run it in parallel:

chmod +x my_script
parallel ./my_script ::: {1..9}
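
A variation on the per-file script, sketched here under some assumptions: the work is wrapped in bash functions that take the full input file name and derive the output name from it, so nothing depends on the 00 prefix. The function names filtered_name and filter_one are made up for this example, and exporting functions this way assumes parallel is started from bash.

```shell
#!/bin/bash
# Derive the output name from the input name:
#   x.fastq.gz -> x_filtered.fastq.gz
filtered_name() {
  printf '%s\n' "${1%.fastq.gz}_filtered.fastq.gz"
}

# Filter one file; the input file name is the only argument.
filter_one() {
  gunzip -c "$1" |
    fastq_illumina_filter-Linux-x86_64 -vvN |
    gzip -9 >"$(filtered_name "$1")"
}

# Make both functions visible to the bash shells started by GNU parallel.
export -f filtered_name filter_one

# Example invocation (not run here):
#   parallel filter_one ::: SampleName_Barcode_Lane_R1_0??.fastq.gz
```

The glob in the example invocation matches only names with exactly three characters between R1_0-style prefixes and .fastq.gz, so already filtered files are not picked up again on a second run.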

ADD COMMENT • link 13.5 years ago by tange ▴ 190

0

That worked wonderfully! However, how do I make the substitution for files numbered 001 to 020 using parallel? I had to do it in two steps, 1-9 and 11 through 20 separately.

ADD REPLY • link 13.5 years ago by jvijai ★ 1.2k

0

@jvijai: just use {1..20}

ADD REPLY • link 13.5 years ago by Giovanni M Dall'Olio 28k

0

See https://www.gnu.org/software/parallel/man.html#example__context_replace on how to do prepended zeros.
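
One possible way to get the prepended zeros, as a sketch rather than the method from the linked man page section: generate the padded numbers with seq and feed them to parallel on standard input, using {} for the whole three-digit number instead of 00{1}. The helper name padded_ids is made up for this example, and --dry-run only prints the commands, so nothing is filtered when this runs.

```shell
#!/bin/bash
# Print zero-padded, three-digit numbers from $1 to $2, one per line.
padded_ids() {
  seq -f '%03g' "$1" "$2"
}

# Preview the commands only if GNU parallel is installed; --dry-run prints
# each command instead of running it.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  padded_ids 1 20 | parallel --dry-run \
    "gunzip -c SampleName_Barcode_Lane_R1_{}.fastq.gz |
     fastq_illumina_filter-Linux-x86_64 -vvN |
     gzip -9 >SampleName_Barcode_Lane_R1_{}_filtered.fastq.gz"
fi
```

Once the printed commands look right, dropping --dry-run runs them. In bash 4 and later, ::: {001..020} should expand to the same padded list.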

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 13.5 years ago by tange ▴ 190