GNU Parallel To Perform Illumina FASTQ Filter
13.1 years ago
jvijai ★ 1.2k

I love GNU parallel, except it is quite difficult "for me" to get the right syntax to do stuff.

Here is what I want to do: filter a bunch of CASAVA-1.8 fastq.gz files using the FASTQ filter developed by the Hannon lab at CSHL.

My typical command would be:

gunzip -c SampleName_Barcode_Lane_R1_001.fastq.gz | \
fastq_illumina_filter-Linux-x86_64 -vvN | gzip -9 \
>SampleName_Barcode_Lane_R1_001_filtered.fastq.gz

Now, using GNU Parallel, here is what I thought would work:

parallel gunzip -c SampleName_Barcode_Lane_R1_00{1}.fastq.gz | \  
fastq_illumina_filter-Linux-x86_64 -vvN |gzip -9 \   
>SampleName_Barcode_Lane_R1_00{1}_filtered.fastq.gz   \
::: {1..9}

But this doesn't work.
I have about 20 such files and would like to know how to do this. I would also like to know how to handle the file naming convention, where the numbering runs from 001 to 020.

On a similar note, with the new CASAVA-1.8 fastq.gz files, what is the preferred method for mapping when you don't have a cluster? Merge the fastq files and run BWA, or run BWA on each file and then merge?

Thanks

parallel fastq illumina • 5.3k views

I assume your typical command is without 'parallel'


Changed the typical command to start without parallel. Thanks for pointing out the error.

13.1 years ago
tange ▴ 190

You are very close. You only need to include UNIX quoting, otherwise the | and > will be interpreted by the shell. Read these sections: https://www.gnu.org/s/parallel/man.html#example__substitution_and_redirection https://www.gnu.org/s/parallel/man.html#quoting

Did you watch the intro videos: https://www.youtube.com/watch?v=OpaiGYxkSuQ https://www.youtube.com/watch?v=P40akGWJ_gY https://www.youtube.com/watch?v=1ntxT-47VPA https://www.youtube.com/watch?v=fOX1EyHkQwc

This should work:

parallel "gunzip -c SampleName_Barcode_Lane_R1_00{1}.fastq.gz |  
fastq_illumina_filter-Linux-x86_64 -vvN |
gzip -9 >SampleName_Barcode_Lane_R1_00{1}_filtered.fastq.gz" \
::: {1..9}

Or:

parallel gunzip -c SampleName_Barcode_Lane_R1_00{1}.fastq.gz \| \
fastq_illumina_filter-Linux-x86_64 -vvN \| gzip -9 \
\>SampleName_Barcode_Lane_R1_00{1}_filtered.fastq.gz \
::: {1..9}
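
To see what the quoting changes without involving GNU parallel at all, here is a small sketch. show_args is a hypothetical helper (not part of parallel) that prints each argument it receives in brackets, so you can see what a program such as parallel would actually be handed in each case.

```shell
# show_args: hypothetical helper, prints every argument it receives in [brackets]
show_args() {
    printf '[%s]' "$@"
    printf '\n'
}

# Unquoted: the calling shell builds the pipe itself, so show_args only
# sees "echo hello" and its output is then fed through tr.
show_args echo hello | tr a-z A-Z      # prints [ECHO][HELLO]

# Double-quoted: the whole pipeline arrives as a single argument.
show_args "echo hello | tr a-z A-Z"    # prints [echo hello | tr a-z A-Z]

# Backslash-escaped: the | arrives as an ordinary argument of its own.
show_args echo hello \| tr a-z A-Z     # prints [echo][hello][|][tr][a-z][A-Z]
```

In the last two cases the | survives as plain text, which is what parallel needs in order to run the complete pipeline once per input.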

If you have a hard time getting the quoting right, you should consider making a small shell script that does the work for one file:

#!/bin/bash

VAR=$1
gunzip -c SampleName_Barcode_Lane_R1_00${VAR}.fastq.gz |  
fastq_illumina_filter-Linux-x86_64 -vvN |
gzip -9 >SampleName_Barcode_Lane_R1_00${VAR}_filtered.fastq.gz

And then run that script in parallel:

parallel my_script ::: {1..9}
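
A possible variation on the script above, as a sketch only: take the whole input file name as the argument instead of a single digit, and derive the output name from it. The names filtered_name and filter_one are made up for this example, and it assumes every input ends in .fastq.gz.

```shell
#!/bin/sh
# filtered_name: hypothetical helper, turns X.fastq.gz into X_filtered.fastq.gz
filtered_name() {
    printf '%s\n' "${1%.fastq.gz}_filtered.fastq.gz"
}

# filter_one: hypothetical wrapper that runs the pipeline from the question on one file
filter_one() {
    in=$1
    out=$(filtered_name "$in")
    gunzip -c "$in" |
        fastq_illumina_filter-Linux-x86_64 -vvN |
        gzip -9 >"$out"
}

# Only do work when a file name was passed on the command line
if [ "$#" -gt 0 ]; then
    filter_one "$1"
fi
```

Saved as, say, filter_one.sh and made executable, it could be run as: parallel ./filter_one.sh ::: SampleName_Barcode_Lane_R1_*.fastq.gz. Be aware that on a second run the glob would also match the _filtered files written by the first run.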

That worked wonderfully! However, how do I make the substitution for files like 001-020 using parallel? I had to do it in two steps, 1-9 and 11 through 20 separately.


@jvijai: just use {1..20}
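
One caveat: if the literal 00 stays in the file name template, {1..20} turns 10 into 0010 rather than 010. A way around that, sketched below, is to generate the zero-padded numbers and drop the 00 from the template. pad_ids is a hypothetical helper written for this example. In bash 4 or later, {001..020} should give the same zero-padded list directly.

```shell
# pad_ids FIRST LAST: hypothetical helper, prints FIRST..LAST as
# three-digit zero-padded numbers, one per line
pad_ids() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf '%03d\n' "$i"
        i=$((i + 1))
    done
}

pad_ids 8 11
# prints 008, 009, 010 and 011, one per line
```

The list can then be piped into parallel, with {} standing for the padded number: pad_ids 1 20 | parallel "gunzip -c SampleName_Barcode_Lane_R1_{}.fastq.gz | fastq_illumina_filter-Linux-x86_64 -vvN | gzip -9 >SampleName_Barcode_Lane_R1_{}_filtered.fastq.gz"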
