How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?
4
0
Entering edit mode
7.9 years ago
shark29 • 0

How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

bwa wgs bash shell • 9.6k views
ADD COMMENT
0
Entering edit mode

What have you tried ?

ADD REPLY
0
Entering edit mode

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

ADD REPLY
1
Entering edit mode

That would not be wise. R1/R2 is too generic. You would trip over this kind of file names once you start going beyond two files.

ADD REPLY
6
Entering edit mode
7.9 years ago

For some files:

a_1.fq.gz  a_2.fq.gz  b_1.fq.gz  b_2.fq.gz

Use glob for PE read1, and replace the suffix to get the sample name.

for f in  *_1.fq.gz; do
   r1=$f;
   r2=${f/_1.fq.gz/}._2.fq.gz;

   echo $r1 $r2; # do somethings
done

Result:

a_1.fq.gz a._2.fq.gz
b_1.fq.gz b._2.fq.gz
ADD COMMENT
3
Entering edit mode

Given your example, I find the following more intuitive and verbose:

for f in `ls *.fq.gz | cut -f1 -d'_'` #get constant part
do
stuff ${f}_1.fq.gz ${f}_2.fq.gz > ${f}.bam #whatever
done

Also easy setting of output name

ADD REPLY
0
Entering edit mode

that helped me. but i wanted to ask what exactly cut -f1 -d'_'` means. kindly explain -f1 and -d parameters.

ADD REPLY
3
Entering edit mode

Don't be so lazy. All of that is easy to find if you read the documentation of cut. Alternatively you can look at this website to explain shell code.

ADD REPLY
1
Entering edit mode

$ man cut

ADD REPLY
0
Entering edit mode

files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you

ADD REPLY
0
Entering edit mode

You don't need to. ${f} will expand to SRR1174825.

ADD REPLY
4
Entering edit mode
7.9 years ago

Depending on PE or SE reads, the loop is a bit more difficult.

For SE reads, assuming the following filenames:

sample1.fq.gz
sample2.fq.gz
sample3.fq.gz

You could use the following loop:

for fastq in *.fq.gz
do
bwa mem ref.fasta $fastq > ${fastq%.*}.bam stuff etc etc #your command for bwa
done

Or using gnu-parallel which is awesome: (directly piped into samtools for conversion to bam and sorting)

ls *.fq.gz | parallel -j 2 'bwa mem ref.fasta {} | samtools sort -o {.}.bam'
ADD COMMENT
0
Entering edit mode

Just jumping in here, I was given a for loop for using BWA very similar to what you have written above, but it doesn't work. I am not great with code, but if anyone on this thread has tips, I would love to hear them. When I run this loop I get a bunch of empty xxx.fastq.sam files and not the transition from xxx.fastq to xxx.sam as desired.

for fname in ./*.fastq;

do

bwa mem MPB_male $fname.fastq > $fname.sam;

done
ADD REPLY
0
Entering edit mode

Are you sure? have you inspected the files or are you just going off the extensions?

If its the latter, you should know that file extensions are just convention and don't actually have any meaning, it will simply be appending .sam to all of the filenames listed in your *.fastq glob

If you want the extension to just be .sam you can change the command to:

bwa mem ......  > "${fname%.*}".sam
ADD REPLY
0
Entering edit mode

Yes, using head xxx.sam shows nothing and the files show up as 0kb. Your patch does give me .sam files as desired, but they still have nothing in them. I can confirm that using bwa mem MPB_male xxx.fastq > xxx.sam gives me a working sam file. I just don't want to individually do this for 1000 samples!

ADD REPLY
0
Entering edit mode

Add echo in front of bwa in your loop (add quotes) and see if the printed commands look alright

ADD REPLY
1
Entering edit mode
7.9 years ago

It sounds like your main question is actually how to rename files...?

You do that like this:

mv SRR1174825_R1.fastq R1.fq
mv SRR1174825_R2.fastq R2.fq

Alternately, via symlinks:

ln -s SRR1174825_R1.fastq R1.fq
ln -s SRR1174825_R2.fastq R2.fq

The second version will not alter the original files, so I find it preferable.

This is not really a bioinformatics-related question, so it's better addressed on a site like stackoverflow. Also, I suggest you read about Linux and bash to get a basic understanding of how the command line works before trying to use bioinformatics tools. Once you have a basic understand of bash, you will realize that renaming is neither necessary nor productive in this case (in other words, that question is an X/Y problem).

ADD COMMENT
0
Entering edit mode

need to rename all coming .fastq files starting as SRR***_R1 and SRR***_R2 searching a replacement regular expression

ADD REPLY
1
Entering edit mode

why do you need to rename them?

ADD REPLY
0
Entering edit mode

rename the files as what?

ADD REPLY
0
Entering edit mode

Please rewrite your post with correct syntax and grammar. Also, if you want it to be answered, you need to be more explicit.

ADD REPLY
0
Entering edit mode
7.9 years ago
shark29 • 0

found solution for renaming http://stackoverflow.com/questions/7793950/regex-to-remove-all-text-before-a-character then put R1 and R2 as arguments

ADD COMMENT
0
Entering edit mode

That solves the question in the body of your original post.

If the question in the title has been answered by one of the answers then you should accept them to provide closure to this thread.

ADD REPLY

Login before adding your answer.

Traffic: 1988 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6