shell script for bowtie/bwa alignment pair end reads
3
3
Entering edit mode
9.9 years ago
Paul ★ 1.5k

Dear all,

I would like to write shell script to automatic alignment my pair end data with bowtie (or bwa) aligners.

Problem is, when I have lot of pair-end fastq files in one directory I have problem to loading data in for cycle.

Here is my code:

for i in $(ls *.fastq | rev | cut -c 22- | rev | uniq)
    do ./bowtie2 --very-sensitive -p32 --rg-id ${i} --rg-sample ${i} --rg-library Illumina --rg-platform Illuimna -x $reference -1 ${i}_L001_R1_001.fatsq.gz -2 ${i}_L001_R2_001.fatsq.gz -S ${i%.fastq.gz}${i}.sam
    done;

This is very nice solution help solved by RamRS. Is there any other way how to treat Bowtie or BWA with many samples (corresponding one sample to each read) in one for cycle?

my directory contains lot of fastq.gz:

name1_R1_001.fatsq.gz
name2_R2_001.fatsq.gz
...
...
...

Each sample name (prefix before R._001.fastq.gz) has different name length so algorithm above doesn't work properly :-(

Thank you for any idea and solutions..

fastq shell bwa bowtie illumina • 17k views
ADD COMMENT
1
Entering edit mode

Hi Paul,

Not sure if you have this sorted but I managed to manipulate RamRS's bash to run for Bowtie2

#!bin/bash
for I in $(ls *.fastq | rev | cut -c 14- | rev | uniq)
do 
  bowtie2 --very-sensitive -p16 --rg-id ${i} -x cprefseqs -1 ${i}_R1_001.fastq -2 ${i}_R2_001.fastq  -S $i.sam
done

Not sure if this is exactly what you want but it gives the output I'm after. I'm by no means an expert in the btw, it just worked for me. The format of my input files are like this: 65_S155_L001_R1_001.fastq

Cheers,
Todd

ADD REPLY
0
Entering edit mode

I assume you mean "one sample per file/pair of files" rather than "one sample to each read".

What have you tried other than that solution from RamRS and do you understand what it's doing? Once you know how that solution works, you should be able to do this without any help.

ADD REPLY
0
Entering edit mode

I don't understand very well what is exactly your question. Are you trying to run BWA/Bowtie for each pair of samples (assuming PE) in a folder? So, the problem is that you are not able to get the files two by two?

ADD REPLY
0
Entering edit mode

Presumably you meant to reply comment on the original post, not my comment :) I've moved it accordingly.

ADD REPLY
0
Entering edit mode

@airan: This might provide some context: bash loop for alignment RNA-seq data

ADD REPLY
0
Entering edit mode

Hi, this script depends not on the prefix length but only on the length of the R[12]_001.fastq part, which is constant (13). What is the problem you face, exactly?

Also, ${MY_VARIABLE_NAME} is the syntax to encapsulate a variable name; you're looking for the $(MY_COMMAND_OR_PIPE) syntax, to execute a command or pipe.

ADD REPLY
3
Entering edit mode
9.9 years ago
Zaag ▴ 870
for I in `ls *.fastq | cut -d "_" -f 1` ;
    do ./bowtie2 --very-sensitive -p 32 --rg-id $i --rg-sample $i --rg-library Illumina --rg-platform Illuimna -x $reference -1 $i\_L001_R1_001.fastq.gz -2 $i_L001_R2_001.fastq.gz -S $i.sam ;
done ;

edit: removed useless grep

ADD COMMENT
1
Entering edit mode

There's no reason to grep the output of ls. Just ls *.fastq.

ADD REPLY
1
Entering edit mode

OP has made a copy-paste mistake. My original script here has $(COMMAND), not ${COMMAND}. $(COMMAND) and `COMMAND` are equivalent. $(COMMAND) IMO avoids quoting confusion.

ADD REPLY
0
Entering edit mode
7.6 years ago

Hello, I tried the above-indicated code after modification according to my sequences, I always get an error. bowtie Error!

Error: Encountered internal Bowtie 2 exception (#1)
(ERR): bowtie2-align exited with value 1

the code is like that:

for i in $ls *.fastq ; do bowtie2 --very-sensitive-local -p 16 --rg-id ${i} --rg sample ${i} -x $index/*.bt2 -1 ${i}_*_1P.fastq -2 ${i}_*_2P.fastq -U ${i}_*_U.fastq -S ${i}-H2-7.sam ; done

my sequence are already trimmed and have a uniq prefix and ending like

NG-9974_*_**1**_1P.fastq
NG-9974_*_**1**_2P.fastq
NG-9974_*_**1**_1U.fastq  # for trimmed unpaired
NG-9974_*_**1**_2U.fastq # for trimmed unpaired

it is also like that (5 or 6, 8 instead of 1 as this)

NG-9974_*_**5**_1P.fastq
NG-9974_*_**5**_2P.fastq
NG-9974_*_**5**_1U.fastq  # for trimmed unpaired
NG-9974_*_**5**_2U.fastq # for trimmed unpaired

I also tried to rev and cut the sequence name as it was suggested, it always gave me the indicated error, could somebody help please to improve or correct my code.

Thank you in advance, Emad

ADD COMMENT
2
Entering edit mode

Try the same loop but add echo in front of the command, to check how it looks like without running the actual alignment. As such you can spot errors.

ADD REPLY
0
Entering edit mode

Thanks Wouter and Ram for your reply, it looks like the bowtie loop does not take the right paired sequence NG-9974__1_1P.fastq and NG-9974__1_2P.fastq as -1 and -2 paired sequences, and the unpaired sequences NG-9974__1_1U.fastq and NG-9974__1_2U.fastq as unpaired under -U arguments rather than take the individual sequence like NG-9974__1_1P.fastq and add the suffixes __1P.fastq, __2P.fastq, and __1U.fastq after the -1, -2 and -U arguments like this:

-1 NG-9974__1_1P.fastq__1P.fastq -2 NG-9974__1_1P.fastq__2P.fastq -U NG-9974__1_1P.fastq__1U.fastq

and

-1 NG-9974__1_2P.fastq__1P.fastq -2 NG-9974__1_2P.fastq__2P.fastq -U NG-9974__1_2P.fastq__1U.fastq

and

-1 NG-9974__2_1U.fastq__1P.fastq -2 NG-9974__2_1U.fastq__2P.fastq -U NG-9974__2_1U.fastq__1U.fastq

I could solve this problem by cutting the last 10 character of the sequences, but the problem still exists with bowtie error, although the alignment pairs and single unpaired looks good now!

ADD REPLY
0
Entering edit mode

Is there a verbose option to bowtie? Does it write a log?

ADD REPLY
0
Entering edit mode

bowtie alignment itself does have a verbose option, it has just in bowtie-build and bowtie-inspector options. Anyway, I could dig out the problem and it looks that the $ sign in front of the index argument made all the problems. It works now, thank you again! :)

ADD REPLY
2
Entering edit mode

Your bash loop looks for a ${i}_*_U.fastq file, but you have ${i}_*_1U.fastq and ${i}_*_2U.fastq files.

ADD REPLY

Login before adding your answer.

Traffic: 1697 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6