Question

bowtie2; using for multiple fastq files, linux loop code

0

3.1 years ago

Farzaneh • 0

Hi, I'm pretty new to Linux and ChIP-seq analysis. At this point, I have 100 fastq.gz files to align against hg19. I have already indexed my genome, named the index hg19, and can align my reads one file at a time, but I need a loop that works through all 100 files in one go.

Could someone please help me write the correct code for it? I have seen people use a for loop in places, but I can't make it work for me. My fastq files are in /mnt/d/Chipseq/Hchipseq. This is the code I use:

for i in /mnt/d/Chipseq/Hchipseq/*.fastq
do
bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U /mnt/d/Chipseq/Hchipseq/*.fastq S- i.sam
done

Thanks a lot for your help.

bowtie2 • 2.4k views

score 2 · Accepted Answer · 2021-10-13

2

3.1 years ago

Istvan Albert 101k

You get the error because you are not using the variable in your loop.

Instead of fixing that, learn to use GNU parallel. No looping is necessary, and your code will actually run in parallel across the cores:

ls -1 /mnt/d/Chipseq/Hchipseq/*.fastq | parallel bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U {} -S {.}.sam

See this post:

Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them
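One thing worth checking with the parallel command above: by default GNU parallel starts one job per CPU core, and each bowtie2 job asks for 16 threads with -p 16, so the machine may end up oversubscribed. Below is a minimal sketch that caps the number of simultaneous jobs and only prints the commands it would run. The function name preview_alignments and the 4 jobs x 4 threads split are assumptions for a machine with roughly 16 cores, not something from the original post.

```shell
# Hypothetical wrapper: preview the bowtie2 commands GNU parallel would run.
# $1 is the directory holding the .fastq files.
preview_alignments() {
  # -j 4 limits parallel to 4 simultaneous jobs; -p 4 gives each bowtie2
  # job 4 threads (assumed budget: about 16 cores in total).
  # --dry-run prints each command instead of executing it.
  # {} is the input file, {.} is the input file without its last extension.
  parallel --dry-run -j 4 \
    bowtie2 -p 4 --fast-local --no-mixed -t -x hg19 -U {} -S {.}.sam \
    ::: "$1"/*.fastq
}
```

Calling preview_alignments /mnt/d/Chipseq/Hchipseq should list one bowtie2 command per file. Once the commands look right, dropping --dry-run would run them for real.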

2

-S {.}.sam, not S- i.sam

As for the above for loop, it would be:

for i in /mnt/d/Chipseq/Hchipseq/*.fastq
  do
  bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U "${i}" -S "${i%.fastq}".sam
  done

Pro tip: save disk space by piping the output into samtools view or sort, e.g.

for i in /mnt/d/Chipseq/Hchipseq/*.fastq
  do
  bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U "${i}" | samtools view -b -o "${i%.fastq}".bam -
  done
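For the sort variant mentioned above, a rough sketch could look like the following. The function name align_sorted and the .sorted.bam suffix are made up for illustration; the bowtie2 options are copied from the loop, and samtools sort reads the SAM stream from standard input via the trailing dash.

```shell
# Hypothetical wrapper: align every .fastq in a directory and write
# coordinate-sorted BAM files next to the inputs.
# $1 is the directory holding the .fastq files.
align_sorted() {
  for i in "$1"/*.fastq; do
    # If nothing matched, the pattern stays literal; skip it.
    [ -e "$i" ] || continue
    # bowtie2 writes SAM to stdout when -S is omitted;
    # samtools sort reads it from stdin ("-") and writes BAM to the -o file.
    bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U "$i" \
      | samtools sort -o "${i%.fastq}.sorted.bam" -
  done
}
```

Usage would be align_sorted /mnt/d/Chipseq/Hchipseq. No intermediate .sam file is written, which is where the disk saving comes from.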

The "${i}" in each iteration is one of the fastq files, and "${i%.fastq}" strips the .fastq suffix so you can append a new one such as .sam or .bam. Be sure to spend quality time on Unix basics. Even tools like workflow managers are ultimately built on plain Unix, so proper knowledge of it is a good investment of time.
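To see the suffix stripping on its own, without running an aligner, here is a small sketch. The file names are made up, out_prefix is a hypothetical helper, and the .fastq.gz case is included only because the question mentions gzipped files (as far as I know bowtie2 can read gzipped FASTQ directly, but check that against your version).

```shell
# Hypothetical helper: derive an output prefix from a FASTQ path.
# Handles both .fastq and .fastq.gz names.
out_prefix() {
  p=$1
  p=${p%.gz}      # drop a trailing .gz, if present
  p=${p%.fastq}   # drop a trailing .fastq, if present
  printf '%s\n' "$p"
}

# Made-up example paths, just to show the resulting names.
for i in /data/sampleA.fastq /data/sampleB.fastq.gz; do
  echo "${i} -> $(out_prefix "$i").sam"
done
```

The % operator removes the shortest matching suffix and leaves the name unchanged when the suffix is absent, which is why applying both expansions in sequence is safe for either kind of file.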

ADD REPLY • link 3.1 years ago by ATpoint 85k

1

correction added

ADD REPLY • link 3.1 years ago by Istvan Albert 101k

0

Thank you so much! I'm going through GNU parallel now and definitely will work on my basics as well.

Best, Farzaneh

ADD REPLY • link 3.1 years ago by Farzaneh • 0