Hi, I'm pretty new to Linux and ChIP-seq analysis. At this point, I have 100 fastq.gz files to align to hg19. I have already indexed my genome (the index is called hg19) and can align my reads one file at a time, but I need a loop that works through all 100 files in one go.
Could someone please help me write the correct code for it? I see people using a for loop in places, but I can't make it work for me. My fastq files are in /mnt/d/Chipseq/Hchipseq. This is the code I use:
for i in /mnt/d/Chipseq/Hchipseq/*.fastq
do
bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U /mnt/d/Chipseq/Hchipseq/*.fastq S- i.sam
done
Thanks a lot for your help.
That should be -S {.}.sam, not S- i.sam (in GNU parallel, {.} is the input file name with its extension removed).
As for the above for loop, it would be:
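A minimal sketch of the corrected loop, reusing the flags and the hg19 index name from the question unchanged. It assumes the inputs end in .fastq, as in the question's glob; for the fastq.gz files mentioned there, the glob would become *.fastq.gz and the suffix strip "${i%.fastq.gz}". The function name align_all is only illustrative, and the existence check is an added guard for the case where the glob matches nothing.

```shell
#!/usr/bin/env bash
# Align every .fastq file in a directory, one bowtie2 run per file.
# align_all is a hypothetical helper name; the bowtie2 flags are copied
# from the question, and the hg19 index is assumed to be reachable from
# the current working directory.
align_all() {
  local dir="$1" i
  for i in "$dir"/*.fastq
  do
    # If nothing matched, the glob stays a literal pattern: skip it.
    [ -e "$i" ] || continue
    # "${i}" is the current fastq file; "${i%.fastq}.sam" swaps the suffix.
    bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 \
      -U "${i}" -S "${i%.fastq}.sam"
  done
}

align_all /mnt/d/Chipseq/Hchipseq
```

The files are aligned one after another, so each .sam file ends up next to the fastq file it came from.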
Pro tip: save disk space by piping the output into samtools view or sort.
The "${i}" in each iteration is one of the fastq files, and the "${i%.fastq}" strips the .fastq suffix so you can append a new one such as .sam/.bam. Be sure to spend quality time on Unix basics. Even if you use tools like workflow managers, they are all ultimately based on plain Unix, and proper knowledge of that is a good investment of time.
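The piping tip can be sketched like this, assuming samtools is installed and on the PATH. Without -S, bowtie2 writes SAM to standard output, so it can be fed straight into samtools sort and only the sorted BAM lands on disk. As with any illustration here, align_to_bam is a made-up name, and the .fastq suffix is an assumption taken from the question's glob.

```shell
#!/usr/bin/env bash
# Align each .fastq file and write a sorted BAM without an intermediate SAM.
# align_to_bam is a hypothetical helper name; the bowtie2 flags are copied
# from the question.
align_to_bam() {
  local dir="$1" i
  for i in "$dir"/*.fastq
  do
    # If nothing matched, the glob stays a literal pattern: skip it.
    [ -e "$i" ] || continue
    # No -S here: bowtie2 prints SAM to stdout, samtools sort reads it
    # from stdin ("-") and writes "${i%.fastq}.bam".
    bowtie2 -p 16 --fast-local --no-mixed -t -x hg19 -U "${i}" \
      | samtools sort -o "${i%.fastq}.bam" -
  done
}

align_to_bam /mnt/d/Chipseq/Hchipseq
```

A sorted BAM is usually much smaller than the equivalent SAM and is what most downstream tools expect anyway.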
Thank you so much! I'm going through GNU parallel now and definitely will work on my basics as well.
Best, Farzaneh