How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?
files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you
How to run BWA or the other aligner for paired .fastq in a bash loop and pipeline?
files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you
For some files:
a_1.fq.gz a_2.fq.gz b_1.fq.gz b_2.fq.gz
Use glob for PE read1, and replace the suffix to get the sample name.
for f in *_1.fq.gz; do
r1=$f;
r2=${f/_1.fq.gz/}._2.fq.gz;
echo $r1 $r2; # do somethings
done
Result:
a_1.fq.gz a._2.fq.gz
b_1.fq.gz b._2.fq.gz
Don't be so lazy. All of that is easy to find if you read the documentation of cut
. Alternatively you can look at this website to explain shell code.
Depending on PE or SE reads, the loop is a bit more difficult.
For SE reads, assuming the following filenames:
sample1.fq.gz
sample2.fq.gz
sample3.fq.gz
You could use the following loop:
for fastq in *.fq.gz
do
bwa mem ref.fasta $fastq > ${fastq%.*}.bam stuff etc etc #your command for bwa
done
Or using gnu-parallel which is awesome: (directly piped into samtools for conversion to bam and sorting)
ls *.fq.gz | parallel -j 2 'bwa mem ref.fasta {} | samtools sort -o {.}.bam'
Just jumping in here, I was given a for loop for using BWA very similar to what you have written above, but it doesn't work. I am not great with code, but if anyone on this thread has tips, I would love to hear them. When I run this loop I get a bunch of empty xxx.fastq.sam files and not the transition from xxx.fastq to xxx.sam as desired.
for fname in ./*.fastq;
do
bwa mem MPB_male $fname.fastq > $fname.sam;
done
Are you sure? have you inspected the files or are you just going off the extensions?
If its the latter, you should know that file extensions are just convention and don't actually have any meaning, it will simply be appending .sam
to all of the filenames listed in your *.fastq
glob
If you want the extension to just be .sam
you can change the command to:
bwa mem ...... > "${fname%.*}".sam
Yes, using head xxx.sam shows nothing and the files show up as 0kb. Your patch does give me .sam files as desired, but they still have nothing in them. I can confirm that using bwa mem MPB_male xxx.fastq > xxx.sam gives me a working sam file. I just don't want to individually do this for 1000 samples!
It sounds like your main question is actually how to rename files...?
You do that like this:
mv SRR1174825_R1.fastq R1.fq
mv SRR1174825_R2.fastq R2.fq
Alternately, via symlinks:
ln -s SRR1174825_R1.fastq R1.fq
ln -s SRR1174825_R2.fastq R2.fq
The second version will not alter the original files, so I find it preferable.
This is not really a bioinformatics-related question, so it's better addressed on a site like stackoverflow. Also, I suggest you read about Linux and bash to get a basic understanding of how the command line works before trying to use bioinformatics tools. Once you have a basic understand of bash, you will realize that renaming is neither necessary nor productive in this case (in other words, that question is an X/Y problem).
found solution for renaming http://stackoverflow.com/questions/7793950/regex-to-remove-all-text-before-a-character then put R1 and R2 as arguments
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
What have you tried ?
files look like SRR1174825_R1.fastq and SRR1174825_R2.fastq How to rename them just to R1 and R2? thank you
That would not be wise. R1/R2 is too generic. You would trip over this kind of file names once you start going beyond two files.