I have multiple samples with R1 and R2 reads in fastq.gz format (the two files of each pair are complementary). I want to run BWA mem in paired-end mode on all of them in parallel, so that each R1/R2 pair produces a single SAM file. Right now I am producing two SAM files, one from each read file.
This is what I have come up with, but it’s not doing what I need it to do:
for i in $(find -maxdepth 2 -iname *fastq.gz -type f); do echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ${i}_R1_001.fastq.gz ${i}_R2_001.fastq.gz > ${i}_R1_R2.sam"; done
When it runs, the output looks like this:
bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_001.fastq.gz ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R2_001.fastq.gz > ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_R2.sam
bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_001.fastq.gz ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R2_001.fastq.gz > ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_R2.sam
I understand the problem is in the -iname pattern, but how do I fix it? Thank you so much.
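One possible way to approach this, as a rough sketch rather than a tested pipeline: loop over the R1 files only, strip the R1 suffix to get the sample prefix, and rebuild the R2 and output names from that prefix. The function name pair_commands is made up for illustration, and the sketch assumes every pair follows the <prefix>_R1_001.fastq.gz / <prefix>_R2_001.fastq.gz naming shown in the question. It only prints the commands, as the original loop does.

```shell
#!/usr/bin/env bash
# Sketch: match only the R1 files and derive each R2 name from the R1 name.
# Assumes pairs are named <prefix>_R1_001.fastq.gz and <prefix>_R2_001.fastq.gz.

ref=/H.Sapiens/ucsc.hg19.fasta   # reference path taken from the question

# Print one bwa mem command per R1/R2 pair found under the given directory.
# (pair_commands is a hypothetical helper name.)
pair_commands() {
    local dir=$1 r1 r2 prefix
    find "$dir" -maxdepth 2 -type f -name '*_R1_001.fastq.gz' | sort |
    while IFS= read -r r1; do
        prefix=${r1%_R1_001.fastq.gz}      # strip the R1 suffix
        r2=${prefix}_R2_001.fastq.gz       # matching mate file
        if [ ! -f "$r2" ]; then
            echo "skipping $r1: no matching R2 file" >&2
            continue
        fi
        echo "bwa mem -t 12 $ref $r1 $r2 > ${prefix}_R1_R2.sam"
    done
}

pair_commands .
```

Once the printed commands look right, they can be saved to a file and run from there. Note that this simple read loop assumes file names without newlines.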
Hey Dr. Lindenbaum,
I have the same issue as the original question. I tried your pipeline, and I now have SAM files for my samples. When I try to merge the SAM files using Picard, I get this error:
Cannot add sequence that already exists in SAMSequenceDictionary:
The SAM files I am trying to merge are 4 lanes of the same sample, and each lane is paired-end. For example, the following lanes each have an R1 and an R2 read. I ran BWA mem as you suggested and got SAM files for Lane_01 through Lane_04. Since they come from the same initial sample, they should contain the same or very similar sequences. Picard won't let me merge these files because they share the same sequences. How do you recommend I get past this issue?
I have 145 SAM files. How can I avoid typing every SAM file name as input to MergeSamFiles?
Google "file globbing".
Also, Picard MergeSamFiles can read a list of files.
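As a rough illustration of the globbing suggestion: the shell can expand *.sam itself, and each match can be turned into an I= argument for MergeSamFiles. This is only a sketch. The function name build_merge_command, the picard.jar location, and the merged.bam output name are placeholders, and the command is printed rather than executed so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: expand a glob and turn every matching SAM file into an I= argument.
# picard.jar and merged.bam are placeholder names, not taken from the thread.

# Print a MergeSamFiles command covering all *.sam files in a directory.
# (build_merge_command is a hypothetical helper name.)
build_merge_command() {
    local dir=$1 f
    local -a inputs=()
    shopt -s nullglob                  # an unmatched glob expands to nothing
    for f in "$dir"/*.sam; do
        inputs+=("I=$f")
    done
    shopt -u nullglob
    if [ "${#inputs[@]}" -eq 0 ]; then
        echo "no SAM files found in $dir" >&2
        return 1
    fi
    echo "java -jar picard.jar MergeSamFiles ${inputs[*]} O=merged.bam"
}

build_merge_command . || true
```

Because the glob is expanded by the shell, the same loop works for 4 files or 145 without typing any names. File names containing spaces would need extra quoting before the printed command is run.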