Dear friends,
please help me to understand what I am doing wrong.
I'm doing my best to figure out how to use Linux for the purposes of NGS analysis. Now I am trying to use a for loop to iterate over fastq files named like this: LK_S2_L001_R1_001.clean.fq (constant part is bold). To test it, I just specified the directory containing only one pair of files corresponding to one sample.
for f in `sudo sh -c "ls /path/to/*fq" | rev | cut -c 22- | rev | sort -u`
do
bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:${f}\tSM:${f}\tPL:ILLUMINA\tLB:${f}" ${f}_L001_R1_001.clean.fq ${f}_L001_R2_001.clean.fq | samtools sort -@ 2 -O BAM -o ${f}.bam
done
It generated a 188 668 kb BAM file, while the BAM for the same sample generated using
bwa mem -t 2 ucsc_hg19.fasta -R "@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tLB:sample" /path/to/sample_forward.fq /path/to/sample_reverse.fq | samtools sort -@ 2 -O BAM -o sample.bam
is 188 531 kb in size.
Here is a screenshot from IGV: They are different! Any ideas why is it so?
I don't think there is a good reason to use sudo here, avoid it as much as possible.
Thank you for your comment. I tried without
sudo
:It generated BAM of the same size as it was without
sudo
: 188 668 kb.So you don't need sudo, and you should only use sudo if you absolutely have to (get prompted to do so) and are absolutely sure about what you are doing.
But, you know, they still look different in IGV:
I need to use a more accurate way to compare them. Right now trying to find out the way...
Did you not read @Heng's answer I had linked below?
Even if they "look" different what are you trying to do with the data? Call SNP's? The end result of that analysis should not be different, even if the IGV view looks different. If you need the files to look identical then you need to use an aligner that supports deterministic mode.
bowtie2
is another aligner that I know which supports this mode.Yes, I've read this, thank you. I got it. I'm going to call variants - OK, let BAMs look different in IGV if variant calls are going to be the same... I just want to know if I can use the aformentioned for loop to process FASTQ files in pairs, and different sized BAMs make me worry.