Heys, I'm working with two genomes from a whole-genome sequencing study. The coverage is 10x and 8x respectively and i'm aligning them to the reference genome of a sister species. This analysis was already carried out by a previous colleague of the lab, and I would like to obtain the same results as him, as I need to map other samples. He told me he used bwa version 0.7.7 (which I assume is incorrect, missing a 1 before the last 7) with these parameters:
-l 16500 -n 0.01 -o 2
From this and bwa's website (http://bio-bwa.sourceforge.net/bwa.shtml) I assume he used the aln logarythm, as mem does not have these commands.
As I always did alignments with mem algorythm, I run it and the size of the two BAM files obtained were 10 times bigger than his, which seems to me a huge difference...
I'm approaching this the right way? In order to repeat his analyses I'm running:
bwa aln -l 16500 -n 0.01 -o 2 -t 10 $GEN $fastq1 $fastq2 -o $outdir/genome1.bam
Where GEN is the reference genome and fastq1 and fastq2 my paired data of my first genome.
Thanks in advance for any input!
Please read the manual of bwa.
aln
does not produce bam output. It produces an intermediate output that has to be processed withsampe
.sampe
can then output sam, and this you can convert to bam withsamtools
. If you want exact results then ask the colleague for a script. You also use-o
twice, and-o
is not an argument to specify the output file.bwa
writes to stdout so capture it withaln (...) > out.sai
Thanks for your reply. As I'm used to work with mem I did not completely check the aln algorythm. Do you agree that I should use the aln algorythm if he told me he used these parameters?
-l 16500 -n 0.01 -o 2
If you are unsure about previous work, why don't you simply realign all relevant data with a tool that you are confident with and can be sure how output was produced. I never trust analysis when exact code is not available. Who knows what others have done, and if I build on existing results there must be code to check how they were produced, at least this is how I'd do it.
Thanks for your messages! The colleague is not in the lab anymore and there are no means to contact him. Thanks anyway!
If your colleague did not use
mem
with a recent version ofbwa
then you should perhaps see this: A: When and why is bwa aln better then bwa mem?If you must repeat what they did they ask your colleague for the exact commands used. If your reads are shorter than 50 bp then
aln
may be slightly better.That make sense, the reads are 35 bp!
If I remember correctly, bwa aln needs to be followed by bwa sampe (for paired) bwa samse (for single-end). I assume, the bam files you have, are actually sai files. Furthermore, your command doesn't make sense as you have the
-o
switch twice, once for gap open (the right switch for bwa aln) and once for output, which doesn't exist in bwa aln.Adapted from the bwa documentation for your case it should be:
Edit: Just for the record, the other comments seemed to have come in while I was writing.
thanks, really nice answer! I thought the commands from aln were the same as mem, I'll check that before asking next time. Thanks again!