repeating bwa alignment
0
0
Entering edit mode
4.2 years ago
gubrins ▴ 350

Heys, I'm working with two genomes from a whole-genome sequencing study. The coverage is 10x and 8x respectively and i'm aligning them to the reference genome of a sister species. This analysis was already carried out by a previous colleague of the lab, and I would like to obtain the same results as him, as I need to map other samples. He told me he used bwa version 0.7.7 (which I assume is incorrect, missing a 1 before the last 7) with these parameters:

-l 16500 -n 0.01 -o 2

From this and bwa's website (http://bio-bwa.sourceforge.net/bwa.shtml) I assume he used the aln logarythm, as mem does not have these commands.

As I always did alignments with mem algorythm, I run it and the size of the two BAM files obtained were 10 times bigger than his, which seems to me a huge difference...

I'm approaching this the right way? In order to repeat his analyses I'm running:

bwa aln -l 16500 -n 0.01 -o 2 -t 10 $GEN $fastq1 $fastq2 -o $outdir/genome1.bam

Where GEN is the reference genome and fastq1 and fastq2 my paired data of my first genome.

Thanks in advance for any input!

bwa whole-genome sequencing • 1.4k views
ADD COMMENT
1
Entering edit mode

Please read the manual of bwa. aln does not produce bam output. It produces an intermediate output that has to be processed with sampe. sampe can then output sam, and this you can convert to bam with samtools. If you want exact results then ask the colleague for a script. You also use -o twice, and -o is not an argument to specify the output file. bwa writes to stdout so capture it with

aln (...) > out.sai

ADD REPLY
0
Entering edit mode

Thanks for your reply. As I'm used to work with mem I did not completely check the aln algorythm. Do you agree that I should use the aln algorythm if he told me he used these parameters? -l 16500 -n 0.01 -o 2

ADD REPLY
2
Entering edit mode

If you are unsure about previous work, why don't you simply realign all relevant data with a tool that you are confident with and can be sure how output was produced. I never trust analysis when exact code is not available. Who knows what others have done, and if I build on existing results there must be code to check how they were produced, at least this is how I'd do it.

ADD REPLY
0
Entering edit mode

Thanks for your messages! The colleague is not in the lab anymore and there are no means to contact him. Thanks anyway!

ADD REPLY
1
Entering edit mode

If your colleague did not use mem with a recent version of bwa then you should perhaps see this: A: When and why is bwa aln better then bwa mem?

If you must repeat what they did they ask your colleague for the exact commands used. If your reads are shorter than 50 bp then aln may be slightly better.

ADD REPLY
0
Entering edit mode

That make sense, the reads are 35 bp!

ADD REPLY
1
Entering edit mode

If I remember correctly, bwa aln needs to be followed by bwa sampe (for paired) bwa samse (for single-end). I assume, the bam files you have, are actually sai files. Furthermore, your command doesn't make sense as you have the -o switch twice, once for gap open (the right switch for bwa aln) and once for output, which doesn't exist in bwa aln.

Adapted from the bwa documentation for your case it should be:

bwa aln $GEN $fastq1 > aln_sa1.sai
bwa aln $GEN $fastq2 > aln_sa2.sai
bwa sampe $GEN aln_sa1.sai aln_sa2.sai $fastq1 $fastq2 | samtools view -buSh -  > $outdir/genome1.bam

Edit: Just for the record, the other comments seemed to have come in while I was writing.

ADD REPLY
0
Entering edit mode

thanks, really nice answer! I thought the commands from aln were the same as mem, I'll check that before asking next time. Thanks again!

ADD REPLY

Login before adding your answer.

Traffic: 2136 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6