BWA index - is it necessary?
5.0 years ago
Hansen_869 ▴ 80

I am unsure whether I need to index my FASTA file before running BWA mem. When is indexing necessary? And if it is necessary, what would the input for BWA mem be (fasta, index, R1.gz, R2.gz)?

5.0 years ago
GenoMax 147k

All NGS aligners need the reference sequences to be indexed. You would use the index basename with your fastq reads at the time of alignment.

bwa index ref.fa

bwa mem ref.fa reads.fq > aln-se.sam

Note: BBMap aligner can index the fasta reference on the fly but it still needs to do the indexing before alignment happens.
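For the paired-end case asked about above (R1.gz, R2.gz), a minimal sketch could look like the following. The function name align_pe and all file names are hypothetical; the only bwa calls used are "bwa index" and "bwa mem", and the check on the .bwt file is just one simple way to avoid rebuilding an index that already exists.

```shell
# Sketch: index the reference once, then align paired-end reads.
# align_pe and its argument order are illustrative, not part of bwa.
align_pe() {
    ref="$1"   # reference FASTA; its path doubles as the index basename
    r1="$2"    # forward reads (gzipped FASTQ is accepted by bwa mem)
    r2="$3"    # reverse reads
    out="$4"   # output SAM file

    # Build the index only if the .bwt file is missing or empty.
    [ -s "$ref.bwt" ] || bwa index "$ref"

    # bwa mem takes the index basename first, then one or two read files.
    bwa mem "$ref" "$r1" "$r2" > "$out"
}

# Example call (hypothetical file names):
# align_pe ref.fa sample_R1.fastq.gz sample_R2.fastq.gz aln-pe.sam
```

Note that the FASTA path is passed to bwa mem only as the basename of the index files; bwa mem reads the .amb, .ann, .bwt, .pac and .sa files that sit next to it.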


So I shouldn't feed my original fasta-file to BWA mem ? When I index my fasta file, the resulting fasta file is empty. I suppose something is wrong?


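No to the first question: you still pass the path of the original FASTA file to bwa mem, where it serves as the basename of the index files.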

When you index a fasta file (with one or many sequences in it) there should be several additional files produced, which contain the actual index information. They will have names like

genome.fa  genome.fa.amb  genome.fa.ann  genome.fa.bwt  genome.fa.pac  genome.fa.sa

Yes, something is wrong if you are getting empty files.
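A quick way to check this is to test that each of the five index files listed above exists and is non-empty. The sketch below assumes the classic bwa file extensions shown above; the function name bwa_index_complete is made up for illustration.

```shell
# Return 0 only if all five bwa index files exist and are non-empty.
# Missing or empty files are reported on stderr.
bwa_index_complete() {
    ref="$1"
    status=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing or empty: $ref.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call (hypothetical file name):
# bwa_index_complete genome.fa && echo "index looks complete"
```

A .fai file written by "samtools faidx" is not one of these files, so a reference that has only genome.fa.fai next to it will fail this check.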

5.0 years ago
ATpoint 85k

Yes, it is absolutely necessary. See: "What is the purpose of indexing a genome"

2.5 years ago

Hello, I've just discovered that my trainee has run a bunch of bwa alignments without indexing the genome (he used "samtools faidx", as he had learnt in class).

The alignments took a very long time (3-4 days on a high-performance cluster, mouse exomes), but they succeeded and the results seem to be correct.

My guess is that, if bwa doesn't find the index files, it uses the .fa directly, without the advantages of the Burrows-Wheeler transform.


"but they succeeded and the results seem to be correct"

Is the output a valid SAM format file? I would have thought that bwa will not accept a genome indexed with samtools faidx.


Yes, the SAM files are perfectly valid and the rest of the analysis could be completed. Luckily!

I don't think bwa used the .fai, just the .fa. I will perform some tests.


Start by checking the exact code and the @PG line in the header of the SAM file. It is actually not possible to run BWA without a bwa index, simply because of how the alignment works. The fai index contains no information that helps. I suspect that either the index files were actually present at the location of the reference, or the output comes from another job. In any case, why not just dump everything and rerun it properly? I personally would not invest effort in doing the detective work here. Even for very deep WGS files, a bwa alignment should not take longer than a day on an HPC. Just rerun it and be safe; this all sounds fishy, and as it stands the results cannot be trusted.
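As a sketch of the @PG check suggested above: for a plain-text SAM file, the header lines can be pulled out with grep. The function name sam_pg_lines and the file name are hypothetical; for a BAM file, "samtools view -H" would be needed to print the header first.

```shell
# Print the @PG header lines of a plain-text SAM file.
# Read names cannot start with '@', so only header lines match.
sam_pg_lines() {
    grep '^@PG' "$1"
}

# Example call (hypothetical file name):
# sam_pg_lines aln-pe.sam
```

The bwa @PG line normally carries a CL: field with the command line that was run, which shows which reference path was passed to bwa mem.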
