Question

BWA index - is it necessary?

2

5.5 years ago

Hansen_869 ▴ 80

I am unsure whether I need to index my fasta file before running BWA mem. When is indexing necessary? And if it is necessary, what would the input for BWA mem be (fasta, index, R1.gz, R2.gz)?

bwa index • 22k views

ADD COMMENT • link updated 2.9 years ago by ATpoint 87k • written 5.5 years ago by Hansen_869 ▴ 80

score 5 · Answer 1 · 2019-11-08

5

5.5 years ago

GenoMax 151k

All NGS aligners need the reference sequences to be indexed. You would use the index basename with your fastq reads at the time of alignment.

bwa index ref.fa

bwa mem ref.fa reads.fq > aln-se.sam

Note: BBMap aligner can index the fasta reference on the fly but it still needs to do the indexing before alignment happens.
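For the paired, gzipped reads mentioned in the question, the same two steps can be wrapped in a small shell function. This is only a sketch: the function name align_pe and the file names in the usage line are placeholders, not anything from the original post. It assumes bwa is on the PATH and relies on bwa mem accepting gzipped FASTQ files directly.

```shell
# Sketch: index the reference once, then align paired-end reads.
# align_pe and all file names are hypothetical placeholders.
align_pe() {
    ref=$1; r1=$2; r2=$3; out=$4
    # Build the index only if the .bwt file is missing or empty.
    if [ ! -s "$ref.bwt" ]; then
        bwa index "$ref" || return 1
    fi
    # bwa mem takes the index basename (here the fasta path), then the reads.
    bwa mem "$ref" "$r1" "$r2" > "$out"
}

# Usage: align_pe ref.fa R1.fq.gz R2.fq.gz aln-pe.sam
```

The reference argument is the same string in both commands because bwa index writes its files next to the fasta, using the fasta name as the basename.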

ADD COMMENT • link 5.5 years ago by GenoMax 151k

0

So I shouldn't feed my original fasta file to BWA mem? When I index my fasta file, the resulting fasta file is empty. I suppose something is wrong?

ADD REPLY • link 5.5 years ago by Hansen_869 ▴ 80

0

No to the first question.

When you index a fasta file (with one or many sequences in it) there should be several additional files produced, which contain the actual index information. They will have names like

genome.fa  genome.fa.amb  genome.fa.ann  genome.fa.bwt  genome.fa.pac  genome.fa.sa

Yes, something is wrong if you are getting empty files.
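A quick way to confirm that indexing worked is to check that each of the files listed above exists and is non-empty. The helper below is a minimal sketch; the function name bwa_index_ok is made up for illustration, and the extension list is taken from the file names shown in this reply.

```shell
# Sketch: succeed only if all five BWA index files exist and are non-empty.
# bwa_index_ok is a hypothetical helper name.
bwa_index_ok() {
    ref=$1
    for ext in amb ann bwt pac sa; do
        # -s is true only for files that exist and have a size above zero.
        if [ ! -s "$ref.$ext" ]; then
            echo "missing or empty: $ref.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Usage: bwa_index_ok genome.fa && echo "index looks complete"
```

A .fai file from samtools faidx is not in this list, so a reference that only has genome.fa.fai next to it fails the check.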

ADD REPLY • link 5.5 years ago by GenoMax 151k

score 0 · Answer 2 · 2019-11-15

0

5.4 years ago

ATpoint 87k

Yes, it is absolutely necessary. See: what is the purpose of indexing a genome

ADD COMMENT • link 5.4 years ago by ATpoint 87k

score 0 · Answer 3 · 2022-06-02

0

2.9 years ago

nicolas.soriano ▴ 10

Hello, I've just discovered that my trainee has performed a bunch of bwa alignments without indexing the genome (he used "samtools faidx", as he had learnt in class...).

The alignments took a very long time (3-4 days on a high-performance cluster, mouse exomes), but they succeeded and the results seem to be correct.

My guess is that, if bwa doesn't find the index files, it uses the .fa... without the advantages of the BW transform.

ADD COMMENT • link 2.9 years ago by nicolas.soriano ▴ 10

0

but they succeeded and the results seem to be correct

Is the output a valid SAM format file? I would have thought that bwa will not accept a genome indexed with samtools faidx.

ADD REPLY • link 2.9 years ago by GenoMax 151k

0

Yes, the sam files are perfectly correct and the rest of the analysis could be completed. Luckily!

I don't think bwa used the fai. Just the fa. I will perform some tests.

ADD REPLY • link 2.9 years ago by nicolas.soriano ▴ 10

1

Start by checking the exact code and the @PG line in the header of the SAM. It is actually not possible to run BWA without a bwa index, simply because of how the alignment works. The fai index has no information that helps. I suspect that either the index files were actually there at the location of the reference, or the output is from another job. In any case, why not just dump everything and run it properly? I personally would not invest effort in doing the detective work here. Even for very deep WGS files, a bwa alignment should not take longer than a day on an HPC. Just rerun it and be safe. This all sounds fishy, and results produced this way cannot be trusted.
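The @PG check can be done with a one-liner over the SAM header. The function below is a sketch with a made-up name (sam_pg_lines). It assumes an uncompressed SAM file, and that the bwa version used records its command line in the CL field of the @PG line, which recent versions do.

```shell
# Sketch: print the @PG header lines of an uncompressed SAM file.
# sam_pg_lines is a hypothetical helper name.
sam_pg_lines() {
    # Header lines start with '@'; stop reading at the first alignment record.
    awk '/^@/ { if (/^@PG/) print; next } { exit }' "$1"
}

# Usage: sam_pg_lines aln.sam
# For BAM input, the header can be read with: samtools view -H aln.bam | grep '^@PG'
```

The reference path shown in that line can then be checked for .amb, .ann, .bwt, .pac and .sa files sitting next to it.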

ADD REPLY • link 2.9 years ago by ATpoint 87k