Question

BWA indexer fails to generate fasta.sa file!

2

8.8 years ago

reza.jabal ▴ 580

Hi everyone,

I am indexing the human reference genome with BWA using the following command:

bwa index -a bwtsw reference.fa

but it fails to generate the .rbwt, .rpac, .rsa and .sa files. Does anyone know what these files are and how I can generate the .sa file?

sequencing alignment software error • 14k views

ADD COMMENT • link 8.8 years ago by reza.jabal ▴ 580

2

Are there any error messages? Is this the exact command you're running?

ADD REPLY • link 8.8 years ago by pld 5.1k

1

BWA doesn't report any error, but I am trying to find split reads using LUMPY and it requires the fasta.sa file:

[bwt_restore_sa] fail to open file 'human_g1k_v37.fasta.sa' : No such file or directory

ADD REPLY • link 8.8 years ago by reza.jabal ▴ 580

3

Are you sure you didn't mean to type human_g1k_v37.fasta.fa?

ADD REPLY • link 8.8 years ago by Devon Ryan 105k

0

Is reference.fa a plain multi-fasta format file? This is a straightforward command and should work.

Can you run your command as follows and tell us what you see?

$ bwa index -a bwtsw reference.fa 2>&1

As @Devon points out, reference.fa has to be replaced with a real file name (unless that is what your file is called).

ADD REPLY • link 8.8 years ago by GenoMax 148k

0

[bwt_gen] Finished constructing BWT in 688 iterations.
[bwa_index] 3109.53 seconds elapse.
[bwa_index] Update BWT... 15.98 sec
[bwa_index] Pack forward-only FASTA... 15.56 sec
[bwa_index] Construct SA from BWT and Occ... Killed

ADD REPLY • link 8.8 years ago by reza.jabal ▴ 580

1

What exit code does it give?

ADD REPLY • link 8.8 years ago by pld 5.1k
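
For anyone following along: the exit code is available in `$?` right after the command finishes. Below is a minimal sketch of how it could be interpreted. `describe_exit` is a hypothetical helper name, not part of BWA. It relies on the common shell convention that a process terminated by signal N is reported as 128 + N, so a run that ends with "Killed" would typically show 137 (SIGKILL).

```shell
# Hypothetical helper (not part of BWA): interpret a command's exit status.
describe_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # Shells report termination by signal N as 128 + N.
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit code $status"
    fi
}

# Usage (assumes bwa is on PATH and reference.fa is your real file name):
#   bwa index -a bwtsw reference.fa
#   describe_exit $?
```

A SIGKILL on a long, memory-hungry step is often, though not always, the kernel or the cluster scheduler stopping a job that exceeded its memory.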

2

Are you using the latest bwa?

ADD REPLY • link 8.8 years ago by GenoMax 148k

1

I am using bwa v0.7.12.

ADD REPLY • link 8.8 years ago by reza.jabal ▴ 580

score 5 · Answer 1 · 2016-04-14

OK guys, it appears to be a memory issue! I am sharing this in case anyone else runs into the same problem.

"Construct SA from BWT and Occ" is the last step in indexing. It is also the step that uses the most memory. The node may not have enough memory, in which case data keeps being swapped between RAM and disk. For a 15GB reference genome, you may need around 25GB of memory for this step and the subsequent mapping. If you are using LSF/SGE, please make sure you have requested enough memory.
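
One way to see how much memory a node has free before starting is to read MemAvailable from /proc/meminfo. This is a minimal sketch, assuming a Linux kernel that reports MemAvailable. `mem_available_gib` is a hypothetical helper name, and the threshold of 25 in the usage comment simply reuses the rough estimate above.

```shell
# Hypothetical helper: print MemAvailable in whole GiB from a meminfo-style file.
# Defaults to /proc/meminfo; a different path can be passed for testing.
mem_available_gib() {
    meminfo=${1:-/proc/meminfo}
    # The MemAvailable line looks like "MemAvailable:  33554432 kB".
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Usage:
#   if [ "$(mem_available_gib)" -lt 25 ]; then
#       echo "probably not enough free memory for the SA step" >&2
#   fi
```

On a cluster, the equivalent is the scheduler's memory request (for example `-l h_vmem=...` on many SGE setups or `-R "rusage[mem=...]"` on LSF). The exact resource names and units are site-specific, so check your cluster's documentation.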

As this is the last step, you can finish indexing without rerunning "bwa index":

bwa bwt2sa ref.bwt ref.sa

This step should take several hours.
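
To check whether the index is complete before or after that step, something like the sketch below may help. It assumes the index files sit next to the FASTA and are named by appending the extension to the full file name (as in human_g1k_v37.fasta.sa from the error above). `missing_index_files` is a hypothetical helper name. As far as I know, BWA 0.7.x writes five files (.amb, .ann, .bwt, .pac, .sa), and the .rbwt/.rpac/.rsa files belong to older releases, so their absence should not be a problem.

```shell
# Hypothetical helper: list the BWA index files that are missing or empty
# for a given reference FASTA (extensions assumed for BWA 0.7.x).
missing_index_files() {
    ref=$1
    for ext in amb ann bwt pac sa; do
        [ -s "$ref.$ext" ] || echo "$ref.$ext"
    done
}

# Usage (assumes bwa is on PATH; run only the last step if .sa alone is missing):
#   ref=human_g1k_v37.fasta
#   if [ "$(missing_index_files "$ref")" = "$ref.sa" ]; then
#       bwa bwt2sa "$ref.bwt" "$ref.sa"
#   fi
```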