Newbie question: how to create BWA index from multiple .fasta files
6.9 years ago
phonybone • 0

I would like to use BWA MEM to align short reads against the entire hg19 human genome. To do that, I assume that I must create a BWA index. I understand how to create an index from a single file, but how does one create an index from multiple files? Is it best practice to just concatenate all files together? Or is there a better way?

Thanks in advance.

sequencing bwa index • 5.7k views

An easy way may be to get a sequence/annotation/index bundle from the iGenomes site.

6.9 years ago
Dan D 7.4k

I'm assuming you currently have each chromosome as a separate fasta file. If you don't have a scientific or logistical reason to avoid concatenating the files, then the most straightforward approach is to concatenate those fasta files into a single one and then compute your index. If you index each chromosome separately instead, you'll need to merge alignment data downstream, and you could miss out on things like discordant alignments and large structural abnormalities such as chromosomal translocations.
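
A minimal sketch of the concatenate-then-index approach, assuming one fasta file per chromosome. The function name build_reference and the file names are made up for illustration; only awk, grep and bwa index are real commands here.

```shell
# Concatenate per-chromosome FASTA files into one reference, then index it.
# Usage: build_reference OUT.fa IN1.fa [IN2.fa ...]
build_reference() {
    out=$1
    shift
    # 'awk 1' copies every line and supplies a final newline if a file lacks
    # one, so the next file's '>' header cannot be glued onto sequence data.
    awk 1 "$@" > "$out"
    # Report how many sequences ended up in the combined file.
    echo "sequences: $(grep -c '^>' "$out")"
    # Build the index only when bwa is installed; by default the index files
    # are written next to OUT.fa (OUT.fa.amb, .ann, .bwt, .pac, .sa).
    if command -v bwa >/dev/null 2>&1; then
        bwa index "$out"
    fi
}
```

It could be called as build_reference hg19.fa chr*.fa. Note that chr*.fa expands in lexical order (chr1, chr10, chr11, ...), so pass the files explicitly if you want a particular chromosome order in the combined file.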

6.9 years ago

Instead of writing a concatenated file to disk as Dan suggests, you might be able to get away with concatenating the files and piping the result straight into bwa. Something like:

cat *.fa | bwa index -p catted -

Piping is awesome. But wouldn't that preclude running subsequent alignments using that index, unless the complete catted fasta file was saved somewhere?

This would work, I think:

cat *.fa | tee catted.fasta | bwa index -p catted -


Yes, this way does not make a catted fasta, but I think you can still use the index without it.
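
To make that concrete: bwa mem is given the index prefix (here catted, from the -p option above) rather than a fasta path, so alignment only needs the index files. A small sketch that checks the five files bwa index writes are present before aligning; the helper name index_complete and the read file reads.fq are hypothetical.

```shell
# Succeed only if all five files written by 'bwa index -p PREFIX' exist.
index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Example call (commented out; reads.fq is a placeholder file name):
# index_complete catted && bwa mem catted reads.fq > aln.sam
```

Other tools further down the pipeline may still want the reference fasta itself, so keeping a copy of the concatenated file can be worthwhile.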
