Question

Using BWA to create input BAM files for downstream analysis

0

Entering edit mode

9.0 years ago

Cricket ▴ 10

Greetings.

I have Illumina paired-end reads for E. coli that I used to create a draft assembly (SPAdes). I now have the task of creating the input BAM files that I will use with Pilon -- which is used to improve a draft assembly.

I decided to employ BWA using the documentation here: http://bio-bwa.sourceforge.net/bwa.shtml#3

The plan in to create an index of the reference genome, create the alignments, and then convert to BAM files.

Here is the command I used to index the reference:

bwa index -p bwa_indices/B055 temp/spades/scaffolds.fasta

This command output the following files: B055.amb B055.ann B055.bwt B055.pac B055.sa

The next step is to generate the alignment files -- for which I used the following command:

bwa aln -t 20 temp/spades/scaffolds.fasta temp/spades/corrected/B055_S5_R1_filtered_1P.fastq.00.0_0.cor.fastq.gz > bwa_indices/B055_R1.sai
#bwa aln -t 20 temp/spades/scaffolds.fasta temp/spades/corrected/B055_S5_R1_filtered_2P.fastq.00.0_0.cor.fastq.gz > bwa_indices/B055_R2.sai

After running the first command, I received the following output:

[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln] fail to locate the index

The last line has vexed me a bit. There is an output file (B055_R1.sai), but it is empty.

I can clearly see that within my alignment command, there is no reference to any of the index files that were previously created, but when I look at the documentation (http://bio-bwa.sourceforge.net/bwa.shtm), I see no option for referencing any index files. Googling a bit led me to a site that said I needed to have my reference fasta file in the same directory as the index files, and I even changed the name of my draft assembly fasta file from scaffolds.fasta to B055.fasta -- but to no avail. I also unzipped the fastq.gz file and changed the extension from fastq to fq -- all were met with unsuccessful results. Those may still be issues, but it seems to me that referencing the index file(s) in the last bwa aln call is the most pressing issue.

Can anyone kindly point me in the proper direction? I am using BWA Version: 0.7.5a-r405 (I also installed the latest version (Version: 0.7.12-r1039) with no improvement), CentOS 6.7, with 34 cores and plenty of memory.

Thank you in advance.

BWA alignment assembly genomes • 4.8k views

ADD COMMENT • link 9.0 years ago by Cricket ▴ 10

0

Entering edit mode

try to use

bwa index -p scaffolds temp/spades/scaffolds.fasta

as -p from documentation just

Prefix of the output database

so the index should be in the same directory your refernce is , in such case

temp/spades/scaffolds.fasta

that is why the error tells you

fail to locate the index

bonus: If your genome is small (not mammalian size ) use -a is so the command will be

bwa index -p scaffolds -a is  temp/spades/scaffolds.fasta

ADD REPLY • link 9.0 years ago by Medhat 9.8k

0

Entering edit mode

Medhat,

Thank you for your response (and the -a is tip). I played around with the nomenclature so that my commands match your suggestions. However, my error has always occurred with the alignment command -- regardless of the file names.

I have updated my script as follows:

mkdir -p bwa_indices
bwa index -p B055 -a is B055.fa
bwa aln -t 20 B055.fa ../temp/spades/corrected/B055_S5_R1_filtered_1P.fq > B055_R1.sai


[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln] fail to locate the index

I have verified the paths and permissions are not an issue. Do you have any additional suggestions?

ADD REPLY • link 9.0 years ago by Cricket ▴ 10

0

Entering edit mode

what is the genome you are trying to align? cause if it is big you may try -a bwtsw and I think it shall work

this post contain some info related to your issue

http://seqanswers.com/forums/showthread.php?t=21099

ADD REPLY • link 9.0 years ago by Medhat 9.8k

0

Entering edit mode

try to use such as trimmomatic or BBduk to extract paired-end read files from your raw paired-end reads if the raw paired-end read you have is not yet preprocessed. And In my opinion, using Bowtie2 is better. but It still depends on your data. try different alignment tools and find the best one by using pysam scripts

ADD REPLY • link 9.0 years ago by Calvin ▴ 80

score 1 · Accepted Answer · 2016-09-09

I may not have the answer 100%, I was able to get my commands to run.

The documentation (http://bio-bwa.sourceforge.net/bwa.shtml#3) has the following for aligning (note the in.db.fasta):

aln     bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] <in.db.fasta> <in.query.fq> > <out.sai>

I had been using the following (I tried both .fa and .fasta):

bwa aln -t 20 B055.fa/fasta B055_R1_1P.fq  > B055_R1.sai

I changed it to below and it ran.

bwa aln -t 20 B055 B055_R1_1P.fq  > B055_R1.sai

If that was the issue, the developers may want to consider updating their documentation.