Greetings.
I have Illumina paired-end reads for E. coli that I used to create a draft assembly (SPAdes). I now have the task of creating the input BAM files that I will use with Pilon -- which is used to improve a draft assembly.
I decided to employ BWA using the documentation here: http://bio-bwa.sourceforge.net/bwa.shtml#3
The plan in to create an index of the reference genome, create the alignments, and then convert to BAM files.
Here is the command I used to index the reference:
bwa index -p bwa_indices/B055 temp/spades/scaffolds.fasta
This command output the following files: B055.amb B055.ann B055.bwt B055.pac B055.sa
The next step is to generate the alignment files -- for which I used the following command:
bwa aln -t 20 temp/spades/scaffolds.fasta temp/spades/corrected/B055_S5_R1_filtered_1P.fastq.00.0_0.cor.fastq.gz > bwa_indices/B055_R1.sai
#bwa aln -t 20 temp/spades/scaffolds.fasta temp/spades/corrected/B055_S5_R1_filtered_2P.fastq.00.0_0.cor.fastq.gz > bwa_indices/B055_R2.sai
After running the first command, I received the following output:
[bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwa_aln] fail to locate the index
The last line has vexed me a bit. There is an output file (B055_R1.sai), but it is empty.
I can clearly see that within my alignment command, there is no reference to any of the index files that were previously created, but when I look at the documentation (http://bio-bwa.sourceforge.net/bwa.shtm), I see no option for referencing any index files. Googling a bit led me to a site that said I needed to have my reference fasta file in the same directory as the index files, and I even changed the name of my draft assembly fasta file from scaffolds.fasta to B055.fasta -- but to no avail. I also unzipped the fastq.gz file and changed the extension from fastq to fq -- all were met with unsuccessful results. Those may still be issues, but it seems to me that referencing the index file(s) in the last bwa aln call is the most pressing issue.
Can anyone kindly point me in the proper direction? I am using BWA Version: 0.7.5a-r405 (I also installed the latest version (Version: 0.7.12-r1039) with no improvement), CentOS 6.7, with 34 cores and plenty of memory.
Thank you in advance.
try to use
as -p from documentation just
so the index should be in the same directory your refernce is , in such case
that is why the error tells you
bonus: If your genome is small (not mammalian size ) use
-a is
so the command will beMedhat,
Thank you for your response (and the
-a is
tip). I played around with the nomenclature so that my commands match your suggestions. However, my error has always occurred with the alignment command -- regardless of the file names.I have updated my script as follows:
I have verified the paths and permissions are not an issue. Do you have any additional suggestions?
what is the genome you are trying to align? cause if it is big you may try
-a bwtsw
and I think it shall workthis post contain some info related to your issue
http://seqanswers.com/forums/showthread.php?t=21099
try to use such as trimmomatic or BBduk to extract paired-end read files from your raw paired-end reads if the raw paired-end read you have is not yet preprocessed. And In my opinion, using Bowtie2 is better. but It still depends on your data. try different alignment tools and find the best one by using pysam scripts