I am uncertain whether I need to index my fasta file before running BWA mem. When is indexing necessary? And if it is necessary, what would the input for BWA mem be (fasta, index, R1.gz, R2.gz)?
All NGS aligners need the reference sequences to be indexed. You would use the index basename with your fastq reads at the time of alignment.
bwa index ref.fa
bwa mem ref.fa reads.fq > aln-se.sam
Note: BBMap aligner can index the fasta reference on the fly but it still needs to do the indexing before alignment happens.
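For the paired-end case raised in the question (R1.gz, R2.gz), a minimal sketch could look like the following. The function name align_paired and the file names R1.fq.gz, R2.fq.gz and aln-pe.sam are placeholders of mine, not from the thread, and it assumes "bwa index ref.fa" has already been run once so the index files sit next to ref.fa.

```shell
# Sketch only: align_paired and all file names are hypothetical placeholders.
# Assumes `bwa index ref.fa` was already run, so the index files sit next to ref.fa.
align_paired() {
    ref=$1    # reference FASTA path, which doubles as the index basename
    r1=$2     # forward reads (bwa reads gzipped FASTQ directly)
    r2=$3     # reverse reads
    out=$4    # output SAM file
    bwa mem "$ref" "$r1" "$r2" > "$out"
}

# Example call:
#   align_paired ref.fa R1.fq.gz R2.fq.gz aln-pe.sam
```

Note that the index files themselves are never listed on the command line; bwa mem finds them from the reference path.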
Yes, it is absolutely necessary. See: "What is the purpose of indexing a genome".
Hello, I've just discovered that my trainee has performed a bunch of bwa alignments without indexing the genome (he used "samtools faidx" as he had learnt in class...).
The alignments took a very long time (3-4 days on a high-perf cluster, mouse exomes) but they succeeded and the results seem to be correct.
My guess is that, if bwa doesn't find the index files, it uses the .fa... without the advantages of the BW transform.
Start by checking the exact code and the @PG line in the header of the SAM file. It is actually not possible to run BWA without a bwa index, simply because of how the alignment works. The fai index contains no information that helps. I suspect that either the index files were actually present at the location of the reference, or the output is from another job. In any case, why not just dump everything and run it properly? I personally would not invest effort in detective work here. Even for very deep WGS files, a bwa alignment should not take longer than a day on an HPC. Just rerun it and be safe; this all sounds fishy, and the results cannot be trusted as they stand.
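Checking the @PG line can be done with a one-line grep on the SAM file. This is only a sketch: the helper name show_pg_lines is made up for illustration, and it assumes an uncompressed SAM file such as the aln-se.sam from the example further up.

```shell
# Sketch only: show_pg_lines is a hypothetical helper name.
# Prints the @PG header records of an uncompressed SAM file; bwa mem records
# its program name, version and command line there.
show_pg_lines() {
    grep '^@PG' "$1"
}

# Example call:
#   show_pg_lines aln-se.sam
```

For a BAM file, the header would first have to be extracted, for example with "samtools view -H aln.bam", and then filtered the same way.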
So I shouldn't feed my original fasta file to BWA mem? When I index my fasta file, the resulting fasta file is empty. I suppose something is wrong?
No to first question.
When you index a fasta file (with one or many sequences in it) there should be several additional files produced, which contain the actual index information. They will have names like
Yes something is wrong if you are getting empty files.
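A quick way to check whether indexing worked is to test that each index file exists and is not empty. The sketch below assumes classic bwa, where "bwa index ref.fa" writes ref.fa.amb, ref.fa.ann, ref.fa.bwt, ref.fa.pac and ref.fa.sa next to the reference; other versions or aligners may use different names. The helper name missing_bwa_index is hypothetical.

```shell
# Sketch only: missing_bwa_index is a hypothetical helper name.
# Assumes classic bwa, whose `bwa index ref.fa` writes
# ref.fa.amb, ref.fa.ann, ref.fa.bwt, ref.fa.pac and ref.fa.sa.
missing_bwa_index() {
    ref=$1
    status=0
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$ref.$ext" ]; then
            echo "missing or empty: $ref.$ext"
            status=1
        fi
    done
    return $status
}

# Example call:
#   missing_bwa_index ref.fa && echo "index looks complete"
```

The original ref.fa should be left unchanged by indexing, so an empty fasta file after "bwa index" points to a problem with the input or the command rather than with the index step itself.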