BWA and Samtools indexing
9.8 years ago
lcc1844 ▴ 40

Can anyone explain the difference between indexing a reference genome with BWA, in order to align a FASTQ file, and indexing the reference with the faidx command in Samtools after converting a SAM file to BAM format? I have used some Samtools commands successfully, but I am struggling to understand what the formatting steps actually do. Specifically, I don't understand what .fai files are, or what sorting and indexing the .bam file does.

If anyone is able to help me understand, I would be very grateful.

Many thanks

alignment • 7.4k views

Thank you for directing me to the previous post.

I would still like to learn the purpose of this indexing. Why is it necessary to make a ref.fai file?

I have used BWA to make a SAM file then used this command in samtools to create a bam file:

samtools view -bS aln.sam > aln.bam

I have then seen protocols which start from this point by making the ref.fai file then converting .sam to .bam as follows:

samtools import hg19.fa.fai aln.sam aln.bam

Is this normal? I thought the conversion had already been performed, so what are the steps with .fai files for?

The protocol then sorts and indexes the bam file. Can the sorting and indexing be done following the first sam to bam conversion?

Thank you


You only need to index a FASTA file if you need random access to the sequences in the file. Otherwise, it serves little purpose.
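To make "random access" concrete, here is a minimal Python sketch of what a faidx-style index records and how it is used. It assumes the usual five .fai columns (name, length, byte offset of the first base, bases per line, bytes per line) and works on a tiny in-memory FASTA. The names build_fai, fetch, FASTA and INDEX are made up for illustration and are not part of Samtools.

```python
import io

def build_fai(fasta_bytes):
    """Return {name: (length, offset, linebases, linewidth)} for a FASTA held in memory."""
    index = {}
    name = None
    pos = 0  # byte position of the start of the current line
    for line in io.BytesIO(fasta_bytes):
        if line.startswith(b">"):
            # The sequence name is the first word after '>'.
            name = line[1:].split()[0].decode()
            # Sequence data starts right after the header line.
            index[name] = [0, pos + len(line), 0, 0]
        else:
            entry = index[name]
            bases = len(line.rstrip(b"\r\n"))
            if entry[2] == 0:
                entry[2] = bases      # bases per full line
                entry[3] = len(line)  # bytes per line, newline included
            entry[0] += bases
        pos += len(line)
    return {k: tuple(v) for k, v in index.items()}

def fetch(fasta_bytes, index, name, start, end):
    """Return bases [start, end) (0-based) by seeking directly, without scanning the file."""
    length, offset, linebases, linewidth = index[name]
    end = min(end, length)
    handle = io.BytesIO(fasta_bytes)
    pieces = []
    p = start
    while p < end:
        row, col = divmod(p, linebases)
        take = min(linebases - col, end - p)  # stay within the current line
        handle.seek(offset + row * linewidth + col)
        pieces.append(handle.read(take))
        p += take
    return b"".join(pieces).decode()

# Toy data, not a real genome.
FASTA = b">chr1 test\nACGTAC\nGTACGT\nAC\n>chr2\nTTTT\nGG\n"
INDEX = build_fai(FASTA)
print(INDEX)
print(fetch(FASTA, INDEX, "chr1", 4, 9))
```

The point is that fetch never reads the sequence from its start: the offset arithmetic jumps straight to the requested bases, which is what matters on a multi-gigabyte reference.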

The examples you've seen that use samtools import are incredibly old and should not be used. Ignore them. The purpose of the .fai file in those cases was to act as a substitute for a possibly missing header in the SAM file. Unless your file is missing a header, there's absolutely no need to include the FASTA index (btw, the samtools view equivalent is the -t option).
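As a rough illustration of why a .fai file can stand in for a missing header: its first two columns are the sequence name and length, which is the information a SAM header's @SQ lines carry (SN and LN). The helper below is a hypothetical sketch, not how Samtools does it internally, and the FAI_TEXT rows are toy values.

```python
def fai_to_sq_lines(fai_text):
    """Turn the name and length columns of a .fai file into SAM @SQ header lines."""
    lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        name, length = row.split("\t")[:2]
        lines.append(f"@SQ\tSN:{name}\tLN:{int(length)}")
    return lines

# Toy .fai content: name, length, offset, bases per line, bytes per line.
FAI_TEXT = "chr1\t14\t11\t6\t7\nchr2\t6\t34\t4\t5\n"
SQ_LINES = fai_to_sq_lines(FAI_TEXT)
print("\n".join(SQ_LINES))
```

If your aligner already wrote @SQ lines into the SAM file, as BWA does, this information is redundant, which is why the .fai step can be skipped.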


BTW, you can just pipe everything together:

samtools view -uS aln.sam | samtools sort -o - sorted_file_prefix

You can also pipe the output of your aligner to that to avoid the useless SAM file altogether.
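On the original question of what sorting and indexing a BAM file buys you, here is a deliberately simplified Python model. It assumes each alignment is just a (reference, start, read name) tuple and only looks up reads by start position. A real BAM index is more elaborate, and every name below is invented for the example.

```python
from bisect import bisect_left

def sort_by_coordinate(records, ref_order):
    """Order records by reference (in header order), then by start position."""
    rank = {name: i for i, name in enumerate(ref_order)}
    return sorted(records, key=lambda r: (rank[r[0]], r[1]))

def build_start_index(sorted_records):
    """Map each reference to (position of its first record, list of start coordinates)."""
    index = {}
    for i, (ref, start, _read) in enumerate(sorted_records):
        _first, starts = index.setdefault(ref, (i, []))
        starts.append(start)
    return index

def reads_starting_in(sorted_records, index, ref, lo, hi):
    """Names of reads on `ref` whose start lies in [lo, hi), found by binary search."""
    if ref not in index:
        return []
    first, starts = index[ref]
    a = bisect_left(starts, lo)
    b = bisect_left(starts, hi)
    return [r[2] for r in sorted_records[first + a:first + b]]

# Toy alignments in the arbitrary order an aligner might emit them.
RECORDS = [("chr2", 50, "r4"), ("chr1", 300, "r3"), ("chr1", 10, "r1"), ("chr1", 120, "r2")]
SORTED = sort_by_coordinate(RECORDS, ["chr1", "chr2"])
START_INDEX = build_start_index(SORTED)
print(reads_starting_in(SORTED, START_INDEX, "chr1", 100, 301))
```

The lookup only works because the records were sorted first, which is why the usual order is convert, sort, then index. Both steps can be run on the BAM produced by the plain samtools view conversion.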

9.8 years ago

The two indices have absolutely nothing to do with each other. This was previously addressed here: Is The Bwa Reference Indexing The Same Thing That Fasta Indexing With Samtools?
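To illustrate the contrast: an aligner's index exists to answer "where does this read occur in the reference?", while a .fai index only answers "where in the file is base N of this sequence?". BWA's index is built around the Burrows-Wheeler transform and a suffix array. The toy below uses a plain suffix array with binary search as a stand-in, so treat it as a sketch of the idea rather than BWA's actual data structure. The function names and REFERENCE string are made up.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, suffix_array, pattern):
    """Return every position where `pattern` occurs exactly, using binary search."""
    m = len(pattern)
    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[start:lo])

# Toy reference and "reads".
REFERENCE = "GATTACAGATTACA"
SA = build_suffix_array(REFERENCE)
print(find_all(REFERENCE, SA, "TTAC"))
```

A search index like this is no use for pulling out a region by coordinate, and a .fai file is no use for locating a read, which is why you need both if you both align and later fetch reference sequence.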
