samtools sorting and indexing
7.5 years ago
ggman ▴ 90

Hi friends,

I am attempting to sort the BAM files that I obtained from my bowtie SAM files. Judging by the error I receive after creating the BAM file, I am not indexing them appropriately:

random alignment retrieval only works for indexed BAM or CRAM files.

I understand I am supposed to index the files before sorting them.

    #creating the appropriate files
    samtools view -Sb sample.sam.pair > sample.pair
    samtools view -bt ~/bigdata/refgenome/genome.fa.fai - - | samtools sort sample.pair -o sample.pair.bam

    samtools view -Sb sample.sam.single > sample.single
    samtools view -bt ~/bigdata/refgenome/genome.fa.fai - - | samtools sort sample.single -o sample.single.bam

    #merge
    samtools merge sample.all.bam sample.pair.bam sample.single.bam -@ 2
    rm sample.pair sample.single

    #index the final bam
    samtools index sample.all.bam

Any help would be appreciated.

samtools index sort • 102k views
7.5 years ago
John 13k

I think you're over-thinking things :)

You can only index BAM files on position, and only when the data is sorted by position to begin with (don't ask...). So to sort by position, just do:

    samtools sort my.sam > my_sorted.bam

Then index with

    samtools index my_sorted.bam

It's as easy as that. If you want to merge the output files from bowtie, do that as the very first step, because I don't think samtools performs any optimisations for merging sorted BAMs/SAMs. I'd also recommend against bowtie2 in favour of STAR or BWA-MEM, but that's just a personal preference at the end of the day.


With the latest samtools that command should be samtools sort -o sorted.bam initial.bam.
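For reference, a minimal sketch of the whole sort-then-index sequence with that syntax. The file names and the region `chr1:1-100000` are placeholders rather than anything from the original question, and the guard at the top simply skips the commands when samtools or the input file is missing.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
in_bam="initial.bam"
sorted_bam="sorted.bam"

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # Coordinate-sort (the default sort order) and write the result to the -o file.
    samtools sort -o "$sorted_bam" "$in_bam"
    # Build the BAM index; by default it is written next to the BAM as sorted.bam.bai.
    samtools index "$sorted_bam"
    # Region queries need that index. "chr1:1-100000" is a placeholder region
    # and must match a reference name present in your BAM header.
    samtools view -c "$sorted_bam" "chr1:1-100000"
else
    echo "samtools or $in_bam not found; nothing done" >&2
fi
```

The "random alignment retrieval" error from the question typically appears at the last step, when a region is requested from a BAM that has no index beside it.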


Oh they changed the syntax to be explicit!? Finally :D


Would this take into account my .fai file?


You are still over-thinking. The FASTA and BAM indexes are two separate and independent things: you don't need one to have the other.

Indexing allows for efficient data access and retrieval. The fasta index (.fai) is used to access and retrieve subsets of the fasta sequence, and the bam index (.bai) to access and retrieve subsets of the bam file.
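To make the distinction concrete, here is a small sketch showing that each index is built and used on its own. The file names and the region `chr1:1-60` are placeholders, and the guard only skips the commands when samtools or the files are absent.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
ref="genome.fa"
bam="sorted.bam"
fai="${ref}.fai"   # FASTA index written by samtools faidx
bai="${bam}.bai"   # BAM index written by samtools index (default name)

if command -v samtools >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$bam" ]; then
    # FASTA index: lets you pull a subsequence out of the reference.
    samtools faidx "$ref"
    samtools faidx "$ref" "chr1:1-60"

    # BAM index: lets you pull the alignments overlapping a region.
    # The BAM must already be coordinate-sorted.
    samtools index "$bam"
    samtools view "$bam" "chr1:1-60" | head
else
    echo "samtools, $ref or $bam not found; nothing done" >&2
fi
```

Neither command reads the other's index: `samtools index` never touches the .fai, and `samtools faidx` never touches the .bai.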


Oh my goodness.... Thank you both for explaining this to me. I really appreciate it! I only keep talking about my .fai file because my PI left me some code that I could base it off of and it has it on there but I couldn't understand how it was implemented. Thank you.


You're very welcome - if you run into any more complications please don't hesitate to open another question :)

4.9 years ago
onestop_data ▴ 330

Ideally, you want to pipe your alignment into samtools sort as it aligns. However, I guess if you have the BAM files already, the best approach is to use samtools sort and samtools index as shared above.
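A rough sketch of the piped approach, assuming bowtie2 as the aligner and paired-end reads. The index prefix, read files and output name are hypothetical; adjust them to your data.

```shell
#!/bin/sh
# Hypothetical inputs; replace them with your own.
index_prefix="genome_index"    # prefix of the bowtie2 index files
reads_1="sample_R1.fastq"
reads_2="sample_R2.fastq"
out_bam="sample.sorted.bam"

if command -v bowtie2 >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
        && [ -f "$reads_1" ] && [ -f "$reads_2" ]; then
    # bowtie2 writes SAM to standard output; samtools sort reads it from "-"
    # and writes a coordinate-sorted BAM, so no intermediate SAM file is kept.
    bowtie2 -x "$index_prefix" -1 "$reads_1" -2 "$reads_2" \
        | samtools sort -o "$out_bam" -
    # Index the sorted BAM so that region queries work.
    samtools index "$out_bam"
else
    echo "bowtie2, samtools or the read files were not found; nothing done" >&2
fi
```

This avoids writing the large uncompressed SAM to disk, at the cost of having to rerun the alignment if the sort step fails.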
