bam index chromosome size limit
19 months ago

Hello everyone, hope you are doing well

The SAM/BAM format specification says there is a limit on chromosome size: a .bam file cannot be given a standard .bai index if the reference genome has exceptionally large chromosomes. The limit is 2^29-1 positions, which is about 537 Mbp. This is quite a lot; all human chromosomes are well below it, so the limitation rarely gets in the way. However, some organisms, notably barley and wheat, have chromosomes of around 600 Mbp or more, so with these genomes it becomes an obstacle. I've just tried it:

samtools index file_sort.bam

and it gave the following error:

[E::hts_idx_check_range] Region 536962398..536962445 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
[E::sam_index] Read 'NB552414:80:H3WTKBGXK:3:21607:22796:5232' with ref_name='2H', ref_length=665585731, flags=16, pos=536962399 cannot be indexed
samtools index: failed to create index for "file_sort.bam": Numerical result out of range
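The arithmetic behind that message can be checked in a few lines. This is only a sketch in Python: the constant and function names are made up for illustration, the numbers come from the error output above, and it treats the quoted 2^29-1 figure as the largest reference length a .bai index can cover.

```python
# Largest coordinate a .bai index can address, per the 2^29 - 1 limit quoted above.
BAI_MAX_POS = 2**29 - 1

def fits_in_bai(ref_length: int) -> bool:
    """Return True if a reference of this length stays within the .bai limit."""
    return ref_length <= BAI_MAX_POS

# ref_length reported for barley chromosome 2H in the error message.
chr_2h_length = 665_585_731

print(f"bai limit:         {BAI_MAX_POS:,} bp")
print(f"2H length:         {chr_2h_length:,} bp")
print(f"2H fits in a .bai: {fits_in_bai(chr_2h_length)}")
```

The failing read starts at position 536962399, just past the limit, which is why the index is built without complaint up to that point and then aborts.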

Has anyone here encountered this before? If so, how can it be handled? For example, one could split the chromosomes into chunks of some 300 Mbp to prevent the error. Or am I being too paranoid and seeing issues where there are none? Thanks in advance for any help, Nick Shmakov, jr researcher, ICG SB RAS

bam sam mapping rna-seq • 1.7k views

If you absolutely need a bai index, you'll need to break your chromosomes into smaller contigs. If you can use a csi index instead, it supports sizes beyond the bai limit, given appropriate min_shift and n_lvls parameters.
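To see why the error message asks for n_lvls >= 6, here is a rough sketch of the capacity calculation. It assumes the relation given in the CSI format description, where an index covers up to 2^(min_shift + 3*n_lvls) positions; the function names are hypothetical, not part of samtools or htslib.

```python
def csi_max_span(min_shift: int, n_lvls: int) -> int:
    """Number of positions a CSI index can cover for the given parameters."""
    return 2 ** (min_shift + 3 * n_lvls)

def min_levels(ref_length: int, min_shift: int = 14) -> int:
    """Smallest n_lvls whose span is at least ref_length."""
    n_lvls = 0
    while csi_max_span(min_shift, n_lvls) < ref_length:
        n_lvls += 1
    return n_lvls

# min_shift = 14 with 5 levels gives the same span as a .bai index.
print(csi_max_span(14, 5))
# Levels needed for the 665585731 bp chromosome from the error message.
print(min_levels(665_585_731))
```

With min_shift = 14, five levels give 2^29 positions, the same as bai, and six levels give 2^32, which is enough for any barley or wheat chromosome.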


Thank you for your suggestion. Unfortunately, not everything works with the .csi format. But apparently you don't need an index at all for SNP calling with bcftools.


I suggest splitting your chromosomes into smaller chunks. Maybe you don't need an index for variant calling, but many programs used in downstream analyses cannot process long chromosomes. How long is your longest chromosome? In my experience, many programs cannot handle chromosomes longer than 2^31-1 bp. And the worst thing is that many of them don't even show an error or warning message.
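If you do go the splitting route, the chunk boundaries are easy to compute. This is a minimal sketch, not a complete workflow: the function name and the 0-based, half-open coordinate convention are my own choices, the 300 Mbp chunk size is the one proposed in the question, and the reference FASTA and any annotation would still have to be split and renamed to match.

```python
def chunk_intervals(ref_length: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, ref_length) into consecutive 0-based, half-open intervals."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, ref_length))
        for start in range(0, ref_length, chunk_size)
    ]

# Barley chromosome 2H length from the error message, cut into 300 Mbp chunks.
for start, end in chunk_intervals(665_585_731, 300_000_000):
    print(start, end, end - start)
```

Keep the chunk start offsets: positions reported on a chunk have to be shifted by that chunk's start to get back to coordinates on the original chromosome.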

19 months ago
GenoMax 147k

Create a .csi format index, as the message suggests, by adding

-c       Generate CSI-format index for BAM files

to your command line, i.e. samtools index -c file_sort.bam. The .csi indexes should be understood by IGV. Hopefully they are also understood by whatever else you are planning to use.


Thanks for the fast reply. Unfortunately, not all software works with csi; for example, bcftools does not always understand it. But it turns out you don't need bai indexes for SNP/SNV calling either.


bcftools index creates csi indexes by default so they should be understood well.
