tabix with specified coordinates
9.7 years ago
qmemcm ▴ 30

Hi!

I am using tabix to extract some regions of interest from a vcf.gz file. What is interesting is that when I specify, say, 10:7073963-7077040, I also get results from outside that range. For example, I get something at position 7073962. Why does tabix do this? Is this a bug, or is it a built-in feature?

Please let me know.

Thank you.

tabix vcf • 2.7k views
Is it a SNP that you're getting at 7073962 or a longer InDel? The latter would overlap your specified range.

I don't think tabix looks at the bases when indexing. It only uses CHROM & POS.

True, you're probably correct then that this is a 0 vs. 1-based issue.

It's not clear to me if tabix always uses a 0-based coordinate system.
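
To make the 0- vs. 1-based hypothesis concrete, here is a minimal Python sketch of how such an off-by-one could arise. It is not tabix's code; the function names are hypothetical, and the only assumption is that a 1-based inclusive query gets converted to a 0-based half-open interval somewhere along the way.

```python
def region_to_zero_based(start_1based, end_1based):
    """Convert a 1-based inclusive range to a 0-based half-open range."""
    return start_1based - 1, end_1based


def pos_in_region(pos_1based, start_1based, end_1based):
    """Correct check: convert the record position to 0-based as well."""
    beg0, end0 = region_to_zero_based(start_1based, end_1based)
    return beg0 <= pos_1based - 1 < end0


def pos_in_region_mixed(pos_1based, start_1based, end_1based):
    """Buggy check (illustration only): compares a 1-based POS
    against 0-based bounds without converting it first."""
    beg0, end0 = region_to_zero_based(start_1based, end_1based)
    return beg0 <= pos_1based < end0


# Region and position taken from the question above.
print(region_to_zero_based(7073963, 7077040))
print(pos_in_region(7073962, 7073963, 7077040))
print(pos_in_region_mixed(7073962, 7073963, 7077040))
```

With consistent conversion, position 7073962 falls outside the region; only the mixed comparison would report it, which is what a coordinate-system mix-up would look like.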

9.7 years ago
qmemcm ▴ 30

I think Devon's original answer is correct. I just gave that one example. But I have another: I specify the region as 721960-724370, and I get a long indel starting at 721929, which certainly overlaps my region.

Any more thoughts?

Thanks!

I should really refresh my memory of the tabix code; that'd answer this straight away. I know that there's code to deal with starts/ends, since a tabix index is basically an R-tree for block-gzipped files. The question simply becomes whether the parser handles VCF files in an intelligent way or if it just blindly assigns start and end coordinates based on the same VCF column.

Edit: Note that if what I wrote above makes no sense to you, that's not something to worry about :)
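
For what it's worth, the two possibilities can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tabix parser: the function names are hypothetical, coordinates are 1-based inclusive as in VCF, and the 40-base REF allele is made up because the actual indel length was not given.

```python
def interval_pos_only(pos, ref):
    """'Blind' strategy: start and end both come from the POS column."""
    return pos, pos


def interval_ref_aware(pos, ref):
    """'Intelligent' strategy: the record spans the whole REF allele."""
    return pos, pos + len(ref) - 1


def overlaps(interval, start, end):
    """True if two closed intervals share at least one position."""
    rec_start, rec_end = interval
    return rec_start <= end and rec_end >= start


# Region and POS from the second example in this thread.
region_start, region_end = 721960, 724370
pos = 721929
ref = "A" * 40  # hypothetical REF allele; real length unknown

print(overlaps(interval_pos_only(pos, ref), region_start, region_end))
print(overlaps(interval_ref_aware(pos, ref), region_start, region_end))
```

Under the POS-only strategy the record would not be returned; under the REF-aware strategy it would, which matches what the second example shows.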
