Hello,
I reported this problem on the samtools/htsjdk list but have had no reply yet, so I'm cross-posting here.
In a Java program I need to create a tabix index for a bgzip-compressed file. I'm using the method writeBasedOnFeatureFile. No exception is thrown, but the resulting index appears to be corrupted: only the first records of the first chromosome can be queried.
Could anyone reproduce the issue or am I missing something?
The high level question is: How can I create a tabix index for a bgzip file in Java? I don't need to stick to htsjdk.
Example:
Here's a BED test file with 1M rows on 10 chromosomes. I bgzipped it with tabix 1.2.1: https://dl.dropboxusercontent.com/u/53487723/tmp.bedGraph.gz
I created an index for it with the following program (java 8, htsjdk-1.141):
import java.io.File;
import java.io.IOException;

import htsjdk.tribble.bed.BEDCodec;
import htsjdk.tribble.index.IndexFactory;
import htsjdk.tribble.index.tabix.TabixFormat;
import htsjdk.tribble.index.tabix.TabixIndex;

public class TestTabix {
    public static void main(String[] args) throws IOException {
        String bgzfOut = "tmp.bedGraph.gz";
        TabixIndex tabixIndexGz =
                IndexFactory.createTabixIndex(new File(bgzfOut), new BEDCodec(), TabixFormat.BED,
                        null);
        tabixIndexGz.writeBasedOnFeatureFile(new File(bgzfOut));
    }
}
Querying the resulting index returns only the first records of chr1:
tabix tmp.bedGraph.gz chr1 | wc -l # Should be 100000
4500
tabix tmp.bedGraph.gz chr2 | wc -l # Should be 100000
0
EDIT: The test file looks sane. I can index it with standalone tabix and query it:
tabix -p bed tmp.bedGraph.gz
tabix tmp.bedGraph.gz chr2 | wc -l # As expected: 100000
tabix tmp.bedGraph.gz chr2 | head
chr2 100000 100001
chr2 100001 100002
chr2 100002 100003
chr2 100003 100004
chr2 100004 100005
chr2 100005 100006
Are you sure your VCF was sorted?
Hi Pierre, thanks for replying. The input file is BED, not VCF, but anyway it looks OK, see the edit. I'll have a look at your code, but still... Either there is a big bug in htsjdk.tribble createTabixIndex or I'm doing something silly!
PS: I'm not sure the link to the test file was working before, so I've updated it.
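For anyone who wants to rule out sort order from within Java rather than with standalone tabix, here is a minimal sketch of such a check. It uses only the standard library and is not part of htsjdk: the class name BedSortCheck and the method isTabixSorted are made up for illustration. It assumes tab-separated BED-like lines (chrom, start, ...) that have already been decompressed, and it takes "sorted" to mean that each chromosome forms a single contiguous block with non-decreasing start positions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not an htsjdk class.
public class BedSortCheck {

    /**
     * Returns true if every chromosome appears as one contiguous block
     * and start positions never decrease within a block.
     */
    public static boolean isTabixSorted(Iterable<String> lines) {
        Set<String> seenChroms = new HashSet<>();
        String currentChrom = null;
        long previousStart = -1;

        for (String line : lines) {
            // Skip blank lines and common header lines.
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Not a BED-like line: " + line);
            }
            String chrom = fields[0];
            long start = Long.parseLong(fields[1]);

            if (!chrom.equals(currentChrom)) {
                // A chromosome seen earlier must not reappear later.
                if (!seenChroms.add(chrom)) {
                    return false;
                }
                currentChrom = chrom;
            } else if (start < previousStart) {
                return false;
            }
            previousStart = start;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isTabixSorted(Arrays.asList(
                "chr1\t0\t1", "chr1\t1\t2", "chr2\t0\t1")));   // prints "true"
        System.out.println(isTabixSorted(Arrays.asList(
                "chr1\t5\t6", "chr1\t3\t4")));                 // prints "false"
        System.out.println(isTabixSorted(Arrays.asList(
                "chr1\t0\t1", "chr2\t0\t1", "chr1\t1\t2")));   // prints "false"
    }
}
```

The test file in this question would pass such a check, which is consistent with standalone tabix indexing it without trouble, so sort order does not look like the cause here.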