I'm looking for something similar to tabix. But instead of looking up information within a given region, I would like to use the values in the ID column for quick lookup.
So, for example, I would like to take the compressed dbSNP file, index it by the ID column, and then quickly search for the line with a given rs number.
You could put your table in a sqlite database and index the ID column. Then retrieving records by ID will be very fast. However, this would require working with a sqlite file and associated SQL syntax rather than with the usual unix/R/python tools.
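As a rough illustration of the SQLite idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the second toy row are made up for the example; only the first row comes from the sample line quoted in this thread, and a real dbSNP file would need its own parsing.

```python
import sqlite3


def build_index(rows, db_path=":memory:"):
    """Load (id, chrom, start, end, allele) rows into SQLite and index the ID column."""
    con = sqlite3.connect(db_path)
    # Hypothetical schema: adjust the columns to whatever your table contains.
    con.execute(
        "CREATE TABLE IF NOT EXISTS snp "
        "(id TEXT, chrom TEXT, start INTEGER, end INTEGER, allele TEXT)"
    )
    con.executemany("INSERT INTO snp VALUES (?, ?, ?, ?, ?)", rows)
    # The index is what makes lookups by ID fast.
    con.execute("CREATE INDEX IF NOT EXISTS snp_id ON snp (id)")
    con.commit()
    return con


def lookup(con, rs_id):
    """Return all rows whose ID column equals rs_id."""
    cur = con.execute(
        "SELECT id, chrom, start, end, allele FROM snp WHERE id = ?", (rs_id,)
    )
    return cur.fetchall()


# Toy data: the second row is invented purely for illustration.
rows = [
    ("rs76264143", "chr1", 982843, 982844, "G"),
    ("rs1", "chr2", 100, 101, "A"),
]
con = build_index(rows)
hits = lookup(con, "rs76264143")
print(hits)
```

Passing a file path instead of ":memory:" would keep the database on disk, so the index only has to be built once.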
A long time ago I started this topic. Unfortunately, I cannot comment on the solutions mentioned here because I dropped the idea I needed them for and so never tested them.
But today I found a nice trick on the htslib GitHub. The person who posted there uses "rs" as the contig name and the number as the start and end position. This results in a file like this:
rs 76264143 76264143 chr1 982843 982844 G
You can then index by the first 3 columns and query like before. Nice!
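For anyone wanting to try this, here is a small Python sketch of the reshaping step. The input column order (chrom, start, end, ID, allele) and the function names are assumptions for illustration, not something taken from the htslib post. The output would still need to be sorted numerically on the second column and compressed with bgzip, after which something like `tabix -s 1 -b 2 -e 3` should be able to index it.

```python
def to_rs_keyed(fields, id_col):
    """Move an 'rs<number>' ID to the front as: 'rs', number, number, rest of row."""
    rs_id = fields[id_col]
    if not rs_id.startswith("rs") or not rs_id[2:].isdigit():
        raise ValueError(f"not an rs number: {rs_id!r}")
    number = rs_id[2:]
    # Keep every other column in its original order.
    rest = fields[:id_col] + fields[id_col + 1:]
    return ["rs", number, number] + rest


def region_for(rs_id):
    """Build the region string to pass to tabix for a given rs number."""
    if not rs_id.startswith("rs") or not rs_id[2:].isdigit():
        raise ValueError(f"not an rs number: {rs_id!r}")
    number = int(rs_id[2:])
    return f"rs:{number}-{number}"


# Hypothetical input row; the column layout is assumed, not taken from dbSNP.
row = to_rs_keyed(["chr1", "982843", "982844", "rs76264143", "G"], id_col=3)
print("\t".join(row))
print(region_for("rs76264143"))
```

The query then looks like a normal region lookup, for example passing the string from region_for() as the region argument to tabix.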
Don't overthink it. Link gzip and gunzip to pigz. Then:
A database is way overkill.
tabix has a --targets option.