MongoDB: what's the most efficient way to store a genomic position?
14.2 years ago

I want to store some genomic positions using MongoDB.

something like:

{
chrom:"chr2",
position:100,
name:"rs25"
}

I want to be able to quickly find all the records in a given segment. What would be the best key/_id to use?

an object holding chrom and position?

db.snps.save({_id:{chrom:"chr2",position:100},name:"rs25"})

a padded string?

db.snps.save({_id:"chr02:00000000100",chrom:"chr2",position:100,name:"rs25"})

an auto-generated id with an index on chrom and position?

db.snps.save({chrom:"chr2",position:100,name:"rs25"})

other?

???

thanks for your suggestion(s)

Pierre

PS: I cross-posted this question on Stack Overflow: http://stackoverflow.com/questions/3740112

position index database • 5.5k views
14.2 years ago
brentp 24k

If you're going to use MongoDB for "spatial" queries, have a look here. It uses a geohash for 2d indexes, but you can likely shoehorn your 1-d data into it. Then you'd be able to take advantage of its spatial queries, such as nearest-neighbour and within-bounds searches.

Another option is to hash your 1-d intervals yourself, as you do with the padded string. Intuitively, that must have the best locality in the B-tree. For the options you list above, I suspect you'd have to run a benchmark to see whether there are any noticeable differences.
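To make the padded-string idea concrete, here is a minimal sketch in plain JavaScript. The helper names `paddedKey` and `segmentQuery` are hypothetical, and the widths (2 digits for the chromosome, 11 for the position) are only inferred from the example key "chr02:00000000100" in the question. It assumes numeric chromosome names; chrX, chrY and chrM would need their own mapping.

```javascript
// Hypothetical helper: build a key whose string order matches genomic order.
// Assumption: chromosome names look like "chr1".."chr22" (numeric only).
function paddedKey(chrom, position) {
  const num = chrom.replace(/^chr/, "").padStart(2, "0");
  const pos = String(position).padStart(11, "0");
  return "chr" + num + ":" + pos;
}

// Hypothetical helper: build the query document for an inclusive segment
// [start, end] on one chromosome, expressed as a range on the padded _id.
function segmentQuery(chrom, start, end) {
  return {
    _id: { $gte: paddedKey(chrom, start), $lte: paddedKey(chrom, end) }
  };
}

// Plain string sorting now puts chr2 before chr10 and 99 before 100.
const keys = [["chr10", 5], ["chr2", 100], ["chr2", 99]]
  .map(([c, p]) => paddedKey(c, p))
  .sort();

console.log(keys);
console.log(JSON.stringify(segmentQuery("chr2", 50, 150)));
```

The object returned by `segmentQuery` could be passed to something like `db.snps.find(...)`. Because the range is on `_id`, it would use the default `_id` index, but whether that beats a compound index on chrom and position is something only a benchmark would show.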

Some time ago I wrote biohash/interval-hash, which would work on 1-d intervals as geohash does on 2-d points. It's not fully thought out, but it could be a decent starting point.
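As an illustration of the general idea only (this is not the biohash/interval-hash code, and the function names are hypothetical), a 1-d interval hash can be sketched by repeatedly halving the coordinate range and recording which half fully contains the interval. The sketch assumes 0-based inclusive coordinates below 2^maxBits; the default of 28 bits is an assumption chosen to cover a human chromosome.

```javascript
// Hypothetical sketch of a 1-d interval hash, by analogy with geohash.
// Assumption: 0 <= start <= end < 2^maxBits, both ends inclusive.
function intervalHash(start, end, maxBits = 28) {
  let lo = 0;
  let hi = 2 ** maxBits; // exclusive upper bound of the current cell
  let hash = "";
  for (let i = 0; i < maxBits; i++) {
    const mid = lo + (hi - lo) / 2;
    if (end < mid) {          // interval lies entirely in the lower half
      hash += "0";
      hi = mid;
    } else if (start >= mid) { // interval lies entirely in the upper half
      hash += "1";
      lo = mid;
    } else {
      break;                   // interval straddles the midpoint: stop here
    }
  }
  return hash;
}

// Two intervals can only overlap if one hash is a prefix of the other,
// because the cells are either nested or disjoint. This is a coarse
// filter; candidates still need an exact start/end comparison.
function mayOverlap(hashA, hashB) {
  return hashA.startsWith(hashB) || hashB.startsWith(hashA);
}

console.log(intervalHash(4, 7, 4));   // small 4-bit range for readability
console.log(intervalHash(5, 5, 4));
console.log(mayOverlap(intervalHash(4, 7, 4), intervalHash(5, 5, 4)));
```

One known weakness of this kind of scheme: an interval that straddles an early midpoint gets a very short (or empty) hash, so it is a candidate for almost every query.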


Thanks, I'm going to validate this interesting answer.


Hi Pierre, I couldn't find geohash in your benchmark. Did you get to compare it?
