Human Genome Storage API Design Feedback...
10.3 years ago
hcatlin ▴ 100

Hi BioStars Community! This is my first time posting here, but I've been reading a lot of the stuff on here for the last couple of weeks while I've been working on this project.

Basically, I'm building a whole human genome storage platform called GeneHub. I've had some great success with building out my Cassandra-based storage prototype, and now it's time to work on the public API. I've been working out some of my ideas about what the API should look like in a Google Doc, and I'm still learning, so I'd love to get feedback from some of the far more experienced people on this board.

For now, the API focuses on one use case: accessing your own genome and using it for some computation. It's very basic and uses a "segment" system to paginate the chromosomes.
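To make the "segment" idea concrete for anyone commenting, here is a rough sketch of how fixed-size segments could paginate a chromosome. The segment size, the `Segment` type, and the function names are all assumptions made for illustration; they are not taken from the Google Doc.

```python
from dataclasses import dataclass

# Assumed bases per segment; the actual page size is not stated in the post.
SEGMENT_SIZE = 10_000


@dataclass(frozen=True)
class Segment:
    chromosome: str
    index: int  # 0-based segment number within the chromosome
    start: int  # 0-based, inclusive
    end: int    # exclusive


def segment_count(chrom_length, segment_size=SEGMENT_SIZE):
    """Number of segments needed to cover a chromosome (ceiling division)."""
    return (chrom_length + segment_size - 1) // segment_size


def segment_for_position(chromosome, position, chrom_length,
                         segment_size=SEGMENT_SIZE):
    """Return the segment that contains a 0-based position."""
    if not 0 <= position < chrom_length:
        raise ValueError("position is outside the chromosome")
    index = position // segment_size
    start = index * segment_size
    # The last segment may be shorter than segment_size.
    end = min(start + segment_size, chrom_length)
    return Segment(chromosome, index, start, end)


def segments_for_range(chromosome, start, end, chrom_length,
                       segment_size=SEGMENT_SIZE):
    """Return every segment overlapping the half-open range [start, end)."""
    if not 0 <= start < end <= chrom_length:
        raise ValueError("invalid range")
    first = start // segment_size
    last = (end - 1) // segment_size
    return [
        Segment(chromosome, i, i * segment_size,
                min((i + 1) * segment_size, chrom_length))
        for i in range(first, last + 1)
    ]
```

With a scheme like this, a client that wants a region only has to compute which segment indices overlap it and request those pages, rather than downloading a whole chromosome.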

The Google Doc link is here

Any help or correction to my thinking here would be super useful. I'd also welcome suggestions for API access methods that you'd want if you were writing code against it. I have some ideas for what I'll want to add at the bottom of the document, but would love to get more.

genome sequencing • 2.2k views

Access to what? What domain is this for? I'd like a way to organize a hundred genomes, if only because they require a dozen hard drives right now.

10.3 years ago

Aren't you re-inventing the DAS protocol?

You should also have a look at the formats specified by the GA4GH.


Unfortunately, DAS doesn't have any access control schemes built into it. This system will store personal genomic data for (hopefully) thousands of individuals who are customers of our sequencing service, so access control and privilege granting are going to be key. There are a lot of solid ideas to take from DAS, though. Some people on our system may opt to open their genome, and in that case it seems like we could get a LOT of interop by implementing a DAS interface.
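As a rough illustration of the kind of access control and privilege granting I mean, here is a minimal in-memory sketch. The `GenomeRecord` type, its fields, and the function names are hypothetical and only show the owner / grantee / public distinction; they are not GeneHub's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    owner: str                                 # user who owns the genome
    is_public: bool = False                    # owner has opted to open the genome
    grants: set = field(default_factory=set)   # users granted read access


def can_read(record, user):
    """A user may read if the genome is public, theirs, or shared with them."""
    return record.is_public or user == record.owner or user in record.grants


def grant_read(record, acting_user, grantee):
    """Only the owner may grant read access to another user."""
    if acting_user != record.owner:
        raise PermissionError("only the owner can grant access")
    record.grants.add(grantee)


def revoke_read(record, acting_user, grantee):
    """Only the owner may revoke a previously granted read access."""
    if acting_user != record.owner:
        raise PermissionError("only the owner can revoke access")
    record.grants.discard(grantee)
```

Under this sketch, a genome whose owner sets `is_public` is the case that could be exposed through a DAS-style interface, while everything else stays behind the per-user check.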

The GA4GH stuff is pretty interesting. I'd never heard of Avro before. I love the spirit of what they seem to be doing, but it seems like their API is currently defined only for accessing reads (http://ga4gh.org/#/apis/reads/v0.1). Their Reference Variation stuff seems really interesting too, though.

Thanks for taking the time to check it out, Pierre. I was actually reading your blog earlier today! So, really cool to have someone with your knowledge looking over my document!


v0.1 is the very old version proposed by Google. Check out the GitHub repo instead. Your proposal also looks similar to the one used by the Personal Genome Project from Harvard Medical School. It would be good to contact them (I am unable to share their slides).
