Can I use MongoDB for DNA database storage?
Short answer is "yes"; you can store anything you like in MongoDB, it's schema-free.
Longer answer: as discussed above, the main consideration is the 4 MB size limit for documents (later MongoDB releases raised it to 16 MB). This constrains your document design, so you need to think about what goes into a document. As Brad suggested, you'll need GridFS to store larger objects.
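To make the size constraint concrete, here is a minimal sketch of splitting one long sequence into several small documents by hand. It is not GridFS itself, only the same idea in plain JavaScript; the field names (`seq_id`, `n`, `start`, `seq`) and the `dna_chunks` collection mentioned in the comment are hypothetical.

```javascript
// Split one long sequence into fixed-size chunk documents.
// Field names (seq_id, n, start, seq) are illustrative, not a MongoDB convention.
function chunkSequence(seqId, sequence, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  var docs = [];
  for (var start = 0, n = 0; start < sequence.length; start += chunkSize, n++) {
    docs.push({
      _id: seqId + ":" + n,   // one document per chunk
      seq_id: seqId,          // key shared by all chunks of a sequence
      n: n,                   // chunk number, used to restore order
      start: start,           // 0-based offset of the first base in this chunk
      seq: sequence.slice(start, start + chunkSize)
    });
  }
  return docs;
}

// Reassemble the original sequence from its chunks, in chunk order.
function joinChunks(docs) {
  return docs
    .slice()
    .sort(function (a, b) { return a.n - b.n; })
    .map(function (d) { return d.seq; })
    .join("");
}

var chunks = chunkSequence("CB017399", "GGAAGGGCTGCCCCACCATTCATCC", 10);
// In the mongo shell, each chunk could then be stored with something like:
//   chunks.forEach(function (d) { db.dna_chunks.save(d); });
```

A real chunk size would be far larger than 10 bases; the small value here only keeps the example readable.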
As with any database, it's good to think about design before you start. The temptation with MongoDB is simply to "stuff and forget": any data that can be parsed into a hash-like structure is easy to save. But then how do you retrieve documents and what do you want to do with them? It helps to have a good idea of document structure, which keys to index, which keys to query on and so on. This is particularly true if you intend to use map-reduce; compound keys, for example, will not work there. Some people like to impose a schema using one of the many available object document mappers (ODMs), either when saving or later on for query/retrieval.
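As a rough illustration of imposing some structure before saving, here is a hand-rolled check in plain JavaScript. It is not an ODM, and the rules are assumptions: the keys mirror the example record further down in this thread, and the allowed alphabet (A, C, G, T, N) is just one possible choice.

```javascript
// Check one record before saving it; returns a list of problems (empty if none).
// The rules below are assumptions for illustration, not a MongoDB feature.
function validateDnaDoc(doc) {
  var errors = [];
  if (typeof doc._id !== "string" || doc._id.length === 0) {
    errors.push("_id must be a non-empty string (e.g. an accession)");
  }
  if (typeof doc.seq !== "string" || !/^[ACGTN]+$/i.test(doc.seq)) {
    errors.push("seq must contain only A, C, G, T or N");
  }
  // organism is optional, but if present it needs both fields
  if (doc.organism !== undefined &&
      (doc.organism === null ||
       typeof doc.organism.name !== "string" ||
       !Number.isInteger(doc.organism.taxid))) {
    errors.push("organism needs a string name and an integer taxid");
  }
  return errors;
}

var good = {
  _id: "CB017399",
  gi: 27592135,
  organism: { name: "Gallus gallus", taxid: 9031 },
  seq: "GGAAGGGCTGCCCCACCATT"
};
var bad = { _id: "", seq: "GGAXX" };

var goodErrors = validateDnaDoc(good); // no problems expected
var badErrors = validateDnaDoc(bad);   // empty _id and invalid bases
```

Only documents with an empty error list would then be passed on to `db.dna.save(...)`.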
MongoDB also allows relations between collections. These can be useful in certain scenarios, although some purists suggest that it's better to avoid "relational thinking" and aim for a purely key-value approach, in order to better understand MongoDB.
(Neil's not here? Yessss ;-) ) Yes, MongoDB is a key-value datastore, so, for example, if you want to save a pair (name, sequence), it is straightforward with MongoDB:
use mydb;
db.dna.save({_id:"CB017399", seq:"GGAAGGGCTGCCCCACCATTCATCCTTTTCTCGTAGTTTGTGCACGGTGCGGGAGGT..."});
and you can also add some indexed data:
db.dna.save(
{_id:"CB017399",
gi:27592135,
organism:{name:"Gallus gallus",taxid:9031},
seq:"GGAAGGGCTGCCCCACCATTCATCCTTTTCTCGTAGTTTGTGCACGGTGCGGGAGGT..."}
);
However, I'm not sure it would be a good way to store large sequences.
If you really want to use a key/value datastore, have a look at BerkeleyDB. This (free) engine is interesting because it is fast (everything is binary data), it can be embedded (no network involved), and you can ask to retrieve only a chunk of a value. So, say, if you store the human chr1 and ask for the very first bases, you won't have to load the entire chromosome into memory.
Individual objects in MongoDB have a 4 MB size limit (16 MB in later releases). For large sequences, use GridFS storage in Mongo: http://www.mongodb.org/display/DOCS/GridFS
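Storing a sequence in chunks also means that a small region can be read without loading everything. The sketch below works out which chunks cover a range and cuts the bases out of them. It assumes fixed-size chunks shaped like `{n: <chunk number>, seq: <string>}`; those field names, and the `dna_chunks` collection in the comment, are hypothetical and not part of the GridFS API.

```javascript
// Which chunk numbers cover the half-open range [start, end) of a sequence
// stored in fixed-size chunks? chunkSize is whatever was used at write time.
function chunksForRange(start, end, chunkSize) {
  if (start < 0 || end <= start || chunkSize <= 0) {
    throw new Error("need 0 <= start < end and chunkSize > 0");
  }
  var first = Math.floor(start / chunkSize);
  var last = Math.floor((end - 1) / chunkSize);
  var wanted = [];
  for (var n = first; n <= last; n++) wanted.push(n);
  return wanted;
}

// Cut [start, end) out of the fetched chunks, which must be the contiguous
// run of chunks returned by chunksForRange for the same range.
function sliceFromChunks(fetched, start, end, chunkSize) {
  var sorted = fetched.slice().sort(function (a, b) { return a.n - b.n; });
  var joined = sorted.map(function (c) { return c.seq; }).join("");
  var offset = sorted[0].n * chunkSize; // sequence position of the first fetched base
  return joined.slice(start - offset, end - offset);
}

// Example: a 25-base sequence stored in chunks of 10 bases.
var stored = [
  { n: 0, seq: "GGAAGGGCTG" },
  { n: 1, seq: "CCCCACCATT" },
  { n: 2, seq: "CATCC" }
];
var wanted = chunksForRange(8, 13, 10);
// Against a real collection the fetch might look like:
//   db.dna_chunks.find({seq_id: "CB017399", n: {$in: wanted}})
var fetched = stored.filter(function (c) { return wanted.indexOf(c.n) !== -1; });
var bases = sliceFromChunks(fetched, 8, 13, 10);
```

Here bases 8 to 12 span the boundary between the first two chunks, so only those two are needed and the third is never touched.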
Does anyone know of a source, for instance a tutorial or website, that shows the MEAN stack (http://mean.io/#!/) applied to NGS/bioinformatics web development? Thanks!!
Sorry, your question makes no sense, just from a grammar standpoint.
Are you looking for MEAN stack tutorials? What makes you think that this framework is great for NGS/bioinformatics web development? What is 'bioinformatics web development' anyway? How is that supposed to be different from 'normal' web development? Why do you post this in a 6-year-old thread? I guess because of MongoDB, but why does it have to be MongoDB, and what do you want to accomplish anyway? Questions, questions, questions.
OK. I don't know any links for that matter, and I also don't think that this exists. As I stated above, I don't see how bioinformatics web tools are different from 'normal' web tools. I think you should just go through the normal tutorials for the programming language you want to use. I don't think anybody has put together a specific tutorial for this specific topic, but if you find something, post it; it would be interesting to read.
You should probably specify the use cases/scenarios that you have in mind.
How come people vote up such a question?