Are big data solutions like Hadoop and others helpful for solving bioinformatics problems? What are some big data approaches for bioinformatic analysis?
While Big Data (I hate the term; it is overused to the point of saturation now) in biology tends to be smaller than, say, online metadata collection, particle physics, or astrophysics datasets, we also tend to have much shorter time frames for analysis (compared to physics, that is) and different requirements and problems. Hadoop and some of the other common "Big Data" approaches tend to be underused in bioinformatics, with, IMHO, only a few examples of tools that make use of them.
For some of us, cloud computing is a non-starter. Storing my data, which is derived from patients, on servers physically located outside of Canada is a grey area for me, so I stick with workstations and clusters at my university or at partner institutions. That said, many of the best bioinformatics tools being developed at least try to leverage parallelization and computational optimization as much as possible, along with efficient storage, indexing, and search strategies. These are all, I think, Big Data approaches.
If this is a question, here are some examples of Hadoop usage in bioinformatics from previous questions on Biostars:
Distributed Computing in Bioinformatics
This is also a nice video:
genome assembly and resequence
What is your problem?
Do you just want to state that, or do you want to ask whether big data solutions are helpful?