I am quite interested in this area, since Hadoop is becoming more and more popular. I believe there are some excellent papers in this domain; can you recommend some? Thanks.
Have you seen Crossbow, mentioned in the post "Analyzing Human Genomes with Hadoop"?
Crossbow is a scalable software pipeline for whole genome resequencing analysis. It combines Bowtie, an ultrafast and memory-efficient short read aligner, and SoapSNP, an accurate genotyper. These tools are combined in an automatic, parallel pipeline that runs in the cloud (Elastic MapReduce in this case), on a local Hadoop cluster, or on a single computer, exploiting multiple computers and CPUs wherever possible. The pipeline can analyze over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $85 using a 40-node, 320-core cluster rented from Amazon Web Services.
Have you done your own literature search yet?
Yes, I have. Here is the paragraph I have written so far; do you have anything to add?

Given the steadily falling cost of sequencing technologies, it is anticipated that by mid-2013 we will enter an era in which one genome can be sequenced for $1,000 or less. At that point, we will need to analyze and interpret whole-genome data for personalized medicine. Many preparations for genome analysis using big data technologies are currently under way. Hadoop-BAM [Niemenmaa et al., 2012], specifically designed for sequence alignment of NGS data, provides a library for directly manipulating aligned NGS data stored in BAM (Binary Alignment Map) files. Eoulsan [Jourdren et al., 2012] provides a cloud computation framework covering the analysis of high-throughput sequence data, from upstream quality control to downstream differential expression detection. CEO [Wang et al., 2010b] and eCEO [Wang et al., 2011] focus mainly on dividing the exponential number of test combinations into distributed computing tasks in the cloud. Wang et al. [2012] further extend this work by providing a general framework for combinatorial data analysis.
Just checking because sometimes when people write "I believe there are some excellent papers", it suggests that they have not bothered to look at them :) It's good to indicate that you have done some research when asking for recommendations.