Which Are The Typical Papers On Hadoop Applications In Genome Analysis?
12.0 years ago
Fayue1015 ▴ 210

I am quite interested in this area since Hadoop is becoming more and more popular. I believe there are some excellent papers in this domain; can you recommend some? Thanks.

genome analysis • 2.5k views

Have you done your own literature search yet?


Yes. Based on what I have found so far, I have written the paragraph below. Do you have anything more to add?

Considering the consistently dropping cost of sequencing technologies, it is anticipated that by mid 2013 we will enter an era of sequencing one genome at a cost of $1,000 or below. At that time, we will need to analyze and interpret whole-genome data for personalized medicine. Currently, many preparations for genome analysis using big data technologies are under way. Hadoop-BAM [Niemenmaa et al., 2012], designed for NGS sequence alignment data, provides a library for directly manipulating aligned NGS data stored in BAM (Binary Alignment/Map) files. Eoulsan [Jourdren et al., 2012] provides a cloud computation framework covering the analysis of high-throughput sequence data from upstream quality control to downstream differential expression detection. CEO [Wang et al., 2010b] and eCEO [Wang et al., 2011] focus mainly on dividing the exponential number of test combinations into distributed computing tasks in the cloud. Wang et al. [2012] further extend this work by providing a general framework for combinatorial data analysis.


Just checking because sometimes when people write "I believe there are some excellent papers", it suggests that they have not bothered to look at them :) It's good to indicate that you have done some research when asking for recommendations.

12.0 years ago

Have you seen Crossbow, mentioned in the post "Analyzing Human Genomes with Hadoop"?

Crossbow is a scalable software pipeline for whole genome resequencing analysis. It combines Bowtie, an ultrafast and memory-efficient short read aligner, and SoapSNP, an accurate genotyper. These tools are combined in an automatic, parallel pipeline that runs in the cloud (Elastic MapReduce in this case), on a local Hadoop cluster, or on a single computer, exploiting multiple computers and CPUs wherever possible. The pipeline can analyze over 35x coverage of a human genome in one day on a 10-node local cluster, or in 3 hours for about $85 using a 40-node, 320-core cluster rented from Amazon Web Services.
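To make the "align, then genotype, in parallel" idea concrete, here is a toy sketch of the general map/shuffle/reduce shape such a pipeline could take. It is not Crossbow's code and does not use Hadoop, Bowtie, or SoapSNP: the reference string, the bin size, the brute-force aligner, and the majority-vote caller are all stand-ins I made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy reference and bin size (assumptions, not real data).
REFERENCE = "GATTACAGGCTTAACCGTAG"
BIN_SIZE = 10


def map_align(read):
    """Map step: place the read at the offset with the fewest mismatches
    (a stand-in for a real aligner) and emit (bin, (position, base)) pairs."""
    best_pos = min(
        range(len(REFERENCE) - len(read) + 1),
        key=lambda p: sum(
            a != b for a, b in zip(read, REFERENCE[p:p + len(read)])
        ),
    )
    for offset, base in enumerate(read):
        pos = best_pos + offset
        yield pos // BIN_SIZE, (pos, base)


def shuffle(pairs):
    """Shuffle step: group values by key, as the framework would do
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_call_variants(observations, min_support=2):
    """Reduce step: within one genomic bin, report positions where the
    most frequent observed base differs from the reference and is seen
    at least min_support times (a stand-in for a real genotyper)."""
    pileup = defaultdict(lambda: defaultdict(int))
    for pos, base in observations:
        pileup[pos][base] += 1
    calls = []
    for pos in sorted(pileup):
        base, count = max(pileup[pos].items(), key=lambda kv: kv[1])
        if base != REFERENCE[pos] and count >= min_support:
            calls.append((pos, REFERENCE[pos], base))
    return calls


def run_pipeline(reads):
    """Run map, shuffle, and reduce in-process; on a cluster each map
    call and each per-bin reduce call could run on a different node."""
    pairs = (pair for read in reads for pair in map_align(read))
    calls = []
    for _bin_id, observations in sorted(shuffle(pairs).items()):
        calls.extend(reduce_call_variants(observations))
    return calls


if __name__ == "__main__":
    # Two reads carry a T at position 5 where the reference has C.
    print(run_pipeline(["TTATAGGC", "TATAGGCT", "GATTACAG"]))
```

The point of keying the map output by genomic bin is that every observation of a given position ends up in the same reduce call, so bins can be genotyped independently of each other.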


thanks, very useful
