We now have some Hadoop-based packages (Crossbow, CloudBurst, etc.) for NGS data analysis, yet I find that people still prefer tools like Bowtie, TopHat, and SOAP in their work. I am a biologist, but I would like to know: is it possible to use or convert serial tools into map-reduce form, so as to exploit scalable distributed computing with Hadoop and expedite research? Also, what are the challenges in adapting mapping and assembly algorithms to run on a Hadoop system?
I am also curious about other bioinformatics tasks that can be done with Hadoop-based projects like Hive, Pig, and HBase, which deal with big data such as FASTQ files, SAM files, count data, or other forms of biological data.
Please explain why you specifically want to use Hadoop. You can always parallelize your analysis without a map/reduce process, a cloud, etc.
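As a rough illustration of that point, here is a minimal sketch of splitting FASTQ records into independent chunks, processing each chunk in a worker, and merging the partial results, all without Hadoop. The function names (`read_fastq_chunks`, `count_bases`, `total_base_counts`) and the base-counting task are hypothetical and chosen only for the example; it also assumes plain four-line FASTQ records. A thread pool is used so the snippet runs anywhere; for CPU-bound work you would probably swap in `ProcessPoolExecutor`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fastq_chunks(lines, records_per_chunk):
    """Yield lists of lines, each holding up to `records_per_chunk`
    FASTQ records (assumes exactly 4 lines per record)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * records_per_chunk))
        if not chunk:
            return
        yield chunk


def count_bases(chunk):
    """Per-chunk step: count bases on the sequence line of each record."""
    counts = Counter()
    for seq in chunk[1::4]:  # the 2nd line of every 4-line record
        counts.update(seq.strip())
    return counts


def total_base_counts(lines, records_per_chunk=2, workers=2):
    """Process chunks independently, then merge the partial counts."""
    chunks = read_fastq_chunks(lines, records_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_bases, chunks))
    return sum(partials, Counter())
```

The per-chunk function plays the role of a "map" step and the final merge plays the role of a "reduce" step. A job fits this pattern easily when each read can be handled on its own; assembly is harder to express this way because reads have to be compared with one another.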
Actually, I am just exploring Hadoop technology, so I am asking about the challenges and impact of Hadoop technologies in NGS / bioinformatics data analysis. I don't specifically want to use Hadoop, but if I try it, will it be fruitful, and what hurdles would there be?