Because the read data set that a de novo assembler has to process is large, the assembly problem can be considered a big data problem (if that is wrong, please correct me). Nowadays people have started using big data technologies such as Hadoop and Spark to deal with such problems. When I started looking for Hadoop/Spark-based de novo genome assemblers, I came across Contrail, CloudBrush and Spaler, developed in 2010, 2012 and 2015 respectively. To the best of my knowledge, there are no other Hadoop/Spark-based de novo genome assemblers besides these three. The fact that there are only three raises a question for me: are Hadoop/MapReduce/Spark good technologies for de novo genome assembly? I'd be grateful if somebody could throw some light on this. Thanks in advance.
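To make the question concrete, here is a minimal sketch of why the first stage of de Bruijn graph assembly maps naturally onto MapReduce/Spark: extracting and counting graph edges ((k+1)-mers) from reads is an embarrassingly parallel map followed by a shuffle/reduce. This is only an illustration under my own assumptions (the input path, the k value, and the one-read-per-line format are placeholders), not how Contrail, CloudBrush or Spaler actually implement it.

```python
from pyspark import SparkContext

K = 31  # assumed k-mer length; a typical choice for short-read assembly

def kmer_edges(read):
    """Emit ((prefix k-mer, suffix k-mer), 1) for every (k+1)-mer in one read."""
    read = read.strip().upper()
    for i in range(len(read) - K):
        kmer = read[i:i + K + 1]          # each (k+1)-mer defines one graph edge
        yield ((kmer[:-1], kmer[1:]), 1)  # edge from prefix k-mer to suffix k-mer

if __name__ == "__main__":
    sc = SparkContext(appName="debruijn-edge-count")
    # Placeholder input: one read sequence per line (no FASTQ parsing here).
    reads = sc.textFile("hdfs:///data/reads.txt")
    edges = (reads
             .flatMap(kmer_edges)                # map: reads -> candidate edges
             .reduceByKey(lambda a, b: a + b))   # reduce: count edge multiplicity
    # Edges seen only once are likely sequencing errors and would be filtered out.
    print(edges.filter(lambda kv: kv[1] >= 2).count())
    sc.stop()
```

The counting stage parallelizes cleanly; my understanding is that the harder part to distribute is the later graph compaction and traversal, which needs repeated shuffle rounds, and that is where the assemblers above spend most of their effort.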
See also: Why is Hadoop not used a lot in bio-informatics? I do think Hadoop-based de novo assembly makes sense. It is good for a proof-of-concept type of project, but it will be used by no one in practice because 1) single-machine assemblers are already good enough and 2) few labs that actually do such assembly use Hadoop.
You may want to read this: "Genomics is not Special". Computational biologists are reinventing the wheel for big data biology analysis.