Why are there so few Hadoop/Spark-based de novo assemblers?
8.2 years ago
saranpons3 ▴ 70

Since the read data set that a de novo assembler has to process is large, the assembly problem can be considered a big data problem (please correct me if this is wrong). Big data technologies such as Hadoop and Spark are now widely used for such problems. When I started looking for Hadoop/Spark-based de novo genome assemblers, I came across Contrail (2010), Cloudbrush (2012) and Spaler (2015). To the best of my knowledge, there are no other Hadoop/Spark-based de novo genome assemblers besides these three. This raises a question: is Hadoop/MapReduce/Spark a good technology for de novo genome assembly? I'd be grateful if somebody could shed some light on this. Thanks in advance.
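For context, the reason MapReduce-style frameworks look attractive here is that core assembly steps decompose into map/shuffle/reduce operations: k-mer counting is essentially word count, and de Bruijn graph edges are (prefix, suffix) pairs of k-mers. Below is a minimal, hypothetical PySpark sketch of that idea (toy reads, arbitrary k, not taken from Contrail/Cloudbrush/Spaler):

```
# Minimal PySpark sketch; toy input and k are made up for illustration.
from pyspark import SparkContext

sc = SparkContext(appName="kmer-sketch")
k = 5

# Toy reads; a real run would load FASTQ from HDFS, e.g. via sc.textFile(...)
reads = sc.parallelize(["ACGTACGTGACG", "CGTACGTGACGT"])

def kmers(read, k):
    # All substrings of length k from one read
    return [read[i:i + k] for i in range(len(read) - k + 1)]

# "Map": emit (k-mer, 1); "Reduce": sum counts per k-mer.
kmer_counts = (reads
               .flatMap(lambda r: kmers(r, k))
               .map(lambda km: (km, 1))
               .reduceByKey(lambda a, b: a + b))

# De Bruijn graph edges: (k-1)-mer prefix -> (k-1)-mer suffix of each k-mer.
edges = kmer_counts.map(lambda kv: (kv[0][:-1], kv[0][1:]))

print(kmer_counts.take(5))
print(edges.take(5))
```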

Hadoop Spark Big data De novo Assembly • 2.0k views

See also this: Why is Hadoop not used a lot in bio-informatics? I do think Hadoop-based de novo assembly makes sense as a proof-of-concept type of project, but it will be used by no one in practice because 1) single-machine assemblers are already good enough and 2) few labs that actually do such assembly use Hadoop.

8.2 years ago
Asaf 10k

Because you don't need distributed storage: for speed you want to keep all your data in RAM.
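To make that concrete, here is a very rough back-of-envelope estimate (all numbers below are assumptions, not benchmarks) suggesting that even a human-sized de Bruijn graph can fit in the RAM of a single large-memory server, so distributing it across a Hadoop/Spark cluster buys little:

```
# Rough, hypothetical estimate; every constant here is an assumption.
genome_size = 3_000_000_000         # ~3 Gbp, human-sized genome
distinct_kmers = 2.5 * genome_size  # generous allowance for errors/variants
bytes_per_kmer = 16                 # packed k-mer + count + graph links (assumed)

ram_needed_gb = distinct_kmers * bytes_per_kmer / 1e9
print(f"~{ram_needed_gb:.0f} GB")   # on the order of ~120 GB, i.e. one fat node
```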
