Hello, my department head gave me a nice hosting space on our institution's new research cloud (12 cores and, I think, a RAM size of 16). I built a basic web app that analyzes genomic data and prints the results to the user. I used Java web technologies (JSF, etc.) because my previous work was in Java. The web app works fine with a small number of sequences, but it crashes when I try large sequence files. I'm a beginner when it comes to web technologies, and going in the wrong direction will cost me valuable time. I would like to hear your advice and ideas about how I should tackle this project, keeping in mind that I already have the Java code written the old-fashioned way, as a console program (console filename parameter1 parameter2 outputfile).
Here are some of my concerns and questions:
- What are the most popular web technologies used by big institutions for bioinformatics tools? Is there a state-of-the-art Java implementation in this area?
- Does including a database in the implementation hurt performance? (In my current implementation, the sequences are first stored in a database before the analysis is executed.)
- In a local environment, a researcher can leave their PC running for days. On the server side, I need a queue service for submitted data that requires a long processing time. I need resources on how to implement this kind of thing in Java/JSF.
- Finally, my old work is written as single-threaded Java code; as a workaround I overclocked my CPU. My recent work is mostly multi-threaded. How does threading improve the performance of a web application?
I can't answer most of your questions because I have no idea what some of your parameters are. When you say "the web app works fine with a small # of sequences", what number is small? "Large sequence files": what number is large? What exactly do you mean by "it crashes"? Do you get an out-of-memory error and the job is killed? When you say you built a "basic web app that will analyze genomic data", do you mean you wrote a new mapper, variant caller, etc., or that you used Java to connect things like BWA, GATK, etc. into a web interface?
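If the crash does turn out to be an out-of-memory error, one common cause is holding every sequence in memory at once before the analysis starts. I don't know whether that applies to your code, so treat the following as a rough sketch only. It assumes FASTA-style input and reads one record at a time; the class name `FastaStream` and the method `forEachRecord` are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.BiConsumer;

// Hypothetical helper: streams FASTA-style records instead of loading the whole file.
final class FastaStream {

    // Calls handler(header, sequence) once per record and returns the record count.
    // Only the record currently being read is held in memory.
    static int forEachRecord(Reader in, BiConsumer<String, String> handler) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String header = null;
            StringBuilder sequence = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    // A new header closes the previous record, if there was one.
                    if (header != null) {
                        handler.accept(header, sequence.toString());
                        count++;
                    }
                    header = line.substring(1);
                    sequence.setLength(0);
                } else if (header != null) {
                    sequence.append(line); // sequence data may span several lines
                }
            }
            // Emit the last record in the input.
            if (header != null) {
                handler.accept(header, sequence.toString());
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String demo = ">seq1\nACGT\nAC\n>seq2\nGGG\n";
        int n = forEachRecord(new StringReader(demo),
                (id, seq) -> System.out.println(id + " length=" + seq.length()));
        System.out.println(n + " records");
    }
}
```

In a real run you would pass a `FileReader` for the uploaded file and call your existing analysis from inside the handler, so memory use depends on the longest single sequence and not on the size of the whole file.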
You might want to take a look at the Galaxy project. Some places use this as a web-based GUI to analyze genomic data.
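On the long-running jobs question, the usual pattern (whatever framework you end up with) is to accept the upload, hand the work to a background pool, give the user a job id right away, and let them check back later. Here is a minimal sketch using only `java.util.concurrent` from the standard library. The class `AnalysisJobQueue` and its methods are hypothetical names, and the lambda in `main` is a stand-in for a call into your existing console code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical in-memory job queue: a sketch, not a production design.
final class AnalysisJobQueue {
    private final ExecutorService workers;
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    // workerThreads caps how many analyses run at the same time.
    AnalysisJobQueue(int workerThreads) {
        this.workers = Executors.newFixedThreadPool(workerThreads);
    }

    // Queues the analysis and returns immediately with an id the user can poll.
    String submit(Callable<String> analysis) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, workers.submit(analysis));
        return id;
    }

    // Returns UNKNOWN, QUEUED_OR_RUNNING, FAILED or DONE for a status page.
    String status(String id) {
        Future<String> job = jobs.get(id);
        if (job == null) return "UNKNOWN";
        if (!job.isDone()) return "QUEUED_OR_RUNNING";
        try {
            job.get(); // does not block: the job has already finished
            return "DONE";
        } catch (ExecutionException e) {
            return "FAILED"; // the analysis threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "QUEUED_OR_RUNNING";
        }
    }

    // Waits up to timeoutSeconds for the result; wraps any failure in IllegalStateException.
    String awaitResult(String id, long timeoutSeconds) {
        Future<String> job = jobs.get(id);
        if (job == null) throw new IllegalArgumentException("unknown job " + id);
        try {
            return job.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("job did not finish successfully", e);
        }
    }

    // Lets already submitted jobs finish, then releases the worker threads.
    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        AnalysisJobQueue queue = new AnalysisJobQueue(4);
        String id = queue.submit(() -> {
            Thread.sleep(200); // stand-in for a long analysis
            return "analysis finished";
        });
        System.out.println("submitted " + id + ", status " + queue.status(id));
        System.out.println(queue.awaitResult(id, 10));
        queue.shutdown();
    }
}
```

This also touches on your threading question: in a web app, the pool lets several users' analyses run at once, up to the number of worker threads. It does not make a single-threaded analysis faster. Note that this sketch keeps jobs in memory only, so they are lost on restart and never cleaned up; a real service would need to persist them somewhere.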