Hey all,
We are trying to establish a fully automated standard analysis pipeline for our samples sequenced on a HiSeq 2000. I would like to hear about your experiences: how far can automation go, which steps are essential, and which cannot be automated? Have you also worked on a "genome content management system" like this, and how did it go?
Here is what we have implemented so far:
For every sequencing run, several SQL tables hold the information for each sample: for example the sample sheet CASAVA needs, the sample characteristics, the type of sequencing (strand-specific or not), the path to the sample data, the insert size (for paired-end runs), whether the sample is a metatranscriptome or metagenome, and a lot more.
We use this information to build a workflow. As a first step CASAVA is started; afterwards the samples are moved to the corresponding project folder and FastQC is run for quality control.
The next steps would be mapping and quantification. All these steps are traceable in a CMS: every major event creates an automatic post in this CMS.
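As a rough sketch of the workflow described above (not our actual code), the idea of driving the steps from the sample metadata could look like this in Python. Everything named here is an assumption for illustration: the `samples` table and its columns, the step names, the `handlers` callables that stand in for the real tool calls, and the `post_event` callback that stands in for posting to the CMS. No real CASAVA or FastQC command line is invoked.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    sample_id: str
    project: str
    strand_specific: bool
    paired_end: bool
    insert_size: Optional[int]  # only meaningful for paired-end libraries


def load_samples(conn, run_id):
    """Read the per-sample metadata of one run from a hypothetical table."""
    rows = conn.execute(
        "SELECT sample_id, project, strand_specific, paired_end, insert_size "
        "FROM samples WHERE run_id = ? ORDER BY sample_id",
        (run_id,),
    ).fetchall()
    return [Sample(r[0], r[1], bool(r[2]), bool(r[3]), r[4]) for r in rows]


def plan_steps(sample):
    """Derive the ordered per-sample steps and their options from metadata."""
    return [
        ("move_to_project", {"project": sample.project}),
        ("fastqc", {}),
        ("mapping", {"paired": sample.paired_end,
                     "insert_size": sample.insert_size}),
        ("quantification", {"stranded": sample.strand_specific}),
    ]


def run_pipeline(run_id, samples, handlers, post_event):
    """Demultiplex once per run, then run each sample's steps in order.

    Every finished step is reported through post_event, which stands in
    for the automatic CMS post.
    """
    handlers["demultiplex"](run_id, {})
    post_event(f"{run_id}: demultiplex finished")
    for sample in samples:
        for name, options in plan_steps(sample):
            handlers[name](sample.sample_id, options)
            post_event(f"{sample.sample_id}: {name} finished")


def demo():
    """Build a tiny in-memory database and run the sketch end to end."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (run_id TEXT, sample_id TEXT, project TEXT, "
        "strand_specific INTEGER, paired_end INTEGER, insert_size INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)",
        [("run1", "S1", "projA", 1, 1, 300),
         ("run1", "S2", "projB", 0, 0, None)],
    )
    events, calls = [], []
    step_names = ["demultiplex", "move_to_project", "fastqc",
                  "mapping", "quantification"]
    # Placeholder handlers: they only record what would have been run.
    handlers = {
        name: (lambda target, options, n=name: calls.append((target, n, options)))
        for name in step_names
    }
    run_pipeline("run1", load_samples(conn, "run1"), handlers, events.append)
    return events, calls
```

Calling `demo()` returns the list of event messages and the list of recorded step calls. The point of the design is that per-sample differences (paired-end or not, strand-specific or not) live in the metadata and are turned into step options, so the pipeline code itself stays the same for every sample.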
It would be nice to get some feedback on this. Is it possible AND useful to create such a pipeline? After all, every sample is a bit different. What do you think?
Thanks!
Steve
You're describing a LIMS (laboratory information management system). These are not trivial to develop, and not cheap to buy!
I think the first step would be to see what's out there (BioTeam MiniLIMS, Galaxy, Taverna, tools built with Ruffus or Paver) and report your findings in a blog post or on the SEQanswers wiki.