Our lab has a variety of NGS datasets (RNA-Seq, ChIP-Seq, DNase-Seq, etc.) that have been accumulating for many years across different conditions and projects. In addition, there are many relevant public datasets available (e.g. ENCODE). I was wondering how best to organize and store these datasets so that an integrative analysis can be readily done.
I guess the simplest approach is to keep a separate file for each processed result (e.g. ChIP-Seq peaks, RPKM values for RNA-Seq, etc.) per condition, but then it's difficult to summarize all the data for any new person. Has anyone experienced similar issues, and found or developed a useful pipeline to store and integrate multidimensional genomic datasets?
Thanks!
Keeping lists of genes in GMT format from each experiment is a lightweight approach. Differential RNA-Seq data can be collapsed into rank files, which can be analysed using Spearman correlation.
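To make that concrete, here is a minimal Python sketch of both ideas. It assumes the usual GMT layout (one gene set per line: set name, description, then genes, all tab-separated) and a two-column rank file (gene, tab, score). The function names and the example gene set are made up for illustration, and the Spearman coefficient is computed by hand on the genes shared between two experiments so the snippet needs nothing beyond the standard library.

```python
from statistics import mean


def write_gmt(gene_sets, handle):
    """Write {set_name: (description, [genes])} as tab-separated GMT lines."""
    for name, (description, genes) in gene_sets.items():
        handle.write("\t".join([name, description, *genes]) + "\n")


def read_rnk(handle):
    """Read a two-column rank file (gene<TAB>score) into a dict."""
    scores = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        gene, score = line.split("\t")
        scores[gene] = float(score)
    return scores


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(scores_a, scores_b):
    """Spearman correlation between two experiments over their shared genes."""
    shared = sorted(set(scores_a) & set(scores_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    ranks_a = average_ranks([scores_a[g] for g in shared])
    ranks_b = average_ranks([scores_b[g] for g in shared])
    mean_a, mean_b = mean(ranks_a), mean(ranks_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores are constant in at least one experiment")
    return cov / (var_a * var_b) ** 0.5
```

With one GMT file collecting the gene lists and one rank file per differential contrast, a new person can get an overview by computing `spearman` for every pair of contrasts. If SciPy is available, `scipy.stats.spearmanr` does the same calculation and also returns a p-value.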