Of course, as a bioinformatician, I am aware of many large-scale open-source bioinformatics datasets, such as
- The ENCODE consortium (www.encodeproject.org, RNA-Seq, ChIP-Seq and so on),
- The Roadmap Epigenomics consortium (www.roadmapepigenomics.org, RNA-Seq, ChIP-Seq, Bisulfite-Seq),
- The IHEC consortium (www.ihec-epigenomes.org, RNA-Seq, ChIP-Seq, Bisulfite-Seq),
- The TCGA/ICGC consortia (www.cancergenome.nih.gov, www.icgc.org, large-scale cancer data, DNA-Seq, RNA-Seq, etc.) and
- The LINCS consortium (www.lincscloud.org/l1000, gene expression for more than a million different perturbation experiments).
I am wondering, however, what other datasets that are both large and open-source are currently available. That might include things like RNA-Seq, ChIP-Seq, Bisulfite-Seq, whole-genome sequencing, GWAS, and many other assays (not necessarily NGS-related, though that is what I am mostly looking for).
Things like the (neural) connectome of certain species could also be of interest, as long as the dataset is large.
There are quite a few GEO datasets that at least partially fulfill these requirements, but most simply have too few samples to be interesting to me.
Your comments are greatly appreciated!
To all those who replied: Many thanks for your detailed posts!
Do you know of any other resource that provides DNase-seq and mRNA-seq data, other than ENCODE and Roadmap?