Is there enough existing data out there to actually do meaningful research on? Let's say I am financially secure enough not to need to work and have reasonable funds to buy or rent computational time. Can I actually just mine existing datasets and publish meaningful work on arXiv? Will anyone actually take me seriously without institutional or academic backing?
There are a lot of datasets in very specific domains of research. Can we actually make them all comparable? For example, can we take ChIP-seq data from one study and RNA-seq data from another study and analyze them together? Are the biological samples similar enough, and are the library prep steps described well enough, to make the data comparable?
From my perspective, the short answer is probably yes and no. Yes: there are likely a ton of important findings hiding in datasets like TCGA, just waiting to be mined with the right approach and targeted questions. No: you probably won't be able to pull it off without institutional or academic backing. For one thing, to access controlled-access datasets you have to go through official channels. And any discoveries you make and want to validate will require access to additional samples and so on. But, most importantly, the scientific community will be prejudiced against findings published only in places like arXiv. Someone really clever and determined, with sufficient existing recognition in the community, could probably pull it off, though. Is it worth doing this off the reservation, just to prove it can be done? Maybe...