I am new to cancer genomics but have experience in bioinformatics, Linux and Python programming. I don't have any funding at the moment to sequence cancer genomes from patient samples and publish the results. So I am wondering whether it is a good idea to download cancer genomes from https://tcga-data.nci.nih.gov/tcga/ or some other source (let me know if you are aware of any), perform some preliminary analyses using different variant calling algorithms, publish some short papers and then apply for funding. I have access to several clusters and supercomputers for processing the data. Thank you.
Of course it is a good idea to use data from public resources, but you will have to plan your project very well before starting.
Keep in mind that other people have probably already run variant calling on the data in The Cancer Genome Atlas (TCGA). The best thing to do is to contact each institute directly and ask whether, and how, you could contribute.
Thanks for your suggestion. I will write to them. Variant calling, which is very tricky, was just one example. Given the massive amount of data, there will be plenty of possible analyses. I wonder if they will be interested in me. I need to show some preliminary results in order to get funding to run my own sequencing projects.
The only thing you need to contact TCGA for is permission to access the restricted data (which is access-controlled for patient privacy reasons). They'll want to see that you have some kind of plan to do cancer genomics research with it, but it can be loosely defined. The point of the project is to produce data that will be available to the scientific community.
I think everyone's plan is more or less the same: to identify novel mutations, genes or biomarkers for a particular type of cancer.
You'd be surprised at the variety. There are lots of people working on things like algorithmic improvements or pan-cancer analyses, for example.
Yes, I was reading about the RNA-seq Genome Annotation Assessment Project (RGASP).