Hi everyone,
This is my first exposure to bioinformatics, so please bear with me. For a high school assignment, I chose DNA methylation and breast cancer as my topic. I want to explore the correlation between DNA methylation and gene expression in cancer, but I am overwhelmed by large datasets such as TCGA. I am therefore looking for processed data to which I can apply simple statistical analyses and still draw valid conclusions. I have seen many mentions of R programming, but I know nothing about it. I came across MExpress and the MENT database but am not sure how to select the right genes, etc.
How should I focus my question and/or what logical steps can I take? Please help me find simplified methods.
Any replies or pointers to useful links would be greatly appreciated. Thank you so much.
Thank you so so much!
Sorry for my ignorance, but would this produce a trend line, or a box plot comparing normal and tumour tissue? Would I be able to apply chi-squared tests and the like? Could I simply work in Excel with the UCSC data?
You will struggle to do this in Excel. I think you should make the most of the opportunity and aim to do all of it in the R programming language. If you have never used it before, you could start with my very simple tutorials, which are currently just PowerPoint slides: https://github.com/kevinblighe/Rtutorials
Dear Kevin,
Thanks so much for all your replies and help thus far. As I am not familiar with the workflow, I don't completely understand it, so would you mind clarifying? Am I supposed to use the UCSC data to obtain methylation and expression values in R, then correlate them? Can I also try to find patterns with clinical data? Could you please point me to the R packages that would be required for this? I've read the threads here. Would COHCAP and MethylMix work?
Hey Will. That is a lot of questions! Do you not have a supervisor or other colleague in your local section/department?
No, my biology teacher does not specialise in bioinformatics, and I knew my project would be independently led. Thanks a lot for your help, Kevin!
I see. Sure, MethylMix is a good option and can download the data automatically for you. From MethylMix, you should be able to obtain a matrix of methylation values, which you can then correlate / overlap with your gene expression data. Be sure that your gene expression data follows a normal distribution, and that you have filtered out genes with low expression values.
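As a rough illustration of the filter-then-correlate step, here is a minimal sketch in plain Python (the same idea maps onto R's cor() function). Everything in it is an assumption made for illustration: the gene names, the toy values, the cut-off MIN_MEAN_EXPRESSION, and the log2(count + 1) transform, which is one common way of bringing expression values closer to a normal distribution. It does not use MethylMix or any real data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values: one entry per gene, one value per sample.
methylation = {  # beta values between 0 and 1
    "GENE_A": [0.10, 0.25, 0.40, 0.60, 0.80],
    "GENE_B": [0.50, 0.55, 0.45, 0.52, 0.48],
    "GENE_C": [0.20, 0.30, 0.50, 0.70, 0.90],
}
expression = {  # normalised counts for the same samples, in the same order
    "GENE_A": [900.0, 700.0, 400.0, 150.0, 60.0],
    "GENE_B": [1.0, 0.0, 2.0, 1.0, 0.0],
    "GENE_C": [100.0, 180.0, 350.0, 600.0, 1000.0],
}

MIN_MEAN_EXPRESSION = 10.0  # hypothetical cut-off, not a recommended value

def correlate_genes(methylation, expression, min_mean=MIN_MEAN_EXPRESSION):
    """Return {gene: Pearson r} for genes that pass the expression filter."""
    results = {}
    for gene, counts in expression.items():
        # Skip genes whose mean expression is too low to be informative.
        if sum(counts) / len(counts) < min_mean:
            continue
        # Log-transform so the values are less skewed before correlating.
        log_expr = [math.log2(c + 1) for c in counts]
        results[gene] = pearson(methylation[gene], log_expr)
    return results

results = correlate_genes(methylation, expression)
for gene, r in results.items():
    print(f"{gene}: r = {r:.2f}")
```

A negative r means that higher methylation goes with lower expression for that gene across the samples; genes that fail the expression filter are simply left out of the result.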
You could also build regression models, which I mentioned yesterday in a very old thread: Correlation between methylation (450K) and gene expression (RNA-Seq)
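For a single gene, the simplest such regression model is a straight-line fit of expression against methylation, which is also the "trend line" asked about earlier. The sketch below is plain Python with made-up numbers. The function name fit_line and the toy data are hypothetical, and in R the equivalent would be lm().

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: fraction of the variance in y explained by the line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up toy values for one gene across five samples.
beta_values = [0.10, 0.25, 0.40, 0.60, 0.80]   # methylation
log_expression = [9.8, 9.5, 8.6, 7.2, 5.9]     # log2-scale expression

slope, intercept, r_squared = fit_line(beta_values, log_expression)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.2f}")
```

The slope estimates how much log expression changes per unit increase in methylation, and R^2 indicates how well a straight line describes the relationship.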