Simple integrative DNA methylation and mRNA expression analysis
6.1 years ago
Will • 0

Hi everyone,

This is my first exposure to bioinformatics, so please bear with me. For a HS assignment, I chose DNA methylation and breast cancer as my topic. I want to explore the correlation between DNA methylation and gene expression in cancer but am overwhelmed by large datasets such as TCGA. I am therefore looking for processed data to which I can apply simple statistical analyses and still draw valid conclusions. I saw many mentions of R programming, but I know nothing about it. I came across MExpress and the MENT database but am not sure how to select the right genes, etc.

How should I focus my question and/or what logical steps can I take? Please help me find simplified methods.

Any replies or pointers to useful links would be greatly appreciated. Thank you so much.

rna-seq dna methylation data integration • 1.5k views
6.1 years ago

Hey Will,

Yes, working with the TCGA data can be challenging. One of the most widely used tools for accessing TCGA open-access (level 3) data is TCGAbiolinks, an R package. You could simply obtain the methylation and expression data using TCGAbiolinks and then work from there. That still requires some initial learning in R, though, which may or may not be part of your assignment.

cBioPortal also contains gene expression data that you can easily download.

However, probably the best option for you is the UCSC GDC Xena Hub (home-page), which contains most TCGA data types in a ready-to-download format. For a HS assignment, this should be more than sufficient.

Kevin

Thank you so so much!

However, sorry for my ignorance: would this produce a trend line, or a box plot comparing normal and tumour tissue? Would I be able to apply chi-squared tests and the like? And could I simply work in Excel with the UCSC data?

You will struggle to do this in Excel. I think that you should make the most of the opportunity and aim to do all of this in the R programming language. If you have never used it before, then you could start with my very simple tutorials, which are currently just PowerPoint slides: https://github.com/kevinblighe/Rtutorials

Dear Kevin,

Thanks so much for all your replies and help thus far. As I am not familiar with the workflow, I don't completely understand, so would you mind clarifying? Am I supposed to use the UCSC data to obtain methylation and expression values in R, then correlate them? Can I also try to find patterns with clinical data? Could you please refer me to the R packages that would be required for this? I've read the threads here - would COHCAP and MethylMix work?

Hey Will. That is a lot of questions! Do you not have a supervisor or other colleague in your local section/department?

No, my biology teacher is not specialised in bioinformatics, and I knew my project would be independently led. Thanks a lot for your help, Kevin!

I see. Sure, MethylMix is a good option and can download the data automatically for you. From MethylMix, you should be able to obtain a matrix of methylation values, which you can then correlate / overlap with your gene expression data. Be sure that your gene expression data follows a normal distribution, and that you have filtered out genes with low expression values.
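To make the correlation step concrete, here is a minimal sketch for a single gene. It is written in plain Python only so that it runs without extra packages; in R the same step could be done with `cor.test()`. Everything specific is an assumption for illustration: the sample values are made up (not TCGA data), the helper names `pearson`, `log2_expression` and `is_expressed` are hypothetical, and the `min_mean` cut-off and the log2 transform with a pseudocount are just one common choice for taming skewed expression values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def log2_expression(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def is_expressed(values, min_mean=1.0):
    """Hypothetical low-expression filter: keep genes whose mean passes a cut-off."""
    return sum(values) / len(values) >= min_mean


# Made-up values for ONE gene across five samples (same sample order in both lists).
methylation = [0.10, 0.25, 0.40, 0.60, 0.80]   # beta values between 0 and 1
expression = [900.0, 500.0, 260.0, 120.0, 40.0]  # normalised expression values

if is_expressed(expression):
    r = pearson(methylation, log2_expression(expression))
    print(f"Pearson r = {r:.2f}")
```

A strongly negative `r` would mean that samples with higher methylation tend to show lower expression of that gene. With real data, the two lists must be matched sample by sample before correlating.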

You could also build regression models, which I mentioned yesterday in a very old thread: Correlation between methylation (450K) and gene expression (RNA-Seq)
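As one possible reading of "regression models" (not necessarily the approach in the linked thread), the simplest case is a straight-line fit of log2 expression against methylation for one gene. The sketch below is plain Python for illustration; in R the equivalent would be `lm()`. The data are made-up toy values, and `fit_line` and `r_squared` are hypothetical helper names.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Made-up values for ONE gene across five matched samples.
beta_values = [0.10, 0.25, 0.40, 0.60, 0.80]
log2_expr = [math.log2(v + 1.0) for v in [900.0, 500.0, 260.0, 120.0, 40.0]]

gene_slope, gene_intercept = fit_line(beta_values, log2_expr)
gene_r2 = r_squared(beta_values, log2_expr, gene_slope, gene_intercept)
print(f"slope = {gene_slope:.2f}, intercept = {gene_intercept:.2f}, R^2 = {gene_r2:.2f}")
```

The slope estimates how much log2 expression changes per unit increase in methylation, which also gives you the trend line for a scatter plot. A real model would usually add a significance test and possibly clinical covariates.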
