Question

Bioconductor packages for comparing different species data (particularly RNA-seq and DNA methylation)

0

10.2 years ago

Saad Khan ▴ 440

Are there any Bioconductor packages available for comparing RNA-seq and/or DNA methylation data between two species?

RNA-seq DNA-methylation comparative-genomics • 3.4k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Saad Khan ▴ 440

0

Define "comparing". I can think of a few different ways of comparing such datasets and it's quite possible that none of them are what you have in mind. Try telling us what your actual biological goal is and then you'll probably get some more useful advice.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Devon Ryan 104k

0

What I meant is comparing orthologous regions with each other. The actual biological goal is to compare a cancer in canines with the same cancer in humans, for a particular tissue, and find similar patterns.

What other ways of comparing did you have in mind?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Saad Khan ▴ 440

1

Without biological context, you could have just wanted general comparisons between methylation levels in the promoters of various gene classes and a comparison of TPM distributions (or something similar). That's why we usually ask for the experimental context within which you want to do something. I'll give some actual suggestions in an answer below.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-09-17

2

10.2 years ago

Devon Ryan 104k

There are a few different things that could be looked at. Assuming you ran control samples from the dogs in addition to the cancer samples, the first thing to do would be to perform a standard differential expression/methylation analysis. For DE, the edgeR, DESeq2 and limma packages are very good and are what you'll find everyone recommending. Note that I'm not sure how good the annotations are for the dog genome (I don't work on it), so you might need to use something like RSEM (or Trinity followed by RSEM) to get decent metrics, which means you'd be stuck with limma downstream (not that that's a bad thing; limma is an extremely powerful tool). For methylation, it depends on how you generated the data. For RRBS or similar datasets, BiSeq is OK. For methylation arrays, you can use packages like minfi or COHCAP.

One of the interesting things I would do is use GSEA to compare enrichment of groups of differentially expressed/methylated genes between the canine model and patients. You'll obviously need control patient data for this to be worthwhile. If you find any highly relevant pathways (there are a few Bioconductor packages for pathway analysis, though I think the commercial Ingenuity Pathway Analysis package is still better in this regard), then I'd pay particular attention to how key players in them are affected in patients.
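
To make the cross-species pathway comparison concrete, here is a minimal Python sketch. It is not GSEA itself: it swaps in a simpler over-representation (hypergeometric) test, run once per species against the same gene set. The ortholog table, gene lists, pathway and gene universe are all hypothetical toy values, and the sketch assumes dog gene IDs have already been mapped to human symbols and that the universe is restricted to genes with an ortholog in both species.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes among n DE genes
    drawn from a universe of N genes that contains K pathway genes."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def pathway_pvalue(de_genes, pathway, universe):
    """Over-representation p-value of `pathway` among `de_genes`."""
    universe = set(universe)
    de = set(de_genes) & universe
    pw = set(pathway) & universe
    return hypergeom_sf(len(de & pw), len(universe), len(pw), len(de))

# Hypothetical dog-to-human ortholog table (toy IDs, for illustration only).
dog_to_human = {"dogA": "TP53", "dogB": "MYC", "dogC": "KRAS", "dogD": "GAPDH"}

# Toy DE calls; dog genes without a known ortholog are dropped.
dog_de = ["dogA", "dogB", "dogX"]
dog_de_as_human = [dog_to_human[g] for g in dog_de if g in dog_to_human]
human_de = ["TP53", "MYC", "EGFR"]

# Toy universe and pathway, expressed in human gene symbols.
universe = ["TP53", "MYC", "KRAS", "GAPDH", "EGFR",
            "ACTB", "BRCA1", "PTEN", "RB1", "CDK4"]
pathway = ["TP53", "MYC", "RB1"]

p_dog = pathway_pvalue(dog_de_as_human, pathway, universe)
p_human = pathway_pvalue(human_de, pathway, universe)
print(f"dog p = {p_dog:.4f}, human p = {p_human:.4f}")
```

Pathways that come out enriched in both species would be the ones to look at more closely. With real data you would also need multiple-testing correction across pathways.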

That's a quick idea and a handful of Bioconductor packages to get you started. I could probably come up with things to look at all day; you have a really target-rich project :)

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Devon Ryan 104k

0

When people compare methylation in two species, they usually use the liftOver tool to convert one species' coordinates to the other's and then compare. Using that approach, I could just compute a Spearman rank correlation of those DMRs. Is there a better way to do something similar? It was suggested below to get Phast conservation scores. How do people generally use the Phast conservation scores?
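
For illustration, the rank-correlation idea could be sketched in plain Python as follows. This assumes the dog DMRs have already been lifted over to human coordinates and matched to human DMRs by a shared region ID; the region IDs and methylation differences are hypothetical toy values.

```python
def average_ranks(values):
    """1-based ranks, with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy methylation differences (tumour minus control) per region.
human = {"r1": 0.30, "r2": -0.10, "r3": 0.05, "r4": 0.45, "r5": -0.25}
dog_lifted = {"r1": 0.20, "r2": -0.05, "r3": 0.25, "r4": 0.50, "r6": 0.15}

# Only regions present in both species can be compared.
shared = sorted(set(human) & set(dog_lifted))
rho = spearman([human[r] for r in shared], [dog_lifted[r] for r in shared])
print(f"{len(shared)} shared regions, Spearman rho = {rho:.2f}")
```

Regions that fail to lift over drop out of the comparison, so the number of shared regions is worth reporting alongside the correlation.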

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Saad Khan ▴ 440

0

A rank correlation could work too, though I suspect you'll get more informative results by looking at subsets. This method would also only allow looking at two samples at a time, which will get annoying quickly. The benefit of looking at conservation scores is that changes in highly conserved regions are much more likely to be biologically significant (the ENCODE consortium was rightly criticized for not doing this).
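
As a rough sketch of what "looking at subsets" by conservation might look like, the Python below splits matched regions into conserved and non-conserved groups and asks how often the methylation change goes in the same direction in both species. The 0.8 cutoff, the per-region conservation scores and the methylation differences are all assumptions made up for illustration, not recommended values or real data.

```python
def concordance_by_conservation(regions, threshold=0.8):
    """Fraction of regions whose methylation change has the same sign in
    both species, split by conservation score.

    regions: iterable of (conservation_score, human_delta, dog_delta).
    Returns {"conserved": fraction or None, "other": fraction or None}.
    """
    counts = {"conserved": [0, 0], "other": [0, 0]}  # [concordant, total]
    for score, human_delta, dog_delta in regions:
        key = "conserved" if score >= threshold else "other"
        counts[key][1] += 1
        if human_delta * dog_delta > 0:  # same direction in both species
            counts[key][0] += 1
    return {k: (c / t if t else None) for k, (c, t) in counts.items()}

# Toy regions: (conservation score in [0, 1], human delta, dog delta).
regions = [
    (0.95, 0.3, 0.2),
    (0.90, -0.2, -0.1),
    (0.85, 0.1, 0.4),
    (0.40, 0.2, -0.3),
    (0.10, -0.1, 0.2),
    (0.30, 0.3, 0.1),
]

result = concordance_by_conservation(regions)
print(result)
```

Because each region is scored on its own, this kind of summary extends to more than two samples more easily than a pairwise correlation does.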

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Devon Ryan 104k

0

Do you have a paper/link describing the exact procedure for how to go about it?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.1 years ago by Saad Khan ▴ 440

Ram · Answer 2 · 2014-09-17

I'm not sure about a Bioconductor package, but here is how I would do it:

1. Map the RNA-seq reads to the genomes of both species and fetch the common regions where reads map uniquely. If these regions are supported by other reads, then they are orthologous regions that are being transcribed.
2. Get Phast conservation scores for the methylated regions.
3. Once I have this information, I can explore it in R.
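
The "fetch common regions" step could be sketched along these lines in Python. It assumes the uniquely covered regions from both species have already been expressed in one coordinate system (for example by lifting the dog regions over to the human assembly), that intervals are half-open (BED-style), and that each per-chromosome list is sorted and non-overlapping. The coordinates are hypothetical toy values.

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of half-open
    (start, end) intervals using a two-pointer sweep."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# Toy uniquely covered regions per chromosome, in human coordinates.
human = {"chr1": [(100, 200), (300, 400)], "chr2": [(50, 80)]}
dog_lifted = {"chr1": [(150, 350)], "chr3": [(10, 20)]}

# Only chromosomes present in both sets can share regions.
common = {
    chrom: intersect(human[chrom], dog_lifted[chrom])
    for chrom in human.keys() & dog_lifted.keys()
}
print(common)
```

The resulting intervals are the candidates for which conservation scores would then be looked up.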

HTH