Question

Methylkit using hadoop for DNA analysis

0

Entering edit mode

9.8 years ago

shalini.ravishankar ▴ 10

Hi,

Can someone tell me is there a way to do Methylkit R Package on Hadoop?

Please give me some idea on how to proceed this.

This is the Code (Original) :

library(methylKit)
file.list=list( "new_sample1.txt","new_sample2.txt","n_sample3.txt")
myobj=read(file.list,sample.id=list("test1","test2","ctrl1"),assembly="hg19",treatment=c(1,1,0),context="CpG", pipeline=list(fraction=TRUE,chr.col=1,start.col=2,end.col=2,
coverage.col=6,strand.col=3,freqC.col=5 ))
getMethylationStats(myobj[[1]],plot=F,both.strands=F)
pdf("sample1_statistics.pdf")
getMethylationStats(myobj[[1]],plot=T,both.strands=F)
dev.off()
getMethylationStats(myobj[[2]],plot=F,both.strands=F)
pdf("sample2_statistics.pdf")
getMethylationStats(myobj[[2]],plot=T,both.strands=F)
dev.off()
getCoverageStats(myobj[[3]],plot=F,both.strands=F)
pdf("sample3_statistics.pdf")
getMethylationStats(myobj[[3]],plot=T,both.strands=F)
dev.off()
library("graphics")
pdf("sample1_coverage.pdf")
getCoverageStats(myobj[[1]], plot = T, both.strands = F)
dev.off()
pdf("sample2_coverage.pdf")
getCoverageStats(myobj[[2]], plot = T, both.strands = F)
dev.off()
pdf("sample3_coverage.pdf")
getCoverageStats(myobj[[3]], plot = T, both.strands = F)
dev.off()
meth=unite(myobj, destrand=FALSE)
pdf("correlation.pdf")
getCorrelation(meth,plot=T)
dev.off()
pdf("cluster.pdf")
clusterSamples(meth, dist="correlation",method="ward", plot=TRUE)
dev.off()
hc <- clusterSamples(meth, dist = "correlation", method = "ward",plot = FALSE)
pdf("pca.pdf")
PCASamples(meth, screeplot = TRUE)
PCASamples(meth)
myDiff=calculateDiffMeth(meth)
write.table(myDiff, "mydiff.txt", sep='\t')
myDiff25p.hyper <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hyper")
myDiff25p.hyper
write.table(myDiff25p.hyper,"hyper_methylated.txt",sep='\t')
myDiff25p.hypo <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hypo")
myDiff25p.hypo
write.table(myDiff25p.hypo,"hypo_methylated.txt",sep='\t')
myDiff25p <-get.methylDiff(myDiff,differenc=25,qvalue=0.01)
myDiff25p
write.table(myDiff25p,"differentialy_methylated.txt",sep='\t')
diffMethPerChr(myDiff,plot=FALSE,qvalue.cutoff=0.01,meth.cutoff=25)
pdf("diffMethPerChr.pdf")
diffMethPerChr(myDiff,plot=TRUE,qvalue.cutoff=0.01,meth.cutoff=25)
dev.off()
gene.obj <- read.transcript.features(system.file("extdata","refseq.hg18.bed.txt", package = "methylKit"))
write.table(gene.obj,"gene_obj.txt", sep='\t')
annotate.WithGenicParts(myDiff25p, gene.obj)
cpg.obj <- read.feature.flank(system.file("extdata","cpgi.hg18.bed.txt", package = "methylKit"),feature.flank.name = c("CpGi","shores"))
write.table(cpg.obj,"cpg_obj.txt", sep='\t')
diffCpGann <- annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi, cpg.obj$shores, feature.name = "CpGi",flank.name = "shores")
write.table(diffCpGann,"diffCpCann.txt", sep='\t')
diffCpGann
promoters <- regionCounts(myobj, gene.obj$promoters)
head(promoters[[1]])
write.table(promoters,"promoters.txt", sep='\t')
diffAnn <- annotate.WithGenicParts(myDiff25p, gene.obj)
head(getAssociationWithTSS(diffAnn))
diffAnn
write.table(getAssociationWithTSS(diffAnn),"diff_ann.txt", sep='\t')
getTargetAnnotationStats(diffAnn, percentage = TRUE,precedence = TRUE)
pdf("piechart1.pdf")
plotTargetAnnotation(diffAnn, precedence = TRUE, main ="differential methylation annotation")
dev.off()
pdf("piechart2.pdf")
plotTargetAnnotation(diffCpGann, col = c("green","gray", "white"), main = "differential methylation annotation")
dev.off()
getFeatsWithTargetsStats(diffAnn, percentage = TRUE)

Thanks, Shalini.

DNA Methylkit R hadoop • 3.6k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.8 years ago by shalini.ravishankar ▴ 10

Ram · Answer 1 · 2015-01-26

0

Entering edit mode

9.8 years ago

Devon Ryan 104k

There's no native support for hadoop in methylkit. You'd need to either modify the code to use hadoop (likely via Rhadoop) or more simply by using Rmpi or something along those lines to run statistics across regions in parallel (you'd probably have to modify the code to do this too).

ADD COMMENT • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks for information Devon. I will look into the Rhadoop and Rmpi. My concern is, this Methylkit takes input as 3 text files and produces a bunch of files with graphs. Will it create any problem if I use it on Hadoop (RHadoop) ??

ADD REPLY • link 9.8 years ago by shalini.ravishankar ▴ 10

0

Entering edit mode

Err, well you have to rewrite the package so it'll use hadoop anyway, so just rewrite it so that that will work too.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

thanks devon. I will try it.

ADD REPLY • link 9.8 years ago by shalini.ravishankar ▴ 10

0

Entering edit mode

Hi Devon,

I was trying to rewrite but then I am wondering that my R program takes 3 text files as input and uses a library called methylkit and produces multiple output (mostly PDFs which has charts/graphs) Will this process be affected if I use Hadoop?

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by shalini.ravishankar ▴ 10

0

Entering edit mode

You'll be reimplementing to use hadoop-aware methods in R, so then it shouldn't be a problem.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

Re implementing in sense ,Do I need to change the whole code ?

ADD REPLY • link 9.8 years ago by shalini.ravishankar ▴ 10

0

Entering edit mode

Probably not.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

Entering edit mode

Can you please explain what do I need to do like steps to achieve this ? It would really help me a lot

ADD REPLY • link 9.8 years ago by shalini.ravishankar ▴ 10

0

Entering edit mode

I'd have to go through the entirety of the code-base to do that. You're going to have to figure this all out on your own.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k