Entering edit mode
9.8 years ago
shalini.ravishankar
▴
10
Hi,
Can someone tell me is there a way to do Methylkit R Package on Hadoop?
Please give me some idea on how to proceed this.
This is the Code (Original) :
library(methylKit)
file.list=list( "new_sample1.txt","new_sample2.txt","n_sample3.txt")
myobj=read(file.list,sample.id=list("test1","test2","ctrl1"),assembly="hg19",treatment=c(1,1,0),context="CpG", pipeline=list(fraction=TRUE,chr.col=1,start.col=2,end.col=2,
coverage.col=6,strand.col=3,freqC.col=5 ))
getMethylationStats(myobj[[1]],plot=F,both.strands=F)
pdf("sample1_statistics.pdf")
getMethylationStats(myobj[[1]],plot=T,both.strands=F)
dev.off()
getMethylationStats(myobj[[2]],plot=F,both.strands=F)
pdf("sample2_statistics.pdf")
getMethylationStats(myobj[[2]],plot=T,both.strands=F)
dev.off()
getCoverageStats(myobj[[3]],plot=F,both.strands=F)
pdf("sample3_statistics.pdf")
getMethylationStats(myobj[[3]],plot=T,both.strands=F)
dev.off()
library("graphics")
pdf("sample1_coverage.pdf")
getCoverageStats(myobj[[1]], plot = T, both.strands = F)
dev.off()
pdf("sample2_coverage.pdf")
getCoverageStats(myobj[[2]], plot = T, both.strands = F)
dev.off()
pdf("sample3_coverage.pdf")
getCoverageStats(myobj[[3]], plot = T, both.strands = F)
dev.off()
meth=unite(myobj, destrand=FALSE)
pdf("correlation.pdf")
getCorrelation(meth,plot=T)
dev.off()
pdf("cluster.pdf")
clusterSamples(meth, dist="correlation",method="ward", plot=TRUE)
dev.off()
hc <- clusterSamples(meth, dist = "correlation", method = "ward",plot = FALSE)
pdf("pca.pdf")
PCASamples(meth, screeplot = TRUE)
PCASamples(meth)
myDiff=calculateDiffMeth(meth)
write.table(myDiff, "mydiff.txt", sep='\t')
myDiff25p.hyper <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hyper")
myDiff25p.hyper
write.table(myDiff25p.hyper,"hyper_methylated.txt",sep='\t')
myDiff25p.hypo <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hypo")
myDiff25p.hypo
write.table(myDiff25p.hypo,"hypo_methylated.txt",sep='\t')
myDiff25p <-get.methylDiff(myDiff,differenc=25,qvalue=0.01)
myDiff25p
write.table(myDiff25p,"differentialy_methylated.txt",sep='\t')
diffMethPerChr(myDiff,plot=FALSE,qvalue.cutoff=0.01,meth.cutoff=25)
pdf("diffMethPerChr.pdf")
diffMethPerChr(myDiff,plot=TRUE,qvalue.cutoff=0.01,meth.cutoff=25)
dev.off()
gene.obj <- read.transcript.features(system.file("extdata","refseq.hg18.bed.txt", package = "methylKit"))
write.table(gene.obj,"gene_obj.txt", sep='\t')
annotate.WithGenicParts(myDiff25p, gene.obj)
cpg.obj <- read.feature.flank(system.file("extdata","cpgi.hg18.bed.txt", package = "methylKit"),feature.flank.name = c("CpGi","shores"))
write.table(cpg.obj,"cpg_obj.txt", sep='\t')
diffCpGann <- annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi, cpg.obj$shores, feature.name = "CpGi",flank.name = "shores")
write.table(diffCpGann,"diffCpCann.txt", sep='\t')
diffCpGann
promoters <- regionCounts(myobj, gene.obj$promoters)
head(promoters[[1]])
write.table(promoters,"promoters.txt", sep='\t')
diffAnn <- annotate.WithGenicParts(myDiff25p, gene.obj)
head(getAssociationWithTSS(diffAnn))
diffAnn
write.table(getAssociationWithTSS(diffAnn),"diff_ann.txt", sep='\t')
getTargetAnnotationStats(diffAnn, percentage = TRUE,precedence = TRUE)
pdf("piechart1.pdf")
plotTargetAnnotation(diffAnn, precedence = TRUE, main ="differential methylation annotation")
dev.off()
pdf("piechart2.pdf")
plotTargetAnnotation(diffCpGann, col = c("green","gray", "white"), main = "differential methylation annotation")
dev.off()
getFeatsWithTargetsStats(diffAnn, percentage = TRUE)
Thanks, Shalini.
Thanks for information Devon. I will look into the Rhadoop and Rmpi. My concern is, this Methylkit takes input as 3 text files and produces a bunch of files with graphs. Will it create any problem if I use it on Hadoop (RHadoop) ??
Err, well you have to rewrite the package so it'll use hadoop anyway, so just rewrite it so that that will work too.
thanks devon. I will try it.
Hi Devon,
I was trying to rewrite but then I am wondering that my R program takes 3 text files as input and uses a library called methylkit and produces multiple output (mostly PDFs which has charts/graphs) Will this process be affected if I use Hadoop?
You'll be reimplementing to use hadoop-aware methods in R, so then it shouldn't be a problem.
Re implementing in sense ,Do I need to change the whole code ?
Probably not.
Can you please explain what do I need to do like steps to achieve this ? It would really help me a lot
I'd have to go through the entirety of the code-base to do that. You're going to have to figure this all out on your own.