Hello everybody,
Since three weeks I try to make sense out of my (h)medip-seq data (I am an absolute beginner in deep sequencing data analysis and bioinformatics in general) and I hope you guys can give me some tips.
Sample: in vivo material of three linage related cell types
Method: comperative (h) MeDip-seq ( Li Tan et al. 2013) with 300 ng of each of the three celltypes (in total 900ng DNA) using the Diagnode MeDip and hMeDip kits.
First of all I don't know if my data look alright since I have really narrow peaks (reads are extended to 250 bp) which align quite often only to on strand (hemimethylated DNA or non-CpG methylation), (http://i.imgur.com/1TiUPTQ.png). I tried already to download published raw medip-seq data to compare them with my data but without success. Most times the data were already processed and therefor i couldnt compare them with my data. Is this narrow peak pattern normal or should I see more broader peaks (MeDip QC and Sequencing QC seem all okay)?
The second problem is to analyse the data and find differential methylated peaks/regions). I used already the MEDIPS package (see code below) but these gives me around 10 peaks without duplicates whereas I get around 500 peaks with duplicates (adj=T).
`
library("MEDIPS")
library("BSgenome.Mmusculus.UCSC.mm10")
library("edgeR")
bcv <- 0.01 counts <- matrix( rnbinom(40,size=1/bcv^2,mu=10), 20,2) y <- DGEList(counts=counts, group=1:2) et <- exactTest(y, dispersion=bcv^2) medipoN="/home/crtd_c...MeDip_Output.bam" medipoR="/home/crtd_c ...2MeDip_Output.bam" BSgenome="BSgenome.Mmusculus.UCSC.mm10" uniq=FALSE extend=100 shift=0 ws=100 chr.select=NULL
N_Output= MEDIPS.createSet(file = medipoN, BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select) R_Output= MEDIPS.createSet(file = medipoR, BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select)
mr.edgeR.MeDip = MEDIPS.meth(MSet1 = N_Output, MSet2 = R_Output, p.adj = "hochberg",diff.method = "edgeR", prob.method = "poisson", MeDIP = F, CNV = F, type = "rpkm", minRowSum = 1)
mr.edgeR.Medip.s = MEDIPS.selectSig(results = mr.edgeR.MeDip, p.value = 0.2, adj = T, ratio = NULL, bg.counts = 30, CNV = F)
However if I visualize the called regions in seqmonk they don't look really interesting whereas I find easily other regions which look more differential methylated. So I don't really trust the package but may used some wrong settings? Can I use adj=F (but i think then i will get to much false positive peaks) ?
I proceed then with MACS2 to call broad peaks in all three samples, merged them and count the reads of the peaks using bedtools.
`macs2 callpeak -t /home/crtd_c ... MeDip_Output.bam -c /home/crtd_c ... MeDip_Input.bam --broad -f BAM -g mm --bw 250 -n PP_MeDip_MACS2` -> for each of the three celltypes
merge all three with "cat" function
bedtools merge -i /home/crtd_csortedmacs2peaks.bed -n -scores sum& -d 100 > mergedMACS2peaks.bed bedtools multicov -bams /home/crtd_c...PP_MeDip_Output.bam /home/crtd_c...DP_MeDip_Output.bam /home/crtd_c...N_MeDip_Output.bam -bed /home/crtd_c...mergedMACS2peaks.bed > MeDippeaks.bed -D -q 30 ```
I sorted all peaks out which have no more than 20 reads in one of the three samples ( sorted <-subset(mydata, ((V6>20)|(V7>20)|(V8>20)))
)and calculated the log2 ratio between the three celltypes (so at the end I had a excelsheet with the genomic coordinates and the three log2 ratios). However now is the question can I do it like this and if yes, how can I define the significant changes of the log2 ratios (simply +1/-1 fold change ?).
However I am happy about any other suggestions regarding the data analysis. May there is a tool out there which I didn't try so far.
Thanks a lot!
Thanks a lot, I will try diffreps as well as the MACS2 bdgcmp option.
Hi Florian,
How was your experience with the MACS2 bdgcmp option? I was trying to compare a Tumor and Normal sample ( both having CONTROL Sample) and after running MACS2 bdgcmp option, I am only getting 1-2 DMR peaks ? Is that anything obvious that I am missing?
Thanks