I want to compare methylation data generated in one study by two techniques: the 450k array and MeDIP-seq. The file formats differ. The array data is a text file listing each CpG probe ID and its methylation value. The MeDIP-seq data is in BED format, with genomic windows and the corresponding methylation signal. I am new to this field and not sure how to compare the two, since they have different columns.
Could you elaborate a little on what you mean by "compare"? Do you mean that one is the control (i.e. 450k) and the other is your condition (i.e. MeDIP), and you want to see the changes between the two?
What's your goal? What would you like to find out from the data?
written 9.8 years ago by TriS, updated 2.7 years ago by Ram
Both datasets are from the same individual and report methylation levels. One is indexed by cg probe IDs, while the other is indexed by chromosome coordinates.
Sample of the text file from the 450k array:
* ID_REF VALUE Detection.Pval
* cg00000029 0.533466 0
* cg00000108 0.9221188 0
VALUE represents the amount of methylation (0 = unmethylated, 1 = very highly methylated).
Sample of the BED file from the MeDIP-seq data:
* chr1 1 1000 0.000852090112432486
* chr1 501 1500 0.0005609473955776
* chr1 1001 2000 0
* chr1 1501 2500 0
* chr1 2001 3000 0
I want to compare the data generated by the two techniques and see how well the methylation values correlate, i.e. how much they match or differ.
My concern is how to find the chromosome or genomic location of each CpG probe in the text file.
N.B., the R packages I reference below are available on Bioconductor.
I happen to have been comparing some human 450k samples to mouse RRBS samples recently, so perhaps I can provide some guidance.
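# Load the hg19 annotation package for the Infinium methylation arrays (Bioconductor)
library("FDb.InfiniumMethylation.hg19")
# Fetch the 450k probe annotation
anno <- get450k()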
The anno object (a GRanges) now contains a convenient annotation of each probe on the array and the CpG that it should be informative about. I can also recommend the minfi package, which facilitates a lot of the raw file processing.
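As a minimal sketch of how the probe IDs in the 450k text file could be attached to genomic coordinates: the toy anno object below only stands in for the result of get450k(), and its coordinates (and the probe ID cg99999999) are placeholders invented for illustration, not real probe positions. The column names follow the sample file posted in the question.

```r
library(GenomicRanges)

# Toy stand-in for the GRanges returned by get450k().
# ASSUMPTION: these coordinates are placeholders, NOT the real positions
# of the probes; cg99999999 is a made-up ID.
anno <- GRanges(
  seqnames = c("chr1", "chr1", "chr1"),
  ranges   = IRanges(start = c(250, 1200, 5000), width = 1)
)
names(anno) <- c("cg00000029", "cg00000108", "cg99999999")

# The 450k text file from the question, read here from inline text.
array_txt <- "ID_REF VALUE Detection.Pval
cg00000029 0.533466 0
cg00000108 0.9221188 0"
betas <- read.table(text = array_txt, header = TRUE, stringsAsFactors = FALSE)

# Keep only probes that exist in the annotation, then subset by probe ID.
betas  <- betas[betas$ID_REF %in% names(anno), ]
probes <- anno[betas$ID_REF]
probes$beta <- betas$VALUE

# One row per probe: chromosome, position and methylation value.
probe_table <- data.frame(
  probe = names(probes),
  chr   = as.character(seqnames(probes)),
  pos   = start(probes),
  beta  = probes$beta
)
print(probe_table)
```

With the real annotation, the toy anno block would simply be replaced by the get450k() call; the subsetting by probe ID stays the same.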
Regarding incorporating the MeDIP-seq dataset, that's a little tougher. One possibility is to simply correlate the methylation in the two files. This can be conveniently done by creating a GRanges object from the MeDIP-seq file and then using findOverlaps() to get overlapping CpGs from the 450k array. You would then need to get average (or median) methylation of the probes for each of the overlapped regions and then you can plot/calculate the correlation. How well they're correlated remains to be seen. I've generally been unimpressed with MeDIP-seq (we do mostly bisulfite sequencing).
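The overlap-and-average idea described above could look roughly like the sketch below. It uses the BED rows posted in the question, while the probe positions and beta values are invented placeholders. The choice of the mean and of Spearman correlation is an assumption for illustration, and the window coordinates are taken as written (a strictly 0-based BED file would need its start shifted by 1).

```r
library(GenomicRanges)

# MeDIP-seq windows from the question (chromosome, start, end, score).
bed_txt <- "chr1 1 1000 0.000852090112432486
chr1 501 1500 0.0005609473955776
chr1 1001 2000 0
chr1 1501 2500 0
chr1 2001 3000 0"
bed <- read.table(text = bed_txt,
                  col.names = c("chr", "start", "end", "score"),
                  stringsAsFactors = FALSE)
medip <- GRanges(bed$chr, IRanges(bed$start, bed$end), score = bed$score)

# ASSUMPTION: toy 450k probes with placeholder positions and beta values.
probes <- GRanges("chr1",
                  IRanges(c(250, 700, 1200, 2200), width = 1),
                  beta = c(0.53, 0.61, 0.92, 0.10))

# Which probes fall inside which MeDIP-seq window?
hits <- findOverlaps(medip, probes)

# Mean beta of the probes in each window that contains at least one probe.
mean_beta <- tapply(probes$beta[subjectHits(hits)], queryHits(hits), mean)
window_idx <- as.integer(names(mean_beta))

comparison <- data.frame(
  window      = window_idx,
  medip_score = medip$score[window_idx],
  mean_beta   = as.numeric(mean_beta)
)

# Rank correlation between the two techniques over the shared windows.
rho <- cor(comparison$medip_score, comparison$mean_beta, method = "spearman")
print(comparison)
```

Because the windows overlap each other, a single probe can contribute to two windows; whether that is acceptable depends on the question being asked.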
I have data_1, a text file (mm9 coordinates) with the columns chr (chromosome), stable_id, start, end and methylation.
I have data_2, a bigWig file (mm10 coordinates, over 10 million rows) with the columns seqnames, ranges, strand and methylation score.
I want to match data_1$start and data_1$end against data_2$ranges and compute the average methylation score and the number of CpG islands.
These are the steps I followed, which I believe is the long route:
Step 1: Converted data_1 to the format 'chrN:start-end' and exported it as CSV.
Step 2: Uploaded this CSV file to the UCSC Genome Browser LiftOver tool and converted it from mm9 to mm10. The output was a BED file.
Step 3: Replaced the start and end of data_1 with the new start and end coordinates from the lifted-over BED file.
Step 4: Compared the start and end of data_1 with data_2. This is where I am stuck, because it takes a long time to process in R. Is there a simpler way than what I followed?
I am new to the field, so please explain in steps.
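For Step 4, a hedged sketch of doing the comparison with GenomicRanges rather than row-by-row loops is shown below. All values are toy data, data_1 is assumed to be already lifted over to mm10, and "number of CpG islands" is interpreted here as the number of data_2 positions falling inside each data_1 region, which may not be what was intended.

```r
library(GenomicRanges)

# ASSUMPTION: toy version of data_1, already in mm10 coordinates.
data_1 <- data.frame(
  chr         = c("chr1", "chr1", "chr2"),
  stable_id   = c("r1", "r2", "r3"),
  start       = c(100, 500, 100),
  end         = c(200, 600, 300),
  methylation = c(0.8, 0.2, 0.5)
)
regions <- makeGRangesFromDataFrame(data_1, keep.extra.columns = TRUE)

# ASSUMPTION: toy version of data_2 (per-position methylation scores).
cpgs <- GRanges(c("chr1", "chr1", "chr1", "chr2"),
                IRanges(c(120, 180, 550, 1000), width = 1),
                score = c(0.9, 0.7, 0.1, 0.4))

# Number of data_2 positions inside each region.
regions$n_cpg <- countOverlaps(regions, cpgs)

# Average data_2 score per region; regions without any hit stay NA.
hits <- findOverlaps(regions, cpgs)
avg  <- tapply(cpgs$score[subjectHits(hits)], queryHits(hits), mean)
mean_score <- rep(NA_real_, length(regions))
mean_score[as.integer(names(avg))] <- as.numeric(avg)
regions$mean_score <- mean_score

result <- as.data.frame(regions)
print(result[, c("seqnames", "start", "end", "stable_id", "n_cpg", "mean_score")])
```

Steps 1 to 3 can likely also stay inside R: the rtracklayer package provides import() for reading bigWig files into GRanges, and import.chain() plus liftOver() for converting coordinates with a UCSC chain file. Those calls are left out of the sketch because they need the actual files.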