I have a dataset, data_1, in text (.txt) format with the columns chr (chromosome number), stable_id, start, end and methylation. Its coordinates are on the mm9 assembly.
I have a second dataset, data_2, in bigWig format with the columns seqnames, ranges, strand and methylation score. Its coordinates are on the mm10 assembly, and it has over 10 million rows.
I need to compare data_1$start and data_1$end with data_2$ranges and, for each data_1 region, compute the average methylation score and the number of CpG islands.
These are the steps I followed, which I believe are the long route:
- Step 1: Converted data_1 to the 'chrN:start-end' format and exported it as a CSV file.
- Step 2: Uploaded this CSV file to the UCSC Genome Browser LiftOver tool and converted it from mm9 to mm10. The output was a BED file.
- Step 3: Replaced the start and end columns of data_1 with the new start and end coordinates from the lifted-over BED file.
- Step 4: Compared the start and end of data_1 with the ranges of data_2. This is where I am stuck: it takes a very long time to process in R. Is there a simpler way than the one I followed?
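To make the Step 4 comparison concrete, here is a minimal sketch of the computation, written in Python with the standard library only, for illustration. It assumes that each data_2 row is a single position with a score, that all coordinates are already on mm10 and inclusive, and that counting the data_2 rows inside a region is the count that is wanted. The function names `build_index` and `summarize` and the toy values are hypothetical. The idea is to sort the data_2 positions once per chromosome and then answer each data_1 region with two binary searches, instead of scanning all 10 million rows for every region.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(rows):
    """Group (chrom, pos, score) rows by chromosome.

    For each chromosome, store the sorted positions and a prefix-sum
    list of the scores, so that range sums cost O(1) after the lookup.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, score in rows:
        by_chrom[chrom].append((pos, score))

    index = {}
    for chrom, items in by_chrom.items():
        items.sort()  # sort by position
        positions = [pos for pos, _ in items]
        prefix = [0.0]
        for _, score in items:
            prefix.append(prefix[-1] + score)
        index[chrom] = (positions, prefix)
    return index


def summarize(index, chrom, start, end):
    """Return (count, mean score) of positions with start <= pos <= end.

    The mean is None when no position falls inside the region.
    """
    if chrom not in index:
        return 0, None
    positions, prefix = index[chrom]
    lo = bisect_left(positions, start)   # first position >= start
    hi = bisect_right(positions, end)    # one past the last position <= end
    count = hi - lo
    if count == 0:
        return 0, None
    return count, (prefix[hi] - prefix[lo]) / count


# Toy stand-in for data_2 (hypothetical values, not real data).
data_2 = [
    ("chr1", 100, 0.5),
    ("chr1", 150, 0.25),
    ("chr1", 300, 1.0),
    ("chr2", 50, 0.5),
]
index = build_index(data_2)

# Toy stand-in for one lifted-over data_1 region: chr1:90-200.
print(summarize(index, "chr1", 90, 200))  # (2, 0.375)
```

Building the index is a one-time sort, and each region lookup afterwards is logarithmic in the number of positions on that chromosome. If the data_2 ranges are wider than one base, the overlap test would need to compare both the start and the end of each range, which this sketch does not do.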
I am new to the field, so please explain in steps.