Hi guys, I am interested in dataset GSE37022 where they provide a GFF file that contains the methylation level for each samples accross the whole genome using a sliding window of 100bp. Basically the GFF file looks like this:
1 batman meth 301 400 0.968000 . . ROI "CHR1"; batman.iqr 0.00000
1 batman meth 401 500 0.968000 . . ROI "CHR1"; batman.iqr 0.00000
1 batman meth 501 600 0.979957 . . ROI "CHR1"; batman.iqr 0.00243478
1 batman meth 601 700 0.983667 . . ROI "CHR1"; batman.iqr 0.00143590
1 batman meth 701 800 0.722364 . . ROI "CHR1"; batman.iqr 0.00327273
I am just what is the best way to calculate the methylation level in the promoter regions? are there any pre-existing packages ? I googled and tried to check in bioconductor but it seems there is any. if not, do any one know how to associate the promoter regions obtained using the TxDb.Hsapiens.UCSC.hg19.knownGene library to genes.
> require(TxDb.Hsapiens.UCSC.hg19.knownGene)
> hg19.known<-TxDb.Hsapiens.UCSC.hg19.knownGene;
> prom<-promoters(hg19.known,200,200)
> prom
GRanges with 80922 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr1 [ 11674, 12073] + | 1 uc001aaa.3
[2] chr1 [ 11674, 12073] + | 2 uc010nxq.1
[3] chr1 [ 11674, 12073] + | 3 uc010nxr.1
[4] chr1 [ 68891, 69290] + | 4 uc001aal.1
[5] chr1 [320884, 321283] + | 5 uc001aaq.2
[6] chr1 [320946, 321345] + | 6 uc001aar.2
[7] chr1 [321837, 322236] + | 7 uc009vjk.2
[8] chr1 [323692, 324091] + | 8 uc001aau.3
[9] chr1 [324088, 324487] + | 9 uc021oeh.1
... ... ... ... ... ... ...
[80914] chrY [27330721, 27331120] - | 76942 uc004fwt.3
[80915] chrY [27539596, 27539995] - | 76943 uc022cpa.1
[80916] chrY [27603300, 27603699] - | 76944 uc022cpb.1
[80917] chrY [27604180, 27604579] - | 76945 uc011nbx.1
[80918] chrY [27605479, 27605878] - | 76946 uc004fwx.1
[80919] chrY [27606222, 27606621] - | 76947 uc022cpc.1
[80920] chrY [27607233, 27607632] - | 76948 uc004fwz.3
[80921] chrY [27635755, 27636154] - | 76949 uc022cpd.1
[80922] chrY [59360655, 59361054] - | 76950 uc011ncc.1
---
seqlengths:
chr1 chr2 chr3 chr4 ... chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
249250621 243199373 198022430 191154276 ... 36422 39786 38502
Thanks in advance,
Hi David, thanks for your answer. So you mean first take the gff file generated by batman and make an intersection with the promoters region then annotate the result. is it?
Yes if you want to interpret the result of batman through promoter region annotations. Intersect is the principle I had in mind. The implementation could be more complex as maybe methylation region could span over more than one promoter.
Hi david I ended up doing it with with BEDops tool, as the annotatePeakInBatch was slow, the files are a bit big 1.3G each