I have a file from ENCODE with calculated beta values and probe IDs. How can I map these probes to gene symbols? And how can I determine their locations, such as promoter, 3'UTR, or gene body?
Thanks
You could start with the IlluminaHumanMethylation450k Bioconductor package, which contains mappings for at least part of what you're looking for (maybe all of it, I haven't checked).
I think this is probably sufficient. I know IMA provides region-level calculations for the categories that you listed, but that isn't a Bioconductor package and might be using something else.
If not, I know for certain they are included in the .bpm file that you can download here:
The website was being slow for me, but it should work. If not, you can just use the copy of the .bpm file included with the COHCAP demo dataset (the standalone version of COHCAP also adds these labels at the CpG site level, but this is probably not the most efficient way to accomplish this single task):
http://sourceforge.net/projects/cohcap/
Once you delete some of the early and final lines (or write code to ignore those lines), this is just a tab-delimited text file that you can parse.
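As a rough illustration of the "write code to ignore those lines" option, here is a minimal base-R sketch. The file layout is an assumption made for the example: a few preamble lines, a column header row starting with `IlmnID`, and a trailing `[Controls]` section. The probe IDs and gene names are made up, so check the markers against the actual file before relying on this.

```r
# Toy stand-in for the annotation file. The layout and values are hypothetical.
toy <- c(
  "Illumina, Inc.",
  "[Heading]",
  "[Assay]",
  "IlmnID\tUCSC_RefGene_Name\tUCSC_RefGene_Group",
  "cg00000001\tGENE_A\tTSS1500",
  "cg00000002\tGENE_B;GENE_B\tBody;3'UTR",
  "[Controls]",
  "12345\tSTAINING"
)
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

lines <- readLines(path)

# First line of the real table: the column header row (assumed to start with IlmnID).
header_idx <- grep("^IlmnID\t", lines)[1]

# Last line of the real table: the line before the trailing section, if there is one.
footer_idx <- grep("^\\[Controls\\]", lines)
last_idx <- if (length(footer_idx) > 0) footer_idx[1] - 1 else length(lines)

# quote = "" keeps the apostrophe in 3'UTR from being treated as a quote character.
manifest <- read.delim(
  text = lines[header_idx:last_idx],
  sep = "\t", quote = "", comment.char = "",
  stringsAsFactors = FALSE, check.names = FALSE
)
print(manifest)
```

Locating the header and footer by pattern rather than by a fixed line count means the same code keeps working if the number of preamble lines differs between file versions.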
If you don't want to parse the Illumina annotation files, you can use these Bioconductor commands in R:
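library("minfi")  # provides getAnnotation()
library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
data("IlluminaHumanMethylation450kanno.ilmn12.hg19")
# One row per probe, including the gene name and gene region columns
annotation.table = getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)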
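To get from the annotation table to gene symbols and regions per probe, something along these lines might work. It is only a sketch on toy data: `annotation.toy` stands in for `as.data.frame(annotation.table)`, and the `beta.toy` table, its `probe_id` and `beta` column names, and all values are made up. The `UCSC_RefGene_Name` and `UCSC_RefGene_Group` columns hold semicolon-separated entries, and the code assumes the two lists are parallel (same length per probe). Treating TSS200 and TSS1500 as "promoter" is a common convention, not something the annotation defines.

```r
# Toy stand-in for as.data.frame(annotation.table). Values are hypothetical.
annotation.toy <- data.frame(
  Name = c("cg00000001", "cg00000002", "cg00000003"),
  UCSC_RefGene_Name = c("GENE_A", "GENE_B;GENE_B", ""),
  UCSC_RefGene_Group = c("TSS1500", "Body;3'UTR", ""),
  stringsAsFactors = FALSE
)

# Toy stand-in for the beta value file. Column names are assumptions.
beta.toy <- data.frame(
  probe_id = c("cg00000001", "cg00000002", "cg00000003"),
  beta = c(0.12, 0.85, 0.40),
  stringsAsFactors = FALSE
)

# Attach the annotation to each probe by probe ID.
merged <- merge(beta.toy, annotation.toy,
                by.x = "probe_id", by.y = "Name", all.x = TRUE)

# Split a semicolon-separated field. Empty or missing fields become NA.
split_field <- function(x) {
  if (is.na(x) || x == "") return(NA_character_)
  strsplit(x, ";", fixed = TRUE)[[1]]
}

# One row per probe / gene / region combination.
long <- do.call(rbind, lapply(seq_len(nrow(merged)), function(i) {
  data.frame(
    probe_id = merged$probe_id[i],
    beta = merged$beta[i],
    gene = split_field(merged$UCSC_RefGene_Name[i]),
    region = split_field(merged$UCSC_RefGene_Group[i]),
    stringsAsFactors = FALSE
  )
}))
long <- unique(long)

# Assumed convention: probes near the transcription start site count as promoter.
long$promoter <- long$region %in% c("TSS200", "TSS1500")
print(long)
```

Probes with no gene entry keep a single row with `NA` for gene and region, so they are not silently dropped from the beta table.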
Was anyone able to get this 450K annotation file? If so, please share it with me too.
I am wondering whether you had any success with your question. I am in the same situation.