3 months ago
Dana
Hello Everyone,
I hope all is well! I would like to create an R script to obtain the following information from 72 CpG sites:
- Chromosome
- Position
- Type of CpG site (e.g., island, shore, etc.)
- List of all SNPs within +/- 500 base pairs of each site
- Nearest gene(s)
Does anyone have any suggestions? Please let me know, as I would be most obliged!
Thank you,
Dana
Read about biomaRt - https://bioconductor.org/packages/release/bioc/html/biomaRt.html
How do I reference the specific CpG sites for which I need the information? I only see getting information for an entire chromosome.
Please make a reproducible question. Add examples of which data you have right now and in which format, and how the output should look. "CpG sites" alone is not informative; we need to know how they are represented.
I have a list of 72 cg markers.
Reiterating a point that already exists in the question doesn't appear to fit what they are asking for. Here is a guide to making a minimal reproducible example, but the logic applies to asking non-programming questions too.
What data do you have? Are the methylation data in VCF format? Do you have the variant-calling data, or do you need to get that from somewhere else? It is hard to make suggestions when all we know is that you have 72 CpG sites from an unknown species, with no idea about supplementary data. But generally, if you have all the relevant VCFs, this is relatively easy to do with a function like data.table::foverlaps or library(GenomicRanges) for most of that information.
I just have human CpG site numbers. I wanted to use data from NCBI or UCSC.
Also, GenomicRanges requires chromosome information to work.
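To illustrate the overlap idea suggested above, here is a minimal base-R sketch of the "+/- 500 base pairs" step. It assumes you already have chromosome and position for each CpG and for each SNP; the data frames, column names, IDs, and coordinates below are all made up for illustration. It uses a plain merge-and-filter instead of data.table::foverlaps or GenomicRanges, which would likely scale better on real SNP tables.

```r
# Hypothetical toy data: real coordinates would come from an annotation source.
cpgs <- data.frame(
  cpg_id = c("cg_example_1", "cg_example_2"),
  chr    = c("chr1", "chr2"),
  pos    = c(10000L, 50000L),
  stringsAsFactors = FALSE
)
snps <- data.frame(
  snp_id = c("snp_a", "snp_b", "snp_c", "snp_d"),
  chr    = c("chr1", "chr1", "chr2", "chr2"),
  pos    = c(9600L, 10800L, 50500L, 10000L),
  stringsAsFactors = FALSE
)

window <- 500L  # half-width of the search window in base pairs

# Pair every CpG with every SNP on the same chromosome.
pairs <- merge(cpgs, snps, by = "chr", suffixes = c("_cpg", "_snp"))

# Keep only pairs where the SNP lies within the window (inclusive).
nearby <- pairs[abs(pairs$pos_snp - pairs$pos_cpg) <= window, ]
nearby <- nearby[order(nearby$cpg_id, nearby$pos_snp),
                 c("cpg_id", "chr", "pos_cpg", "snp_id", "pos_snp")]

print(nearby)
```

The merge builds every CpG/SNP pair per chromosome, so this is only sensible for a small number of sites such as the 72 here, paired with a SNP table already restricted to the relevant regions.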
Please use ADD REPLY and not the answer field. That makes the thread messy.
Site numbers that include chromosome information? What format are the 72 sites in? And there is a lot of data on NCBI. What kind of data are you looking to use? Do you want to download raw reads, process them, and call SNPs? Please see the comment I left above about minimal reproducible examples.
The site numbers are the CpG IDs. They are from the Illumina 850K (EPIC) array.
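Since the IDs are Illumina array probe IDs, one possible route is to look them up in the array's annotation table. The sketch below is not a definitive recipe: annotate_cpgs is a hypothetical helper, and the column names (chr, pos, Relation_to_Island, UCSC_RefGene_Name) are assumed to match the Bioconductor EPIC annotation package, so please check them against the version you install. The toy table only stands in for the real annotation so the example runs without Bioconductor.

```r
# Hypothetical helper: subset an annotation table to the CpG IDs of interest.
# `ann` is assumed to have probe IDs as row names and the columns listed in `cols`.
annotate_cpgs <- function(ids, ann,
                          cols = c("chr", "pos", "Relation_to_Island",
                                   "UCSC_RefGene_Name")) {
  missing_ids <- setdiff(ids, rownames(ann))
  if (length(missing_ids) > 0) {
    warning("IDs not found in annotation: ",
            paste(missing_ids, collapse = ", "))
  }
  found <- intersect(ids, rownames(ann))  # keeps the order of `ids`
  out <- as.data.frame(ann[found, cols, drop = FALSE])
  out$cpg_id <- found
  out
}

# With Bioconductor installed, the real table could be loaded like this (not run here):
# library(minfi)
# library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
# ann <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
# result <- annotate_cpgs(my_cpg_ids, ann)  # my_cpg_ids: your 72 IDs (hypothetical name)

# Toy stand-in annotation; all values are made up.
toy_ann <- data.frame(
  chr = c("chr1", "chr7"),
  pos = c(12345L, 67890L),
  Relation_to_Island = c("Island", "N_Shore"),
  UCSC_RefGene_Name = c("GENE_A", "GENE_B;GENE_B"),
  row.names = c("cg_toy_0001", "cg_toy_0002"),
  stringsAsFactors = FALSE
)

res <- annotate_cpgs(c("cg_toy_0002", "cg_toy_0001"), toy_ann)
print(res)
```

Note that the gene column in that annotation lists the genes Illumina associates with each probe, which is not necessarily the nearest gene, and that SNPs within +/- 500 base pairs would still need a separate SNP source such as dbSNP.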