I'm using this reference to work out how to analyse data from Illumina EPIC chips.
I can obtain SNP beta values as follows:
> getSnpBeta( RGsetEpic )
<sentrix>_R0xC0y
rs2468330 0.41281518
rs877309 0.02935337
rs2857639 0.47252599
...
However, this doesn't tell me anything about these SNPs (which base is substituted with which? where? etc.). With getSnpInfo, I get the following:
> getSnpInfo( RGsetEpic )
DataFrame with 865859 rows and 6 columns
Probe_rs Probe_maf CpG_rs CpG_maf SBE_rs SBE_maf
<character> <numeric> <character> <numeric> <character> <numeric>
cg18478105 NA NA NA NA NA NA
cg09835024 NA NA NA NA NA NA
which is... something, I guess, but it is not clear to me where the 59 SNPs are in there.
getLocations seems to give me the locations, but only for the CG probes (none of the rs probes are included in the resulting GRanges object):
> getLocations( RGsetEpic )
GRanges object with 865859 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
cg18478105 chr20 61847650 *
cg09835024 chrX 24072640 *
cg14361672 chr9 131463936 *
#...[only cg* probes here, no rs :( ]
Does anyone know how I can obtain the locations of the rs probes? With those I could query the reference genome and at least figure out the reference base; ideally I'd love to know the alternate base as well.
Thanks!
Ah, I had thought the rs* labels were just probe names specific to the Illumina array, and didn't even know that dbSNP existed. Your answer has helped me in many ways. Thank you! However, do I understand correctly that this still has to be done manually for each probe name?
The easy solution is to use GenoMax's suggestion and download the annotations for the array.
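If downloading the array annotation is not an option, the per-probe lookup probably does not have to be done by hand. Here is a rough sketch in R, not a tested recipe: it pulls the rs* names from the row names of a beta matrix (a toy matrix stands in for the getSnpBeta output here) and defines a helper that would send them all to Ensembl in one biomaRt query. The helper name lookup_rs_positions is made up for illustration and is not part of minfi. The choice of build 37 is an assumption, made on the guess that the getLocations coordinates above are hg19.

```r
# Toy stand-in for getSnpBeta(RGsetEpic): a one-column matrix with rs* row names.
snp_beta <- matrix(
  c(0.41281518, 0.02935337, 0.47252599),
  ncol = 1,
  dimnames = list(c("rs2468330", "rs877309", "rs2857639"), "sample1")
)

# Collect the rs identifiers in one go instead of copying them by hand.
rs_ids <- grep("^rs", rownames(snp_beta), value = TRUE)

# Hypothetical helper (not part of minfi): look up all IDs with a single
# biomaRt query against the Ensembl variation mart.
# Assumption: GRCh37 coordinates are wanted, to match hg19-based annotation.
lookup_rs_positions <- function(ids, grch = 37) {
  mart <- biomaRt::useEnsembl(
    biomart = "snps",
    dataset = "hsapiens_snp",
    GRCh = grch
  )
  biomaRt::getBM(
    attributes = c("refsnp_id", "chr_name", "chrom_start", "allele"),
    filters = "snp_filter",
    values = ids,
    mart = mart
  )
}

# Needs network access and the biomaRt package, so it is left commented out:
# snp_pos <- lookup_rs_positions(rs_ids)
```

With a real object, rs_ids would come from rownames(getSnpBeta(RGsetEpic)) instead of the toy matrix. The allele column returned by Ensembl should list the alleles recorded for each variant, which would cover the "what is substituted with what" part, but check a few IDs against dbSNP before trusting it.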