DMRcate ranges liftover hg19 to hg38
10 months ago
sativus ▴ 20

Hi Biostars! I have a question about DMRcate's "extractRanges" function and how it maps ranges to genes in hg38.

More specifically, I have performed a DMR analysis on an EPICv1 dataset where the data is prepared and annotated using the "ilm10b5.hg38" manifest. I then follow the DMRcate pipeline of annotating CpGs and extract DMRs using the "DMRcate::extractRanges()" function with the "genome = hg38" option. The DMRcate manual says that "Ranges are assumed to map to the reference stated; there is no liftover".

I then mapped the ranges to see how they overlap with results from a DEG analysis, so I want to make sure that the CpG sites in a region use the correct coordinates when finding overlaps between DEGs and DMRs. The manifest has separate coordinates for each CpG's default position and its corresponding hg38 position. When ranges are retrieved using the "hg38" option, does the program take the hg38 positions of the CpG sites into account, seeing as the "default" annotation seems to be for hg19? That is to say, are the CpG sites in a region called as a DMR mapped to a gene with respect to their hg38 positions, or are the ranges given in coordinates corresponding to the hg19 annotation?

The reason I am asking is that the results seem to indicate that the ranges are reported in hg19 coordinates: when I overlap the CpGs with the coordinate range of a DMR, I get fewer overlaps than the number of CpGs DMRcate reports when using the hg38 coordinates, but full coverage when using the hg19 coordinates.
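The check described above can be sketched in a few lines. This is only an illustration, not DMRcate code: the probe IDs, coordinates, DMR range, and the helper names `count_in_range` and `guess_build` are all hypothetical stand-ins for a manifest excerpt and one row of DMR output.

```python
from typing import Dict, Tuple

Coord = Tuple[str, int]  # (chromosome, 1-based position)


def count_in_range(positions: Dict[str, Coord], chrom: str, start: int, end: int) -> int:
    """Count probes lying in the closed interval [start, end] on chrom."""
    return sum(1 for c, p in positions.values() if c == chrom and start <= p <= end)


def guess_build(n_hg19: int, n_hg38: int, reported: int) -> str:
    """Name the build whose probe count matches the reported CpG count."""
    match19 = n_hg19 == reported
    match38 = n_hg38 == reported
    if match19 and not match38:
        return "hg19"
    if match38 and not match19:
        return "hg38"
    return "ambiguous"


# Hypothetical manifest excerpt: probe ID -> coordinate in each build.
hg19 = {"cg0001": ("chr1", 1000), "cg0002": ("chr1", 1050), "cg0003": ("chr1", 1120)}
hg38 = {"cg0001": ("chr1", 65000), "cg0002": ("chr1", 65050), "cg0003": ("chr1", 65120)}

# Hypothetical DMR as reported, together with its reported number of CpGs.
dmr = ("chr1", 1000, 1120)
reported_cpgs = 3

n19 = count_in_range(hg19, *dmr)
n38 = count_in_range(hg38, *dmr)
build = guess_build(n19, n38, reported_cpgs)
print(n19, n38, build)
```

If the hg19 count matches the reported CpG count for every DMR while the hg38 count does not, that suggests (but does not prove) that the ranges are in hg19 coordinates.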

I would be very grateful for any insights you can provide on the matter.

DMRcate Liftover DMR Methylation annotation • 679 views
7 months ago

Hey, I believe they are mapped to hg19, and this is potentially a bigger problem than people realise. I nearly made this mistake, and I suspect a lot of people already have.

I did the same as you and came here because I noticed that specifying "hg38" vs "hg19" for the genome argument in extractRanges(dmrcateOutput, genome = "hg38") makes no difference to the reported locations.

Annotation appears to occur at the cpg.annotate() stage: when you specify arraytype = "EPIC", the underlying code annotates the CpGs using annotation = "ilm10b4.hg19". From this point on there is no adjustment, as far as I can tell.

You would have to run the code manually to load ilm10b5.hg38, but it's probably easier to use the hg19 coordinates in your manifest file when identifying the underlying CpGs for graphing (if that is your approach).
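As a rough sketch of that manifest-based approach: look up the probes under a DMR using the hg19 coordinates, then read off the same probes' hg38 positions to get an hg38 span for overlap with DEGs. Everything here is assumed for illustration (probe IDs, coordinates, and the helpers `probes_in_range` and `span_in_other_build`); it is a probe-based conversion, not a true liftover, and it assumes all probes of a DMR stay on one chromosome in hg38.

```python
from typing import Dict, List, Tuple

Coord = Tuple[str, int]  # (chromosome, 1-based position)


def probes_in_range(build_pos: Dict[str, Coord], chrom: str, start: int, end: int) -> List[str]:
    """Return sorted probe IDs lying in the closed interval [start, end] on chrom."""
    return sorted(
        pid for pid, (c, p) in build_pos.items() if c == chrom and start <= p <= end
    )


def span_in_other_build(probe_ids: List[str], other_pos: Dict[str, Coord]) -> Tuple[str, int, int]:
    """Return (chrom, min, max) of the given probes in the other build."""
    coords = [other_pos[pid] for pid in probe_ids if pid in other_pos]
    if not coords:
        raise ValueError("none of the probes have coordinates in the other build")
    chroms = {c for c, _ in coords}
    if len(chroms) != 1:
        raise ValueError("probes map to more than one chromosome in the other build")
    positions = [p for _, p in coords]
    return chroms.pop(), min(positions), max(positions)


# Hypothetical manifest excerpt with both coordinate sets per probe.
hg19 = {"cg0001": ("chr1", 1000), "cg0002": ("chr1", 1050), "cg0003": ("chr1", 1120),
        "cg0004": ("chr1", 9000)}
hg38 = {"cg0001": ("chr1", 65000), "cg0002": ("chr1", 65050), "cg0003": ("chr1", 65120),
        "cg0004": ("chr1", 73000)}

# Hypothetical DMR range, taken to be in hg19 coordinates.
dmr_probes = probes_in_range(hg19, "chr1", 1000, 1120)
dmr_hg38 = span_in_other_build(dmr_probes, hg38)
print(dmr_probes, dmr_hg38)
```

Probes missing from the hg38 table are skipped, so it is worth comparing the number of probes found against the CpG count reported for the DMR before trusting the converted span.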

10 days ago
t.peters ▴ 20

Hi Sativus and Aaron,

Thanks for the question, and apologies for the time taken to answer.

The problem with using ilm10b5.hg38 is that it's not available as an annotation package on Bioconductor. Realistically I can only incorporate Illumina annotations into DMRcate functions that exist within the Bioconductor universe. Recently Zuguang Gu was kind enough to add IlluminaHumanMethylationEPICv2anno.20a1.hg38 to Bioconductor which allowed seamless incorporation with minfi and cpg.annotate(), and so calling DMRs from EPICv2 is now a feature.

If achilleasNP is able to submit ilm10b5.hg38 as an annotation package to Bioconductor (https://github.com/Bioconductor/Contributions) I would be more than happy to import his annotation into DMRcate as an option for users.

Cheers, Tim
