Question

How to get all GC/CpG positions in the latest human genome (GENCODE, hg19)


11.1 years ago

Saad Khan ▴ 440

Hi,

I am trying to reuse some publicly available bisulfite methylation results. These contain mC tables that lack strand information. The analysis used GENCODE v17 on hg19. In the past I generated all GC positions in the hg19 human genome with Bismark's bedgraph2cytosine module, which, according to its author, reports all CG positions in the genome.

I am trying to calculate regional methylation from the bisulfite results, which have a value between 0 and 1 for each CpG position. My idea was to sum these values over each exon to get the total exonic methylation, and then normalize that sum by the GC content of the exon. To do so, I have considered only the CpG positions on one strand (the + strand in this case).

However, when I divide the total exon methylation (the sum of the bisulfite ratios) by the total number of GCs in that exon, some exons end up with methylation greater than 1. What could be the possible reasons for this? The number of GCs should remain the same even if the GENCODE version changes (correct me if I am wrong).

This is how I calculated the GC count per exon from the Bismark output, using only one strand:

coverageBed -a chr1_gcregions.txt -b chr1_exons.txt |cut -f1,2,3,4,5,6,7 > exon_gccount.txt

Let me know if I am doing something wrong.

Thanks

cpg hg19 gencode • 4.2k views

ADD COMMENT • link updated 3.8 years ago by Ram 45k • written 11.1 years ago by Saad Khan ▴ 440


I think one possible reason is that you are using data that might have information for top and bottom strand CpG sites and then normalizing only to top strand CpGs. Are you getting some regional methylation values that are close to 2 after normalization? It would help if you could point to the data you are using.
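To make the strand-mismatch explanation concrete, here is a minimal Python sketch. The per-strand values and the helper `regional_methylation` are made up for illustration and are not taken from the poster's data. It assumes the methylation table reports one value per strand for each CpG, while the denominator counts only top-strand CpGs.

```python
def regional_methylation(values, n_sites):
    """Sum of per-site methylation ratios divided by a site count."""
    if n_sites == 0:
        raise ValueError("region contains no counted sites")
    return sum(values) / n_sites

# Hypothetical exon with 3 CpG dinucleotides and one call per strand.
top_strand = [0.9, 0.8, 1.0]
bottom_strand = [0.95, 0.85, 0.9]
all_calls = top_strand + bottom_strand

# Numerator uses both strands, denominator counts the top strand only:
# the result can exceed 1 and approaches 2 for fully methylated regions.
one_strand_norm = regional_methylation(all_calls, len(top_strand))

# Numerator and denominator both cover the same set of sites:
# the result is a mean of ratios and stays within [0, 1].
both_strand_norm = regional_methylation(all_calls, len(all_calls))

print(one_strand_norm, both_strand_norm)
```

Under these assumptions the first ratio is inflated only because the denominator undercounts the sites that contributed to the sum. Whether this is what happens in the actual data depends on how the table was generated.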

ADD REPLY • link 11.1 years ago by Matt Shirley 10k


If I divide them by the total of top- and bottom-strand CpGs, I get most values <= 0.5, with some being a little more than that.

It's actually BLUEPRINT Epigenome data.

The files look somewhat like this:

chr1    10492    10494    0.765566
chr1    10496    10498    0.802229
chr1    10524    10526    0.981968
chr1    10541    10543    0.951523
chr1    10562    10564    0.87209
chr1    10570    10572    0.904065
chr1    10576    10578    0.596719
chr1    10578    10580    0.756832
chr1    10588    10590    0.903772
chr1    10608    10610    0.799162
chr1    15642    15644    0.787029
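As a rough cross-check, the rows above can be averaged directly over the sites that actually appear in the table, so the denominator always matches the numerator. The Python sketch below assumes whitespace-separated columns (chromosome, start, end, ratio). The region `chr1:10490-10620` and the function names are hypothetical, chosen only to cover the first ten rows.

```python
ROWS = """\
chr1 10492 10494 0.765566
chr1 10496 10498 0.802229
chr1 10524 10526 0.981968
chr1 10541 10543 0.951523
chr1 10562 10564 0.87209
chr1 10570 10572 0.904065
chr1 10576 10578 0.596719
chr1 10578 10580 0.756832
chr1 10588 10590 0.903772
chr1 10608 10610 0.799162
chr1 15642 15644 0.787029
"""

def parse_rows(text):
    """Parse 4-column rows into (chrom, start, end, ratio) tuples."""
    rows = []
    for line in text.strip().splitlines():
        chrom, start, end, ratio = line.split()
        rows.append((chrom, int(start), int(end), float(ratio)))
    return rows

def mean_methylation(rows, chrom, region_start, region_end):
    """Mean ratio over rows fully contained in the region, plus the row count."""
    values = [v for c, s, e, v in rows
              if c == chrom and s >= region_start and e <= region_end]
    if not values:
        return None, 0
    return sum(values) / len(values), len(values)

rows = parse_rows(ROWS)
# Hypothetical region; a real analysis would take exon coordinates from the annotation.
mean, n_sites = mean_methylation(rows, "chr1", 10490, 10620)
print(n_sites, mean)
```

Because every value in the sum is at most 1 and the count comes from the same rows, this mean cannot exceed 1. If the externally computed per-exon count gives something larger, the two files are probably not counting the same set of sites.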

ADD REPLY • link updated 6.2 years ago by Ram 45k • written 11.1 years ago by Saad Khan ▴ 440


There are values of average regional methylation (averaged over GC) that are higher than one. These are very few and look like this:

ADD REPLY • link updated 6.2 years ago by Ram 45k • written 11.1 years ago by Saad Khan ▴ 440