Question

Associate peak score to gene

9.8 years ago

RT ▴ 20

Hi all,

I need some suggestions on how to assign a score to a gene based on the scores of the peaks associated with it. I have data for the elongating form of Pol2, and almost all of the peaks (called with macs2) fall within gene bodies. I use bedtools intersect to annotate the peaks to genes. However, I am not sure how to transfer the peak scores onto the genes.

We want to estimate the transcription rate from the peaks, which in our data will always lie within the gene body due to Ser2-5 phosphorylation. Should I still use the distance to the peak center to calculate the gene-wise score? It would also be helpful if you could direct me to any protocol papers.

Thanks,
Aarthi

Pol2 ChIP-Seq • 3.9k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.8 years ago by RT ▴ 20

Ram · Accepted Answer · 2015-10-15

9.8 years ago

Fidel ★ 2.0k

If I understand correctly, you want to associate transcription rate with the read counts from a ChIP-seq of PolII Ser5/2 phosphorylation. I think this will not give you the desired results because:

The speed of elongating Pol II is not constant, and the polymerase may even stop at some positions, which in turn translates into higher read counts at those positions.
Gene length inversely correlates with the amount of elongating Pol II. Larger genes have, on average, less Pol II over the gene body.
Regions where genes overlap contain a mixture of Pol II signals, one for each gene.
Elongating Pol II creates broad peaks over the gene body that require a higher sequencing depth to identify accurately. MACS can miss many elongating Pol II broad peaks when few reads are used.

I don't think you need to call peaks in this case because, as you say, any enrichment will be found at gene bodies. Rather, you can compute the average PolII signal (preferably the log ratio of ChIP vs. input) over the gene body of every annotated gene and try to cluster those values. At the very least, you should be able to distinguish active from inactive genes. To measure transcription rates, the method commonly used is GRO-seq.
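As a rough illustration of the gene-body averaging idea, here is a minimal Python sketch. It assumes you already have depth-normalised per-base coverage arrays for ChIP and input; the function name `gene_body_log2_ratio`, the pseudocount, and the cutoff of 1.0 for calling a gene "active" are all hypothetical choices for the example, not recommended values. It takes the log2 of the ratio of mean coverages, which is only one of several reasonable ways to summarise a gene body.

```python
import numpy as np

def gene_body_log2_ratio(chip, inp, genes, pseudocount=1.0):
    """Return log2(mean ChIP / mean input) over each gene body.

    chip, inp : dict mapping chromosome name -> 1D array of per-base coverage
                (assumed to be normalised for sequencing depth already)
    genes     : iterable of (name, chrom, start, end), 0-based, half-open
    """
    scores = {}
    for name, chrom, start, end in genes:
        chip_mean = chip[chrom][start:end].mean()
        input_mean = inp[chrom][start:end].mean()
        # The pseudocount avoids division by zero in regions without input reads.
        ratio = (chip_mean + pseudocount) / (input_mean + pseudocount)
        scores[name] = float(np.log2(ratio))
    return scores

# Toy data: one enriched gene body (geneA) and one at background level (geneB).
chip = {"chr1": np.concatenate([np.full(100, 1.0),
                                np.full(200, 15.0),
                                np.full(100, 1.0)])}
inp = {"chr1": np.full(400, 1.0)}
genes = [("geneA", "chr1", 100, 300), ("geneB", "chr1", 300, 400)]

scores = gene_body_log2_ratio(chip, inp, genes)
# Hypothetical cutoff; in practice, look at the distribution or cluster the values.
active = {gene for gene, score in scores.items() if score > 1.0}
```

Because the score is a mean over the gene body, it is not inflated simply by gene length, but it still cannot separate Pol II density from elongation speed, so it should be read as an occupancy measure rather than a transcription rate.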

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 9.8 years ago by Fidel ★ 2.0k

Thanks for the insights and ideas Fidel, I will definitely try to make a heatmap instead of just profile plots from now on.

Most of our profile plots are biased towards the 3' end rather than the 5' end. I did try stratifying by gene length to see the difference, but I still find that PolII is preferentially higher at the 3' end regardless of gene length in my case.

We did use MACS2 (since none of the other broad peak callers were fruitful) for both narrow and broad peaks, and surprisingly we find a lot more narrow than broad peaks with this data. I am not sure if this is due to depth, but we have an average of 20X coverage for a 23 Mb genome.
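For anyone wanting to quantify this kind of 3' bias per gene rather than reading it off a profile plot, the following is a minimal Python sketch. It assumes a per-base coverage array per chromosome; the helper names `scaled_profile` and `three_prime_bias`, the bin count, and the pseudocount are hypothetical. Each gene body is rescaled to a fixed number of bins in the 5' to 3' direction (so minus-strand genes are reversed), and the bias is the log2 ratio of the 3' half to the 5' half.

```python
import numpy as np

def scaled_profile(cov, start, end, strand, n_bins=10):
    """Mean coverage in n_bins equal-sized bins, ordered 5' -> 3'.

    The gene must be at least n_bins bases long, otherwise some bins are empty.
    """
    body = cov[start:end]
    if strand == "-":
        body = body[::-1]  # flip so that index 0 is always the 5' end
    return np.array([chunk.mean() for chunk in np.array_split(body, n_bins)])

def three_prime_bias(profile, pseudocount=1.0):
    """log2 of mean signal in the 3' half over mean signal in the 5' half."""
    half = len(profile) // 2
    five_prime = profile[:half].mean()
    three_prime = profile[half:].mean()
    return float(np.log2((three_prime + pseudocount) / (five_prime + pseudocount)))

# Toy coverage that rises steadily from left to right along the chromosome.
cov = np.linspace(1.0, 10.0, 1000)

plus_profile = scaled_profile(cov, 0, 1000, "+")
minus_profile = scaled_profile(cov, 0, 1000, "-")

plus_bias = three_prime_bias(plus_profile)    # signal rises towards the 3' end
minus_bias = three_prime_bias(minus_profile)  # same region, opposite strand
```

Computing this value for every gene and then grouping genes by length would give a per-stratum distribution of the bias, which is easier to compare than overlaid average profiles.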

Thanks,
Aarthi

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.8 years ago by RT ▴ 20

This paper may be of interest to you: Jonkers, I., Kwak, H., & Lis, J. T. (2014). Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons. eLife, 3, e02407. doi:10.7554/eLife.02407

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.8 years ago by Fidel ★ 2.0k

Thanks Fidel!

ADD REPLY • link 9.7 years ago by RT ▴ 20