Question

How to do pathway enrichment analysis with genomic ranges data (ATAC-Seq peaks)?

2.8 years ago

MatthewP ★ 1.4k

Hello, I am processing ATAC-Seq data. I want to know which pathways may be affected by treatment, so I annotated differential (binding) peak regions with ChIPseeker. My first question is how to filter the annotation results before using the annotated genes for pathway analysis. I am worried that many annotated genes are not actually affected by the peaks. Second, can someone recommend good materials on how genomic positions (promoter, 5' UTR, intergenic...) regulate gene expression? Third, I would like to know of other methods for pathway enrichment analysis on genomic ranges. Thanks.

ATAC-Seq pathway annotation • 2.2k views

updated 2.7 years ago by LauferVA 4.5k • written 2.8 years ago by MatthewP ★ 1.4k

ATAC-seq (or any non-RNA genomic assay) is problematic because it provides no information about transcriptional or proteomic changes. Assigning target genes is always either motif- or correlation-based, and many peaks (in my experience) simply cannot be meaningfully annotated to a gene. Anyway, there is GREAT ( http://great.stanford.edu/public/html/ ), which takes genomic positions and then performs enrichment analysis using nearby genes. That is arguably (imo) the most established method that is simple to use.

2.8 years ago by ATpoint 85k

score 3 · Accepted Answer · 2022-02-23

I recommend that you create a decision rule based on your understanding of the literature, then validate it against pathways that are very well known to be differentially regulated (D.R.) in your disease state.

Of course, some decision rules will be better than others. For instance, you could choose all genes within Y bases of X. Or you could do a lot more work and reason out which genes you think are most likely to be differentially regulated, based on the alteration to chromatin structure you are seeing in your data. How would you do this? You are free to use your understanding of the literature and any available data. So, while what ATpoint says is true (assignment is motif- or correlation-based), there's no reason why you can't try to find mQTL data in your disease of interest, for instance, and use that to help you build the gene list. Either way, there is a ton of useful data of different kinds out there that can help you.
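As a rough illustration of the simplest rule above (all genes within Y bases of a peak), here is a minimal Python sketch. It assumes peaks are `(chrom, start, end)` tuples and genes are `(name, chrom, tss)` tuples. The function name, the toy coordinates, and the 10 kb window are made up for illustration and are not values from this thread.

```python
def genes_within_window(peaks, genes, window):
    """Return the names of genes whose TSS lies within `window` bases of any peak."""
    selected = set()
    for name, g_chrom, tss in genes:
        for p_chrom, start, end in peaks:
            if g_chrom != p_chrom:
                continue
            # Distance from the TSS to the peak interval (0 if the TSS is inside the peak).
            if tss < start:
                dist = start - tss
            elif tss > end:
                dist = tss - end
            else:
                dist = 0
            if dist <= window:
                selected.add(name)
                break  # one nearby peak is enough to select the gene
    return selected


# Hypothetical toy data: two differential peaks and four annotated genes.
peaks = [("chr1", 1000, 1500), ("chr2", 50000, 50400)]
genes = [
    ("GENE_A", "chr1", 1200),
    ("GENE_B", "chr1", 9000),
    ("GENE_C", "chr1", 40000),
    ("GENE_D", "chr2", 45000),
]

hits = genes_within_window(peaks, genes, window=10_000)
print(sorted(hits))
```

Tightening or loosening `window` is one way to iterate on the rule before handing the gene list to a pathway tool.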

Here's an example; check out Figure 2: https://pubmed.ncbi.nlm.nih.gov/24390342/#&gid=article-figures&pid=figure-2-uid-1 What did they have? A list of genomic coordinates. What did they obtain? A list of prioritized genes near each association signal. From there, there is nothing to stop them from feeding, say, their most highly prioritized gene in each locus into a pathway analysis program...
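To make the "most highly prioritized gene in each locus" step concrete, here is a small Python sketch. It assumes you already have some per-gene priority score for each locus; the scoring itself, the locus IDs, and the gene names are hypothetical and do not reproduce the method used in the linked paper.

```python
def top_gene_per_locus(candidates):
    """Pick the highest-scoring gene for each locus.

    candidates: iterable of (locus_id, gene, score); a higher score means higher priority.
    On ties, the first gene seen for that locus is kept.
    """
    best = {}
    for locus, gene, score in candidates:
        if locus not in best or score > best[locus][1]:
            best[locus] = (gene, score)
    return {locus: gene for locus, (gene, _) in best.items()}


# Hypothetical candidates: two loci, each with several nearby genes.
candidates = [
    ("locus1", "GENE_A", 2.0),
    ("locus1", "GENE_B", 5.0),
    ("locus2", "GENE_C", 1.0),
    ("locus2", "GENE_D", 0.5),
]

top = top_gene_per_locus(candidates)
gene_list = sorted(top.values())  # this list would go into the pathway analysis
print(gene_list)
```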

Never forget that, ultimately, the most important question is, "What will you use the gene list for?" And are there pathways that you definitely expect to find? If there are, and one approach does not identify any of these "positive control pathways", so to speak, it may not have been a good method for gene selection. Of course, you can't know that a priori, but if you have good-quality data and you iterate your analysis enough, in my experience you can usually leverage available data to solidify your choices.
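One way to sketch the "positive control pathway" check is a plain over-representation test on each candidate gene list. The Python below uses a one-sided hypergeometric test; the function names, the toy gene sets, the background size of 100, and the 0.05 cutoff are assumptions for illustration. It also assumes every gene in the list and in each pathway belongs to the background, and it does no multiple-testing correction.

```python
from math import comb


def hypergeom_pvalue(overlap, pathway_size, list_size, background_size):
    """P(X >= overlap) when list_size genes are drawn without replacement from the background."""
    total = comb(background_size, list_size)
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def check_positive_controls(gene_list, control_pathways, background_size, alpha=0.05):
    """Return {pathway: (overlap, p_value, recovered)} for each expected pathway."""
    genes = set(gene_list)
    results = {}
    for name, members in control_pathways.items():
        overlap = len(genes & members)
        p = hypergeom_pvalue(overlap, len(members), len(genes), background_size)
        results[name] = (overlap, p, p < alpha)
    return results


# Hypothetical example: one expected pathway is hit, the other is missed.
gene_list = ["g1", "g2", "g3", "x1", "x2"]
control_pathways = {
    "expected_pathway": {"g1", "g2", "g3", "g4", "g5"},
    "missed_pathway": {"m1", "m2", "m3", "m4", "m5"},
}

results = check_positive_controls(gene_list, control_pathways, background_size=100)
for name, (overlap, p, recovered) in results.items():
    print(name, overlap, recovered)
```

If a gene-selection rule recovers none of the pathways you expect, that is a hint to revisit the rule rather than proof that it is wrong.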