We have ATAC-seq and RNA-seq data from two sample groups (matched, n=3 per assay and group), which we used for differential analysis with rather stringent criteria (FDR < 1%, testing against a null hypothesis of absolute fold changes no greater than 2 using glmTreat in edgeR). The task now is to assign differential ATAC-seq regions to differentially expressed genes (DEGs).
The naive approach I tried is to assign each differential ATAC-seq peak to the nearest differentially expressed gene, provided both lie in the same topologically associating domain (TAD); the TADs come from a closely related cell type. Everything was basically done with a combination of BEDtools intersect and closest.
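To make the assignment rule concrete, here is a minimal pure-Python sketch of the same logic on toy data. It is not the BEDtools workflow itself: the coordinates, gene names and helper functions (`gap`, `find_tad`, `assign_peaks`) are hypothetical, and the two membership rules (peak midpoint inside the TAD, gene overlapping the TAD) are assumptions that may differ from what intersect and closest report by default.

```python
# Toy sketch: nearest DEG per differential peak, restricted to the peak's TAD.
# All coordinates and names below are made up for illustration.

def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two half-open intervals (0 if they overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))

def find_tad(chrom, pos, tads):
    """Return the (chrom, start, end) TAD containing pos, or None."""
    for t_chrom, t_start, t_end in tads:
        if t_chrom == chrom and t_start <= pos < t_end:
            return (t_chrom, t_start, t_end)
    return None

def assign_peaks(peaks, degs, tads):
    """Map each peak to (gene, distance) for the nearest DEG in its TAD, else None."""
    assignments = {}
    for chrom, start, end in peaks:
        # Assumption: a peak belongs to the TAD that contains its midpoint.
        tad = find_tad(chrom, (start + end) // 2, tads)
        candidates = []
        if tad is not None:
            _, t_start, t_end = tad
            for g_chrom, g_start, g_end, name in degs:
                # Assumption: a gene counts as "in the TAD" if it overlaps it.
                if g_chrom == chrom and g_start < t_end and t_start < g_end:
                    candidates.append((gap(start, end, g_start, g_end), name))
        if candidates:
            dist, name = min(candidates)  # smallest distance wins
            assignments[(chrom, start, end)] = (name, dist)
        else:
            assignments[(chrom, start, end)] = None
    return assignments

tads = [("chr1", 0, 1_000_000), ("chr1", 1_000_000, 2_000_000)]
degs = [("chr1", 100_000, 120_000, "GeneA"), ("chr1", 1_050_000, 1_060_000, "GeneB")]
peaks = [
    ("chr1", 110_000, 110_500),      # overlaps GeneA
    ("chr1", 900_000, 900_500),      # GeneB is closer, but sits in the next TAD
    ("chr1", 1_500_000, 1_500_500),  # second TAD
    ("chr2", 100, 600),              # no TAD annotated on this chromosome
]

assignments = assign_peaks(peaks, degs, tads)
for peak, hit in assignments.items():
    print(peak, hit)
```

In this toy example the second peak is assigned to GeneA at 780,000 bp although GeneB is only 149,500 bp away, because GeneB lies across the TAD boundary. That is the effect the TAD constraint is meant to have, and also the reason the assignment depends on how well the borrowed TAD calls match the cell type at hand.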
Doing so, 75% of the differential ATAC-seq peaks could be assigned to a DEG. The distance to the nearest DEG (in kb) is distributed as follows (quantiles):
10% 25% 50% 75% 90% 95% 99%
0.0000 0.8330 47.4660 162.1232 362.3022 546.6986 1038.2000
Would you put trust in this kind of naive assignment? How do you typically approach this task?
There is a tool, InTAD, on Bioconductor for enhancer/gene assignment, but from the paper I understand that n=3 per group (so 6 in total) does not give enough power for its correlation-based approach, so I tried the approach above first.
I am aware that these kinds of assignments, without additional data from C-technologies (Hi-C, 4C/5C-seq etc.), have quite a high rate of false assignments, but this is what we have so far. I would be especially interested in your experience with these kinds of assignments.
Suggestions appreciated.
I would also consider splitting the genes by direction of regulation rather than looking only at absolute changes: check whether the up-regulated genes get more ATAC-seq calls (and vice versa for the down-regulated genes). In principle the approach is good IMO, as many regulatory regions are within 50 kb of the promoter. Of course you will miss (potentially meaningful) interactions, but without the C-technologies you mention it might be difficult to call those. You could also consider using ChIP-seq to show an enrichment of, for example, transcription factor occupancy; this makes it quite likely that the regions in the vicinity of your genes contribute to their transcriptional regulation.
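One possible way to check the direction of regulation is to tabulate the sign of the peak's log fold change against the sign of the assigned gene's log fold change and test whether concordant pairs are over-represented. The sketch below uses a one-sided Fisher's exact test written with the standard library only; the log fold changes are invented toy values and the function names are hypothetical. Note that peaks assigned to the same gene are not independent, so the p-value should be read as a rough indication at best.

```python
from math import comb

def concordance_table(pairs):
    """Count (peak up, gene up), (peak up, gene down),
    (peak down, gene up), (peak down, gene down)."""
    a = b = c = d = 0
    for peak_lfc, gene_lfc in pairs:
        # Differential features should not have logFC == 0; here 0 counts as "down".
        peak_up, gene_up = peak_lfc > 0, gene_lfc > 0
        if peak_up and gene_up:
            a += 1
        elif peak_up:
            b += 1
        elif gene_up:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null, i.e. the probability of seeing
    at least this many 'peak up / gene up' pairs with fixed margins."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)

# Hypothetical (peak logFC, gene logFC) pairs from a peak-to-gene assignment.
pairs = [
    (1.8, 2.1), (2.4, 1.6), (1.2, 3.0), (2.0, 1.9),          # concordant up
    (1.5, -2.2),                                              # discordant
    (-1.7, 1.4),                                              # discordant
    (-2.3, -1.8), (-1.9, -2.5), (-1.3, -1.6), (-2.8, -2.0),   # concordant down
]

a, b, c, d = concordance_table(pairs)
p_value = fisher_one_sided(a, b, c, d)
print((a, b, c, d), round(p_value, 4))
```

If SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` should give the same one-sided p-value and also reports the odds ratio.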