Identify Target genes of an enhancer
1
1
Entering edit mode
8.4 years ago

I have the peak file of Active Enhancer mark (H3K27ac). I have decided the 100kb region up and down stream of TSS to link genes to the Enhancer. My peak file is as below:

chr8    145701383   145702686   Rank_1  88  .   5.97704 12.09649    8.84716
chr20   33850820    33852880    Rank_2  78  .   4.68799 10.87448    7.89185

I want to get a list of genes which are within 100kb region of these peak coordinates. Can anybody guide me how I can do that. I was looking into ChIPseeker package (annotatPeak function) but I was not getting a clue how to do it!

For example if I say first peak the "Enhancer_1" then I want to link all the TSS that lies with in 100 kb region domain with this enhancer, like a column containing all target genes as: Enhancer_1: Gene1, Gene2, Gene3 ... Gene20

Enhancer Peaks • 3.1k views
ADD COMMENT
5
Entering edit mode
8.2 years ago

In R:

> library(Homo.sapiens) # from bioconductor
> library(dplyr) # optional, but for the %>% thing

# The import function from GRanges would be the way to import BED files,
# but in this case doesn't work correctly because there are extra columns
> # import("example.bed", "BED") 
> tf = read.delim("example.bed", header=F, stringsAsFactors=F)  %>% 
     dplyr::rename(chr=V1, start=V2, end=V3) %>% 
     makeGRangesFromDataFrame(., keep.extra.columns=T)

# Get the coordinates of all TSS in human genome. The TxDB object is part of Homo.sapiens.
# To get only one TSS per gene, use the genes() function instead of transcripts()
> human.tss = resize(transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, columns=c('gene_id', 'tx_id')), width=1, fix='start')

# Get TSS within 100000 bp of your tf list:
> mergeByOverlaps(human.tss, tf + 100000)  #see also subsetByOverlaps
DataFrame with 54 rows and 9 columns
                     human.tss         gene_id     tx_id
                     <GRanges> <CharacterList> <integer>
1   chr8:145617095-145617095:+          203054     32948
2   chr8:145660551-145660551:+                     32949
3   chr8:145691738-145691738:+           90990     32950
4   chr8:145697960-145697960:+                     32951
5   chr8:145715417-145715417:+           84988     32952
...                        ...             ...       ...
50   chr20:33872619-33872619:-            3692     71966
51   chr20:33880225-33880225:-          128876     71967
52   chr20:33880225-33880225:-          128876     71968
53   chr20:33894862-33894862:-           55245     71969
54   chr20:33893057-33893057:-                     71979
                  tf...1e.05          V4        V5          V6               V7
                   <GRanges> <character> <integer> <character>      <character>
1   chr8:145601383-145802686      Rank_1        88           . 5.97704 12.09649
2   chr8:145601383-145802686      Rank_1        88           . 5.97704 12.09649
3   chr8:145601383-145802686      Rank_1        88           . 5.97704 12.09649
4   chr8:145601383-145802686      Rank_1        88           . 5.97704 12.09649
5   chr8:145601383-145802686      Rank_1        88           . 5.97704 12.09649
...                      ...         ...       ...         ...              ...
50   chr20:33750820-33952880      Rank_2        78           . 4.68799 10.87448
51   chr20:33750820-33952880      Rank_2        78           . 4.68799 10.87448
52   chr20:33750820-33952880      Rank_2        78           . 4.68799 10.87448
53   chr20:33750820-33952880      Rank_2        78           . 4.68799 10.87448
54   chr20:33750820-33952880      Rank_2        78           . 4.68799 10.87448

EDIT: for some reasons I misread your question at first, so I wrote how to get the closest genes instead. I'll leave this year for legacy.

# Get the closest genes
> neighbor.genes = genes(TxDb.Hsapiens.UCSC.hg19.knownGene)[nearest(tf, genes(TxDb.Hsapiens.UCSC.hg19.knownGene))]

# Get the identifier of the original TF, so you can know which TF is closest to each gene:
> mcols(neighbor.genes)$tf = tf$V4  

> neighbor.genes
GRanges object with 2 ranges and 2 metadata columns:
        seqnames                 ranges strand |     gene_id          tf
           <Rle>              <IRanges>  <Rle> | <character> <character>
   8928     chr8 [145699115, 145701718]      - |        8928      Rank_1
  55741    chr20 [ 33703160,  33865960]      - |       55741      Rank_2
  -------
GRanges object with 2 ranges and 1 metadata column:
        seqnames                 ranges strand |     gene_id
           <Rle>              <IRanges>  <Rle> | <character>
   8928     chr8 [145699115, 145701718]      - |        8928
  55741    chr20 [ 33703160,  33865960]      - |       55741
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
ADD COMMENT

Login before adding your answer.

Traffic: 1779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6