Annotating narrow.peak to promoter region using GenomicFeatures
1
0
Entering edit mode
4.6 years ago
solaris • 0

Hi, I'm trying to find out in which promoters a certain histone modification is located. I retrieved a .narrowPeak from ChIP Hub (Experiment SRX352374). When I use the GenomicFeatures packages to retrieve the promoter sequence using the "promoters( )" function, I used either 500 bp or 1000 bp upstream of TSS. On the first look, it seems to work just fine. But when comparing these two approaches, I found some genes (36) only in the list retrieved by the 500 bp selection, but not in the 1000 bp selection. When looking up these genes in IGV with the corresponding Peak track, I also found some of the peaks located downstream of the gene, not upstream as chosen.

enter image description here

According to the vignette, the promoters( ) function includes strandness. Does anyone have a clue how to tackle these problems? Or another procedure to extract gene list with the which have peaks in their promoters? Thank you.

SRX352374_1000<-import("H2A.W, SRX352374.idr.narrowPeak", ...)
txdb_t8_prom1000<-promoters(txdb_t8, upstream=1000, downstream = 10)
hits_SRX352374_1000<-findOverlaps(txdb_t8_prom, SRX352374_1000)

ranges(SRX352374_1000)[subjectHits(hits_SRX352374_1000)] <- ranges(txdb_t8_prom)[queryHits(hits_SRX352374_1000)]
 SRX352374_1000ann<-transcriptsByOverlaps(txdb_t8, SRX352374_1000)
 SRX352374_1000ann<-elementMetadata(SRX352374_1000ann)

#same procedure for 500 bp Promoter
SRX352374_500<-import("H2A.W, SRX352374.idr.narrowPeak",..)
txdb_t8_prom500<-promoters(txdb_t8, upstream=500, downstream = 10)

hits_SRX352374_500<-findOverlaps(txdb_t8_prom500, SRX352374_500)
ranges(SRX352374)[subjectHits(hits_SRX352374_500)] <- ranges(txdb_t8_prom_500)[queryHits(hits_SRX352374_500)]
SRX352374_500ann<-transcriptsByOverlaps(txdb_t8, SRX352374_500)
SRX352374_500ann<-elementMetadata(SRX352374_500ann)

SRX352374_overlap_wrong<-anti_join(SRX352374_500ann, SRX352374_1000ann)
#36 genes only found in 500 bp selection
R version 3.6.3 GenomicFeatures_1.36.4  GenomicRanges_1.36.1 IRanges_2.18.3
ChIP-Seq • 1.2k views
ADD COMMENT
0
Entering edit mode

I found an easier and fully working solution instead of using findOverlaps( ) :

SRX352376_500<-subsetByOverlaps(txdb_t8_prom500, SRX352374_500)

ADD REPLY
1
Entering edit mode
4.6 years ago

For time being, you can try closestBed if you have TSS coordinates. This way you have more control.

cat tss.bed

chr1    123 124 geneA   .   +
chr1    200 201 geneB   .   -

cat peaks.bed

chr1    523 823 peakA
chr1    800 1100    peakB

.

closestBed -D b -a peaks.bed -b tss.bed

chr1    523 823 peakA   chr1    200 201 geneB   .   -   -323
chr1    800 1100    peakB   chr1    200 201 geneB   .   -   -600

select by distance and direction:

closestBed -D b -a peaks.bed -b tss.bed  | awk -v OFS="\t" '{ if (sqrt($NF^2) <500) print $4,$8}'

peakA   geneB

sqrt is to get absolute distance irrespective of direction.

ADD COMMENT

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6