Question

Histogram showing overlap between 2 bed files

1

Entering edit mode

7.0 years ago

newscient ▴ 20

I'm trying to plot the counts of the overlap of the regions of 1 bed file on the regions of another bed file (the second file has same width regions in different parts of the genome) in a histogram like plot in base pair resolution. But what ever I have tried (using bedtools/bedops - binning,counting and windows- e.g. this suggestion for motifs Which tool can plot the enrichment of specified motif across ChIP-seq bed files? ) doesn't seem to give me something informative. Is there an easy work around the problem? Thanks in advance

UPDATE: still not what I am looking for given in the answers!

Trying to explain it again: 2 bed files trying to get the counts of overlaps of one bed file to the other's file regions. Then plot the count frequency of overlaps (not sure if i should call that density) in a base-pair resolution on y-axis and bases (e.g. 0-300) depicting the regions from the second bed file on x-axis.

histogram overlap • 4.5k views

ADD COMMENT • link 7.0 years ago by newscient ▴ 20

1

Entering edit mode

Why not just obtain the per-base read-depth using BEDTools and then plot the profile in R Programming Language. Create a 'consensus' BED file first and, then, where the regions don't overlap, a value of 0 will be assigned.

In this way, you can also summarise each region by mean, as I've done here:

ADD REPLY • link 7.0 years ago by Kevin Blighe 88k

0

Entering edit mode

Could you provide the R code for getting such a mean depth of coverage graph out of the bed file generated by bedtools -coverage?

ADD REPLY • link 7.0 years ago by newscient ▴ 20

2

Entering edit mode

Sure thing. I would first merge your 2 BED files into one using BEDTools merge, then get the per-base read-depth in each sample over this 'consensus' BED and output to bedgraph format. Then run something like:

dfPerbaseDepth1 <- read.table("Sam1_PerBaseDepthBED.bedgraph", stringsAsFactors=FALSE, dec=".")
dfPerbaseDepth2 <- read.table("Sam1_PerBaseDepthBED.bedgraph", stringsAsFactors=FALSE, dec=".")
dfPerbaseDepth2[5] <- dfPerbaseDepth2[,5]/2

pdf("test.pdf", width=7, height=3)
    par(mar=c(2,2,2,2), cex=1.0)
    plot(dfPerbaseDepth1[,5], col="forestgreen", main="", type="l", xlab="", ylab="Per base read depth")
    lines(dfPerbaseDepth2[,5], col="royalblue")    
dev.off()

NB - note that here I'm dividing the read-depth in my second file by 2 just for visualisation (it's the same sample that i'm plotting).
NB - you'll have to sort out the region labeling yourself - apologies
NB - you can summarise each BED region by mean with the BEDTools coverage tool to which I've linked above (just add -mean as a command-line parameter)
NB - to change the plot type, explore the type parameter to plot()

ADD REPLY • link 7.0 years ago by Kevin Blighe 88k

score 0 · Answer 1 · 2017-11-11

If you're working with hg38, you could do something like the following to get counts of reads from a BAM file in 50kb bins (requires UCSC Kent tools and BEDOPS):

$ fetchChromSizes hg38 | awk -v OFS="\t" '($0!~/.*_.*/){print $1, "0", $2; }' | sort-bed - > hg38.bed
$ bedmap --echo --count --delim '\t' <(bedops --chop 50000 hg38.bed) <(bam2bed < reads.bam) > hg38.50kbReadDensity.bed

You could adjust the width of this window by changing 50000 to some other value.

If you have two BED files, one of which that contains signal in the score column, you could instead do this:

$ bedmap --echo --count --delim '\t' <(bedops --chop 50000 hg38.bed) reads.bed > hg38.50kbReadDensity.bed

Either way, you could then split this file by chromosome:

$ for chr in `bedextract --list-chr hg38.50kbReadDensity.bed`; do echo ${chr}; bedextract ${chr} hg38.50kbReadDensity.bed > hg38.50kbReadDensity.${chr}.bed; done

Then plot the signal in each per-chromosome file to a per-chromosome PDF:

$ for chr in `bedextract --list-chr hg38.50kbReadDensity.bed`; do echo ${chr}; ./plotDensity.Rscript hg38.50kbReadDensity.${chr}.bed hg38.50kbReadDensity.${chr}.pdf "50kb Read Density - ${chr}"; done

The plotDensity.Rscript script could look like this:

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
inFn = args[1]
outFn = args[2]
title = args[3]
d <- read.table(inFn, col.names=c("chr", "start", "stop", "density"), sep="\t", colClasses=c("character", "numeric", "numeric", "numeric"))
xRange <- range(0, d$stop) 
yRange <- range(0, d$density)
pdf(outFn)
plot(xRange, yRange, type="n", xlab="Position", ylab="Density", main=title)
lines(d$start, d$density)
dev.off()