Question

Plot of intergenomic distances between all bound TF sites?

9.7 years ago

bede.portz ▴ 540

I would like to plot the distances between all pairs of peaks/bound locations for a specific transcription factor. In other words, I want to generate a histogram of inter-genomic distances between all bound locations. Essentially, this is a composite plot of the data in which the bound locations serve as both the reference points and the data being plotted.

My rationale is that, for a particular factor, bound locations appear to be clustered very often with other bound locations within a few kb. The plot I described may reveal whether there is a preferred range of distances between bound locations for this factor. That distribution could then be compared to the intergenomic distances of other related factors and TSSs, and to an estimated random distribution.

Is there a tool to do this?

ChIP-Seq peaks • 2.5k views

updated 2.9 years ago by Ram • written 9.7 years ago by bede.portz

It sounds like bedtools closest plus awk would work. Have you given that a try?

updated 5.7 years ago by Ram • written 9.7 years ago by Devon Ryan

Ram · Answer 1 · 2015-11-12

9.7 years ago

Alex Reynolds 36k

You could run BEDOPS closest-features with --dist --closest --no-overlaps --no-ref on a sorted BED file of regions-of-interest ("roi"), then feed the resulting list of signed distances into R and use hist() to generate a histogram. Because the distances are signed, take their absolute values with abs() before plotting.

At the command-line:

$ closest-features --dist --closest --no-overlaps --no-ref roi.bed roi.bed \
    | cut -d '|' -f2 - \
    > signed_distances.txt

In R:

> v.signed <- scan("signed_distances.txt")
> v.unsigned <- abs(v.signed)
> hist(v.unsigned)

You could repeat this procedure on any set of regions-of-interest, such as those from other factors, or similarly-sized intervals sampled from a genomic background that makes sense for your experiment (e.g., the entire genome minus repeatmasked regions, etc.).

If you want to compare distributions of distances and assign statistical significance to the comparison, you might use a K-S test (ks.test()) on the unbinned distances, or a chi-squared test (chisq.test()) on counts of distances binned into intervals.
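
Since chisq.test() works on counts rather than raw values, the distances would need to be binned first. Here is a rough sketch of one way to do that at the command-line with awk. The file names, the toy distances, and the 500-bp bin width are all made up for illustration; lines that are not plain non-negative integers (e.g., an NA where no neighbor was found) are skipped.

```shell
# Hypothetical unsigned distances (bp), standing in for the abs() values.
printf '50\n50\n700\n440\n440\n1200\nNA\n' > toy_unsigned.txt

# Count distances per fixed-width bin; the bin width is an arbitrary choice.
awk -v width=500 '
$1 ~ /^[0-9]+$/ {
    bin = int($1 / width) * width      # lower edge of the bin
    count[bin]++
    if (bin > max) max = bin
}
END {
    # Print every bin from 0 up to the largest one seen, including empty bins.
    for (b = 0; b <= max; b += width) printf "%d\t%d\n", b, count[b] + 0
}' toy_unsigned.txt > toy_binned.txt
```

Running the same binning on each set of distances gives count columns over identical bins, which you could then read into R and compare.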

updated 5.7 years ago by Ram • written 9.7 years ago by Alex Reynolds

Alex, thanks for the response. From a cursory look at the documentation, it appears that closest-features wants two input files. Can I run it with just one input file, i.e., the bound intervals for a given factor?

Thanks

updated 5.7 years ago by Ram • written 9.7 years ago by bede.portz

Take a look at the example. You still use two inputs, but specify the same filename for both. For each element in your lone input file, the application then reports the distance to its nearest non-overlapping neighbor within that same file. Just make sure your BED-formatted input file is sorted with BEDOPS sort-bed before running closest-features.
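
To make concrete what comparing a file against itself gives you, here is a rough awk-only sketch of the same idea on toy data. This is not BEDOPS itself: it assumes the peaks do not overlap one another, the coordinates and file names are hypothetical, and it measures the gap as next start minus previous end, which may differ by one from what closest-features or bedtools report.

```shell
# Toy input: three peaks on chr1 and two on chr2 (hypothetical coordinates).
printf 'chr1\t250\t300\nchr1\t100\t200\nchr1\t1000\t1100\nchr2\t50\t60\nchr2\t500\t600\n' > toy_roi.bed

# Sort by chromosome, then by start (sort-bed is the BEDOPS way to do this).
sort -k1,1 -k2,2n toy_roi.bed > toy_roi.sorted.bed

# For each peak, print the smaller of the gaps to its left and right
# neighbors on the same chromosome. Assumes non-overlapping peaks.
awk 'BEGIN { FS = "\t" }
{ c[NR] = $1; s[NR] = $2; e[NR] = $3 }
END {
    for (i = 1; i <= NR; i++) {
        best = -1                                   # -1 means "no neighbor yet"
        if (i > 1 && c[i-1] == c[i]) best = s[i] - e[i-1]
        if (i < NR && c[i+1] == c[i]) {
            d = s[i+1] - e[i]
            if (best < 0 || d < best) best = d
        }
        if (best >= 0) print best                   # skip lone peaks on a chromosome
    }
}' toy_roi.sorted.bed > toy_distances.txt
```

Note that this gives one nearest-neighbor distance per peak, not the distance between every possible pair of peaks.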

updated 5.7 years ago by Ram • written 9.7 years ago by Alex Reynolds