Question

Quickly Plotting Distribution Of Points From Bed File?

3

Entering edit mode

11.2 years ago

14134125465346445 ★ 3.6k

I've got about 10e5-10e6 small regions in a bed file and I want to quickly inspect their genome distribution. I expect them to be evenly distributed in all chromosomes, so just plotting the results in something like a Manhattan plot would be useful. Is there anything like that I can use?

Something like:

plot_genomic_distribution --input myfile.bed --output myfile.png

Even something that just plots on a terminal would be good, like a stem and leaves plot vertically over the chromosomal arms.

bed • 9.5k views

ADD COMMENT • link updated 11.2 years ago by Irsan ★ 7.8k • written 11.2 years ago by 14134125465346445 ★ 3.6k

score 5 · Answer 1 · 2013-09-22

5

Entering edit mode

11.2 years ago

Irsan ★ 7.8k

No need for binning the bed-file to plot a distribution if you use famous ggplot2 R library. Have a look at the ggplot2-solution at Plotting Density Of Snps On Chromosomes for plotting a distribution of features (in the example SNPs) across all chromosomes

enter image description here

ADD COMMENT • link 11.2 years ago by Irsan ★ 7.8k

Ram · Answer 2 · 2013-09-21

3

Entering edit mode

11.2 years ago

Alex Reynolds 35k

One option is to map your regions to disjoint windows over a chromosome, counting the number of regions within each window.

To this end, you can use BEDOPS bedmap and its --count operator.

First, decide on chromosome bounds. For instance, for the hg19 build of the human genome, you could use the extents available here in hg19.extents.bed:

Second, make some 10 kb windows with awk and store them in a file called hg19.disjointWindows.bed (or adjust window size, depending on the granularity you want):

$ awk ' \
    BEGIN {\
         windowSize = 10000; \
    } \
    { \
         extentChromosome = $1; \
         extentStart = $2; \
         extentStop = $3; \
         for (windowStart = extentStart; windowStart < extentStop; windowStart += windowSize) { \
             windowStop = windowStart + windowSize; \
             print extentChromosome"\t"windowStart"\t"windowStop; \
         } \
    }' hg19.extents.bed \
    > hg19.disjointWindows.bed

This is guaranteed to be sorted, so we don't need to use any other tools (e.g., sort-bed) to sort this data. So we can use these windows with BEDOPS tools directly.

Third, we prepare your regions. Let's say your regions are in a BED file called unsortedRegions.bed, but we don't know its sort order. So we first sort this data with BEDOPS sort-bed:

$ sort-bed unsortedRegions.bed > regions.bed

Fourth, we count:

$ bedmap --echo-map --count regions.bed hg19.disjointWindows.bed > answer.bed

The file answer.bed will contain data in the following format:

[ window-1 ] | [ number-of-regions-overlapping-window-1 ]
[ window-2 ] | [ number-of-regions-overlapping-window-2 ]
. . .
[ window-n ] | [ number-of-regions-overlapping-window-n ]

Where each [ window-i ] is the i-th element in hg19.disjointWindows.bed, and [ number-of-regions-... ] is the count of overlapping regions over that window.

If you want to split this file further by chromosome, use BEDOPS bedextract. In a bash shell, this is a very fast one-liner:

$ for chr in `bedextract --list-chr answer.bed`; do bedextract $chr answer.bed > $chr.answer.bed; done

You can then bring answer.bed or its separated chromosome files into R or gnuplot, plotting a log-scaled histogram or visualizing it in any number of other ways.

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 11.2 years ago by Alex Reynolds 35k

0

Entering edit mode

Hi Alex,

Thanks for posting this, wondering about the way you generated the extents file, I need to do essentially the same thing for mm10.

Rob.

ADD REPLY • link 7.5 years ago by rbronste ▴ 420

1

Entering edit mode

Could use Kent utilities:

$ fetchChromSizes mm10 | awk '{ print $1"0\t"$2 }' > mm10.extents.bed

ADD REPLY • link 7.5 years ago by Alex Reynolds 35k

0

Entering edit mode

Yes that works well, weirdly though I get an empty mm10.disjointWindows.bed file after running the awk script.

ADD REPLY • link 7.5 years ago by rbronste ▴ 420

0

Entering edit mode

Yeah, not sure why that is. Maybe do:

$ fetchChromSizes mm10 | head

and see what you get.

ADD REPLY • link 7.5 years ago by Alex Reynolds 35k

0

Entering edit mode

Ah, sorry, I misread. I'm not sure why you get an empty result. I'd break things down and use head to see what comes out at various stages.

ADD REPLY • link 7.5 years ago by Alex Reynolds 35k

1

Entering edit mode

Another option is to use bedops --chop to make the 10k windows, instead of awk:

$ bedops --chop 10000 mm10.extents.bed > mm10.disjointWindows.bed

This option was added after I wrote the initial answer.

ADD REPLY • link 7.5 years ago by Alex Reynolds 35k

score 1 · Answer 3 · 2013-09-21

1

Entering edit mode

11.2 years ago

Devon Ryan 104k

You could load the bed file into a GRanges object and then just use plotGrandLinear from ggbio to get a manhattan plot.

ADD COMMENT • link 11.2 years ago by Devon Ryan 104k