Generate Random Genomic Regions
11.3 years ago
ChIP ▴ 600

Hi!

I have a peak file (genomic regions in BED format) containing 1000 regions from hg18.

What I would like is to generate a random set of 1000 peaks from hg18 of nearly the same size and type (with respect to their position in the genome: promoter/exon/intron/intergenic).

I am almost sure everyone working in motif discovery has encountered this problem.

Kindly help

Thank you

chip-seq • 9.2k views

True, but this question has a novel slant; ChIP requires that the distances between genomic regions and nearby genes be maintained.

11.3 years ago
KCC ★ 4.1k

There is a tool in the bedtools package called shuffleBed.

It generates random regions with the same size distribution as your original list of peaks.

You can require that each randomly generated feature stays on the same chromosome as its original (the -chrom option), and you can specify regions that should not receive any peaks, such as intergenic regions (the -excl option).
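To make the idea concrete, here is a minimal pure-Python sketch of this kind of shuffling. It is not bedtools code: the function name `shuffle_regions` and its parameters are hypothetical, and the rejection-sampling strategy is an assumption made for illustration. Each peak keeps its length, can be held on its original chromosome, and is rejected if it overlaps an excluded interval.

```python
import random


def shuffle_regions(peaks, chrom_sizes, excluded=(), same_chrom=True,
                    seed=0, max_tries=1000):
    """Relocate each (chrom, start, end) peak at random, keeping its length.

    Hypothetical helper for illustration only; coordinates are half-open
    (BED style). `chrom_sizes` maps chromosome name -> length.
    """
    rng = random.Random(seed)
    chroms = sorted(chrom_sizes)
    shuffled = []
    for chrom, start, end in peaks:
        length = end - start
        for _ in range(max_tries):
            new_chrom = chrom if same_chrom else rng.choice(chroms)
            if chrom_sizes[new_chrom] < length:
                continue  # this chromosome cannot hold the region
            new_start = rng.randint(0, chrom_sizes[new_chrom] - length)
            new_end = new_start + length
            # Reject placements that overlap any excluded interval.
            overlaps = any(
                c == new_chrom and s < new_end and new_start < e
                for c, s, e in excluded
            )
            if not overlaps:
                shuffled.append((new_chrom, new_start, new_end))
                break
        else:
            raise RuntimeError(f"could not place a region of length {length}")
    return shuffled
```

Because positions are drawn uniformly over everything that is not excluded, the output follows the composition of the allowed genome, not the promoter/exon/intron mix of the input peaks.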


@George: how about a small example of the command? The way I am using shuffleBed is: shuffleBed -i Peakfile -g genome_table.human.hg18.txt > test. What more should I add? I get the same number of peaks, but the genomic distribution is entirely different. For instance, Peakfile has 700 peaks in promoters, while the generated test file has only 200 peaks in promoters. Thank you


The randomly generated peaks are going to match the distribution of your non-excluded regions. If you want to match the number in promoters etc., I think you would need to break your peaks into one set per category (promoter, exon, intron, etc.) and create a specific exclusion file for each category. I don't know if there is a less ugly way to do this, but this is what comes to mind.
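A rough Python sketch of this per-category idea follows. It takes the mirror-image route of listing the regions that are allowed for each category instead of building exclusion files, which should be equivalent in effect. The function `sample_matched`, the fourth "category" field on each peak, and the `category_regions` dictionary are all assumptions for illustration; you would have to annotate the peaks and build the region lists yourself.

```python
import random


def sample_matched(peaks, category_regions, seed=0):
    """Draw one random region per peak from regions of the same category.

    peaks: list of (chrom, start, end, category) tuples.
    category_regions: dict mapping category -> list of (chrom, start, end).
    Hypothetical helper; coordinates are half-open (BED style).
    """
    rng = random.Random(seed)
    background = []
    for chrom, start, end, category in peaks:
        length = end - start
        # Only annotated regions long enough to hold the whole peak qualify.
        candidates = [r for r in category_regions[category]
                      if r[2] - r[1] >= length]
        if not candidates:
            raise ValueError(
                f"no {category} region can hold a peak of length {length}")
        # Weight each region by its number of valid start positions so that
        # placement is uniform over all valid positions in the category.
        weights = [r[2] - r[1] - length + 1 for r in candidates]
        r_chrom, r_start, r_end = rng.choices(candidates, weights=weights)[0]
        new_start = rng.randint(r_start, r_end - length)
        background.append((r_chrom, new_start, new_start + length, category))
    return background
```

By construction the output has exactly as many promoter, exon, intron and intergenic regions as the input. One limitation of this sketch: a peak wider than every annotated region of its category (a short exon, say) cannot be placed and raises an error.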

11.3 years ago
Ian 6.1k

You might be interested in a script (generate_background_sequences.py) that comes with GimmeMotifs, which can extract random sequences that maintain the same distance to a random gene (a "matched genomic background"). If I remember correctly, you would need to compile GimmeMotifs first to get access to the required Python libraries.
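The matched-background idea can be sketched in a few lines of Python. This is not the GimmeMotifs implementation: the function `matched_background`, the `(chrom, tss, strand)` gene tuples and the use of the peak midpoint are assumptions made for illustration. Each peak's strand-aware offset from its nearest transcription start site (TSS) is measured and then reapplied at a randomly chosen gene.

```python
import random


def matched_background(peaks, genes, seed=0):
    """Place each peak at the same TSS offset relative to a random gene.

    peaks: list of (chrom, start, end).
    genes: list of (chrom, tss, strand) with strand "+" or "-".
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    background = []
    for chrom, start, end in peaks:
        length = end - start
        mid = (start + end) // 2
        same_chrom = [g for g in genes if g[0] == chrom]
        if not same_chrom:
            raise ValueError(f"no genes annotated on {chrom}")
        # Nearest TSS on the peak's own chromosome.
        _, tss, strand = min(same_chrom, key=lambda g: abs(g[1] - mid))
        # Offset in the gene's own orientation: negative means upstream.
        offset = (mid - tss) if strand == "+" else (tss - mid)
        # Reapply that offset at a randomly chosen gene.
        r_chrom, r_tss, r_strand = rng.choice(genes)
        new_mid = r_tss + offset if r_strand == "+" else r_tss - offset
        # Clamped at the chromosome start; a fuller version would also
        # check the chromosome end and overlap with the original peaks.
        new_start = max(0, new_mid - length // 2)
        background.append((r_chrom, new_start, new_start + length))
    return background
```

Since promoter peaks sit close to a TSS and intergenic peaks far from one, preserving the offset roughly preserves the promoter/intergenic balance without needing an explicit annotation of each peak.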

5.8 years ago
Andrewoods ▴ 110

You can try the seqbias R package. There is a function random.intervals.
link: https://bioconductor.org/packages/release/bioc/html/seqbias.html
