Generate BED coordinates other than those in a given input BED file
8.5 years ago
Chirag Parsania ★ 2.0k

I wonder if there is any tool or script that can generate random BED coordinates that do not overlap the coordinates in a given input BED file.

~Chirag

bed bedtools • 2.0k views
8.5 years ago
michael.ante ★ 3.9k

Hi Chirag,

To get a non-overlapping set, you can use bedtools subtractBed together with the corresponding chromosome sizes. You'll get a BED file that covers each chromosome minus the input regions. From this, you can use R, for instance, to sample smaller chunks.

Cheers,

Michael
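
A minimal sketch of the two steps described above (complement, then sample), written with sort and awk only so that it runs without bedtools or R. The awk complement stands in for the subtractBed step and the awk sampler stands in for the R step; it is not the code Michael used. The file names, the toy coordinates, the 500 bp window size, the sample size of 5 and the seed are all assumptions made for illustration.

```shell
# Toy inputs (hypothetical names and coordinates, tab-separated).
printf 'chr1\t10000\nchr2\t8000\n' > toy.genome
printf 'chr1\t0\t1000\nchr1\t4000\t6000\nchr2\t7000\t8000\n' > toy_input.bed

# Step 1: complement = each chromosome minus the input intervals.
# awk walks the sorted input and prints the gaps between intervals.
sort -k1,1 -k2,2n toy_input.bed |
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { size[$1] = $2 + 0; order[++n] = $1; next }  # first file: chromosome sizes
     {
       if (!($1 in pos)) pos[$1] = 0             # pos = end of the region covered so far
       if ($2 > pos[$1]) print $1, pos[$1], $2   # gap before this interval
       if ($3 > pos[$1]) pos[$1] = $3 + 0
     }
     END {
       for (i = 1; i <= n; i++) {
         c = order[i]
         p = (c in pos) ? pos[c] : 0
         if (p < size[c]) print c, p, size[c]    # trailing gap, or the whole chromosome
       }
     }' toy.genome - | sort -k1,1 -k2,2n > toy_complement.bed

# Step 2: tile the complement into fixed-size windows and draw some at random.
# Distinct windows cannot overlap each other or the input.
awk -v w=500 -v n=5 -v seed=42 'BEGIN { FS = OFS = "\t"; srand(seed) }
     { for (s = $2 + 0; s + w <= $3; s += w) win[++m] = $1 OFS s OFS (s + w) }
     END {
       # partial Fisher-Yates shuffle: the first n slots end up as a random sample
       for (i = 1; i <= n && i <= m; i++) {
         j = i + int(rand() * (m - i + 1))
         t = win[i]; win[i] = win[j]; win[j] = t
         print win[i]
       }
     }' toy_complement.bed | sort -k1,1 -k2,2n > toy_random.bed
```

Tiling before sampling is a design choice made here so the sampled chunks are guaranteed to be disjoint; sampling random start positions inside the complement would also work, but would need a separate overlap filter afterwards.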


Thanks Michael,

Worked very well.

Cheers, Chirag

8.5 years ago

Set your build of interest:

$ BUILD="hg19"
$ echo ${BUILD}
hg19

Set the number of elements you want to sample from ${BUILD}:

$ ELEMENTS=1234
$ echo ${ELEMENTS}
1234

Then sample with mysql, sort the BED data with sort-bed, map with bedmap to count the number of overlapping elements, use awk to filter for elements that only overlap themselves, use cut to strip the first column, and then write the results to a new BED file called random.bed:

$ mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A -D ${BUILD} -e 'SET @rank:=0; SELECT DISTINCT chrom as chromcol, @start:=ROUND(RAND()*(size-100)) as startcol, @start+ROUND(RAND()*100)+1 as stopcol, CONCAT("id-",@rank:=@rank+1) as idcol, ROUND(RAND()*1000) as scorecol, IF(RAND()<0.5,"+","-") as strandcol FROM chromInfo, kgXref LIMIT '"${ELEMENTS}" | sort-bed - | bedmap --count --echo --delim '\t' - | awk '$1==1' - | cut -f2- - > random.bed

This will generate a subset of at most ELEMENTS BED elements, each between 1 and 101 bases long (the length is ROUND(RAND()*100)+1), placed within the chromosomal boundaries of the genome build BUILD, and none of which overlaps another.

You can adjust that size parameter, depending on the region size distribution you need for randomly-sampled elements. You'll get a subset, because not all elements may be disjoint.

Once you have random.bed, you can apply set operations against regions of interest with bedops, etc.
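
To illustrate what the count-then-filter step (bedmap --count followed by awk '$1==1') is doing, here is an awk-only sketch that keeps the elements of a sorted BED file that overlap nothing but themselves. It is a stand-in that runs without BEDOPS installed, not a replacement for bedmap; the file names and toy coordinates are hypothetical. Coordinates are treated as half-open, so elements that merely touch are not counted as overlapping.

```shell
# Hypothetical candidate elements (tab-separated); id-1 and id-2 overlap each other.
printf 'chr1\t100\t150\tid-1\nchr1\t140\t200\tid-2\nchr1\t500\t560\tid-3\nchr2\t10\t60\tid-4\nchr2\t60\t90\tid-5\n' > toy_candidates.bed

# Sweep over the sorted elements, grouping them into clusters of mutually
# reachable overlaps. A cluster of size 1 is an element that overlaps only itself.
sort -k1,1 -k2,2n toy_candidates.bed |
awk 'BEGIN { FS = OFS = "\t" }
     function emit() { if (count == 1) print held }
     $1 != chrom || $2 >= maxend {   # new chromosome, or starts at/after the cluster end
       emit()
       chrom = $1; maxend = $3 + 0; held = $0; count = 1
       next
     }
     { count++; if ($3 > maxend) maxend = $3 + 0 }   # element joins the current cluster
     END { emit() }' > toy_disjoint.bed
```

With the toy data above, id-1 and id-2 are dropped and id-3, id-4 and id-5 are kept. To then remove sampled elements that overlap the original input regions, one option should be bedops --not-element-of 1 random.bed input.bed, where input.bed is a placeholder name for the sorted input file.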

8.5 years ago
jotan ★ 1.3k

The "random genome fragments" tool in RSA tools can do this. It won't exclude overlapping regions, though, so you'll have to filter those out afterwards.

8.5 years ago
A. Domingues ★ 2.7k

If I understood the question correctly, Bedtools shuffle should do exactly that.

cat A.bed
chr1  0  100  a1  1  +
chr1  0  1000 a2  2  -

cat my.genome
chr1  10000
chr2  8000
chr3  5000
chr4  2000

bedtools shuffle -i A.bed -g my.genome
chr4  1498  1598  a1  1  +
chr3  2156  3156  a2  2  -
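
By default the shuffled intervals may still land on the input regions. A hedged variant of the call above, assuming the goal is to avoid the input: -excl keeps the shuffled intervals out of the regions in the given file, -noOverlapping keeps them from overlapping each other, and -seed makes the run repeatable. The file names are hypothetical copies of the toy files above, and the guard only skips the call when bedtools is not on the PATH.

```shell
# Hypothetical toy files mirroring the example above (tab-separated).
printf 'chr1\t0\t100\ta1\t1\t+\nchr1\t0\t1000\ta2\t2\t-\n' > A_toy.bed
printf 'chr1\t10000\nchr2\t8000\nchr3\t5000\nchr4\t2000\n' > my_toy.genome

if command -v bedtools >/dev/null 2>&1; then
  # Place each interval at a random position that avoids A_toy.bed
  # and does not overlap the other shuffled intervals.
  bedtools shuffle -i A_toy.bed -g my_toy.genome \
    -excl A_toy.bed -noOverlapping -seed 1 > shuffled_toy.bed
else
  echo "bedtools not found; skipping shuffle" >&2
  : > shuffled_toy.bed   # leave an empty placeholder file
fi
```

The interval lengths are preserved, so this fits best when the random set should match the size distribution of the input.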