I wonder if there is any tool or script that can generate random, non-overlapping BED coordinates relative to a given set of input BED coordinates.
~Chirag
Hi Chirag,
In order to have a non-overlapping set, you can use bedtools subtractBed and the corresponding chromosome sizes. You'll get a bed file which consists of the chromosome minus the input. From this, you can use for instance R to sample smaller chunks.
Cheers,
Michael
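A rough, bedtools-free sketch of the same idea, in case it helps: awk stands in here for both subtractBed and the R sampling step, so this is an illustration rather than the exact commands Michael describes. The file names, the toy coordinates, and the helper functions complement and sample_gaps are all made up for the example. It computes the parts of each chromosome not covered by the input, then draws one fixed-length interval at random from each gap.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy inputs (tab-separated), made up for illustration.
workdir=$(mktemp -d)
printf 'chr1\t10000\nchr2\t8000\n' > "$workdir/my.genome"
printf 'chr1\t0\t100\nchr1\t500\t1000\nchr2\t7000\t8000\n' > "$workdir/A.bed"

# Print the parts of each chromosome that the BED file does not cover.
# Usage: complement GENOME BED   (BED must be sorted by chromosome, then start)
complement() {
  awk 'BEGIN { OFS = "\t" }
       NR == FNR { size[$1] = $2 + 0; pos[$1] = 0; order[++n] = $1; next }
       ($2 + 0) > pos[$1] { print $1, pos[$1], $2 }   # gap before this element
       ($3 + 0) > pos[$1] { pos[$1] = $3 + 0 }        # advance past this element
       END {
         for (i = 1; i <= n; i++) {                   # trailing gap per chromosome
           c = order[i]
           if (pos[c] < size[c]) print c, pos[c], size[c]
         }
       }' "$1" "$2"
}

# Draw one random interval of LEN bases from every gap that is long enough.
# Usage: ... | sample_gaps LEN SEED
sample_gaps() {
  awk -v len="$1" -v seed="$2" '
    BEGIN { OFS = "\t"; srand(seed) }
    ($3 - $2) >= len {
      start = $2 + int(rand() * ($3 - $2 - len + 1))
      print $1, start, start + len
    }'
}

complement "$workdir/my.genome" "$workdir/A.bed" \
  | sample_gaps 50 42 \
  | sort -k1,1 -k2,2n > "$workdir/random.bed"
cat "$workdir/random.bed"
```

Because each sampled interval comes from a different gap, the results cannot overlap each other or the input. The trade-off is that you get at most one interval per gap; drawing several per gap would need an extra overlap check.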
Set your build of interest:
$ BUILD="hg19"
$ echo ${BUILD}
hg19
Set the number of elements you want to sample from ${BUILD}:
$ ELEMENTS=1234
$ echo ${ELEMENTS}
1234
Then sample with mysql, sort the BED data with sort-bed, map with bedmap to count the number of overlapping elements, use awk to filter for elements that only overlap themselves, use cut to strip the first column, and then write the results to a new BED file called random.bed:
$ mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A -D ${BUILD} -e 'SET @rank:=0; SELECT DISTINCT chrom as chromcol, @start:=ROUND(RAND()*(size-100)) as startcol, @start+ROUND(RAND()*100)+1 as stopcol, CONCAT("id-",@rank:=@rank+1) as idcol, ROUND(RAND()*1000) as scorecol, IF(RAND()<0.5,"+","-") as strandcol FROM chromInfo, kgXref LIMIT '"${ELEMENTS}" | sort-bed - | bedmap --count --echo --delim '\t' - | awk '$1==1' - | cut -f2- - > random.bed
This will generate up to ELEMENTS BED elements, each roughly 1 to 100 bases long, within the chromosomal boundaries of the genome build BUILD, none of which overlap one another.
You can adjust that size parameter, depending on the region size distribution you need for randomly-sampled elements. You'll get a subset, because not all elements may be disjoint.
Once you have random.bed, you can apply set operations against regions of interest with bedops, etc.
The random genome fragments tool in RSA tools can do this. It won't exclude overlapping regions, though, so you'll have to filter them out afterwards.
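For the filtering step, something along these lines might do, assuming the fragments are already in BED format. The file names, toy coordinates, and the drop_overlaps helper are hypothetical. It sorts the file and keeps an interval only if it starts at or after the end of the last interval kept on the same chromosome.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy input (tab-separated), deliberately unsorted and overlapping.
workdir=$(mktemp -d)
printf 'chr1\t50\t150\nchr1\t0\t100\nchr2\t10\t20\nchr1\t100\t200\n' > "$workdir/fragments.bed"

# Keep the first interval of every overlapping run, drop the rest.
# Usage: drop_overlaps BED
drop_overlaps() {
  sort -k1,1 -k2,2n "$1" |
    awk '$1 != chrom || ($2 + 0) >= last_end { print; chrom = $1; last_end = $3 + 0 }'
}

drop_overlaps "$workdir/fragments.bed" > "$workdir/disjoint.bed"
cat "$workdir/disjoint.bed"
```

Note that this keeps one interval from each overlapping group, whereas a bedmap --count filter that keeps only elements with a count of 1 discards every member of the group. BED intervals are half-open, so an interval starting exactly where the previous one ends is treated as non-overlapping.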
If I understood the question correctly, Bedtools shuffle should do exactly that.
$ cat A.bed
chr1 0 100 a1 1 +
chr1 0 1000 a2 2 -
$ cat my.genome
chr1 10000
chr2 8000
chr3 5000
chr4 2000
$ bedtools shuffle -i A.bed -g my.genome
chr4 1498 1598 a1 1 +
chr3 2156 3156 a2 2 -
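One caveat worth checking: as far as I know, shuffle on its own does not stop the new intervals from landing on the input regions or on each other. The -excl and -noOverlapping options cover those two cases, and -seed makes the run repeatable. A small sketch with the same toy files (the temporary directory, the seed value, and the guard that skips the run when bedtools is not installed are my additions):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the toy files from the example above (tab-separated).
workdir=$(mktemp -d)
printf 'chr1\t0\t100\ta1\t1\t+\nchr1\t0\t1000\ta2\t2\t-\n' > "$workdir/A.bed"
printf 'chr1\t10000\nchr2\t8000\nchr3\t5000\nchr4\t2000\n' > "$workdir/my.genome"

if command -v bedtools >/dev/null 2>&1; then
  # -excl: do not place shuffled intervals on the original regions
  # -noOverlapping: do not let shuffled intervals overlap each other
  bedtools shuffle -i "$workdir/A.bed" -g "$workdir/my.genome" \
    -excl "$workdir/A.bed" -noOverlapping -seed 42 > "$workdir/shuffled.bed"
  cat "$workdir/shuffled.bed"
else
  echo "bedtools not found; skipping the shuffle" >&2
fi
```

The shuffled intervals keep the lengths of the input intervals, so this suits cases where the random set should mirror the size distribution of the original.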
Thanks Michael,
Worked very well.
Cheers, Chirag