Create Random Bed File
1
Hello everyone,
Is there a rapid way to create a random BED file taking in consideration the genome version using MySQL access to UCSC ? No criteria here except the length of the fragments that we can call L
Thanks
Rad
ucsc
bed
• 2.9k views
the table chromInfo contains the length of the chromosomes. In the following query I only use another table (kgXref) to amplify the number of rows.
$ mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
set @rank:=0;
select chrom,
@start:=ROUND(RAND()*(size-100)),
@start+ROUND(RAND()*100),
concat("ID",@rank:=@rank+1),
ROUND(RAND()*1000),
IF(RAND()<0.5,"+","-")
from
chromInfo, kgXref
limit 100'
+-----------------------+-----------+-----------+-------+-----+---+
| chr1 | 90205667 | 90205686 | ID1 | 853 | - |
| chr2 | 228928707 | 228928768 | ID2 | 212 | + |
| chr3 | 110648557 | 110648565 | ID3 | 707 | + |
| chr4 | 78452996 | 78453009 | ID4 | 432 | - |
| chr5 | 92801311 | 92801339 | ID5 | 859 | + |
| chr6 | 119732169 | 119732182 | ID6 | 566 | + |
| chr7 | 72222004 | 72222102 | ID7 | 530 | - |
| chrX | 153234775 | 153234854 | ID8 | 980 | - |
| chr8 | 108715435 | 108715445 | ID9 | 290 | + |
| chr9 | 117733050 | 117733125 | ID10 | 227 | - |
| chr10 | 108812567 | 108812599 | ID11 | 214 | + |
| chr11 | 113708758 | 113708850 | ID12 | 76 | - |
| chr12 | 115213268 | 115213313 | ID13 | 672 | + |
| chr13 | 1854783 | 1854789 | ID14 | 260 | + |
| chr14 | 86200594 | 86200660 | ID15 | 913 | - |
| chr15 | 12437564 | 12437653 | ID16 | 87 | - |
| chr16 | 51086633 | 51086686 | ID17 | 958 | + |
| chr17 | 8681776 | 8681871 | ID18 | 413 | + |
| chr18 | 68530992 | 68531064 | ID19 | 963 | - |
| chr20 | 25333986 | 25333989 | ID20 | 968 | - |
| chrY | 46155598 | 46155666 | ID21 | 57 | + |
| chr19 | 4811141 | 4811207 | ID22 | 35 | + |
| chr22 | 48293614 | 48293622 | ID23 | 566 | - |
| chr21 | 13476195 | 13476256 | ID24 | 227 | + |
| chr6_ssto_hap7 | 3905893 | 3905901 | ID25 | 15 | - |
| chr6_mcf_hap5 | 742068 | 742093 | ID26 | 782 | + |
| chr6_cox_hap2 | 2243145 | 2243230 | ID27 | 854 | - |
| chr6_mann_hap4 | 62703 | 62795 | ID28 | 570 | + |
| chr6_apd_hap1 | 3244946 | 3244972 | ID29 | 212 | + |
| chr6_qbl_hap6 | 3292785 | 3292861 | ID30 | 656 | - |
| chr6_dbb_hap3 | 136354 | 136369 | ID31 | 662 | - |
| chr17_ctg5_hap1 | 524967 | 525065 | ID32 | 976 | - |
| chr4_ctg9_hap1 | 436008 | 436097 | ID33 | 246 | - |
| chr1_gl000192_random | 19603 | 19654 | ID34 | 462 | - |
| chrUn_gl000225 | 96553 | 96651 | ID35 | 533 | - |
| chr4_gl000194_random | 231 | 316 | ID36 | 230 | - |
| chr4_gl000193_random | 68718 | 68816 | ID37 | 812 | + |
| chr9_gl000200_random | 29815 | 29859 | ID38 | 726 | + |
| chrUn_gl000222 | 67855 | 67944 | ID39 | 367 | + |
| chrUn_gl000212 | 131049 | 131052 | ID40 | 30 | + |
| chr7_gl000195_random | 46631 | 46638 | ID41 | 577 | - |
| chrUn_gl000223 | 119311 | 119338 | ID42 | 379 | + |
| chrUn_gl000224 | 43641 | 43740 | ID43 | 208 | + |
| chrUn_gl000219 | 136897 | 136956 | ID44 | 651 | + |
| chr17_gl000205_random | 86356 | 86357 | ID45 | 542 | - |
| chrUn_gl000215 | 145143 | 145156 | ID46 | 115 | + |
| chrUn_gl000216 | 104561 | 104608 | ID47 | 504 | + |
| chrUn_gl000217 | 17846 | 17861 | ID48 | 438 | - |
| chr9_gl000199_random | 66489 | 66562 | ID49 | 498 | + |
| chrUn_gl000211 | 154678 | 154757 | ID50 | 176 | - |
| chrUn_gl000213 | 162227 | 162270 | ID51 | 191 | - |
| chrUn_gl000220 | 119794 | 119866 | ID52 | 354 | - |
| chrUn_gl000218 | 7951 | 7989 | ID53 | 756 | - |
| chr19_gl000209_random | 147343 | 147414 | ID54 | 792 | - |
| chrUn_gl000221 | 111365 | 111378 | ID55 | 494 | + |
| chrUn_gl000214 | 129148 | 129192 | ID56 | 382 | - |
| chrUn_gl000228 | 103840 | 103865 | ID57 | 857 | - |
| chrUn_gl000227 | 5598 | 5663 | ID58 | 114 | - |
| chr1_gl000191_random | 82631 | 82632 | ID59 | 737 | - |
| chr19_gl000208_random | 801 | 812 | ID60 | 532 | + |
| chr9_gl000198_random | 1986 | 2000 | ID61 | 635 | - |
| chr17_gl000204_random | 70716 | 70725 | ID62 | 828 | - |
| chrUn_gl000233 | 40796 | 40879 | ID63 | 459 | - |
| chrUn_gl000237 | 32975 | 32989 | ID64 | 554 | + |
| chrUn_gl000230 | 1828 | 1847 | ID65 | 811 | + |
| chrUn_gl000242 | 969 | 1033 | ID66 | 130 | - |
| chrUn_gl000243 | 11318 | 11330 | ID67 | 807 | - |
| chrUn_gl000241 | 40739 | 40820 | ID68 | 148 | + |
| chrUn_gl000236 | 3592 | 3643 | ID69 | 303 | - |
| chrUn_gl000240 | 40872 | 40967 | ID70 | 840 | + |
| chr17_gl000206_random | 6964 | 7048 | ID71 | 673 | - |
| chrUn_gl000232 | 10439 | 10511 | ID72 | 833 | + |
| chrUn_gl000234 | 20523 | 20576 | ID73 | 149 | + |
| chr11_gl000202_random | 10214 | 10300 | ID74 | 514 | + |
| chrUn_gl000238 | 18495 | 18527 | ID75 | 197 | + |
| chrUn_gl000244 | 22151 | 22220 | ID76 | 798 | - |
| chrUn_gl000248 | 6317 | 6323 | ID77 | 841 | + |
| chr8_gl000196_random | 21150 | 21218 | ID78 | 785 | - |
| chrUn_gl000249 | 440 | 484 | ID79 | 152 | + |
| chrUn_gl000246 | 30097 | 30158 | ID80 | 656 | + |
| chr17_gl000203_random | 13202 | 13239 | ID81 | 800 | - |
| chr8_gl000197_random | 630 | 674 | ID82 | 126 | + |
| chrUn_gl000245 | 8695 | 8717 | ID83 | 387 | + |
| chrUn_gl000247 | 8035 | 8063 | ID84 | 716 | - |
| chr9_gl000201_random | 21819 | 21897 | ID85 | 60 | - |
| chrUn_gl000235 | 23868 | 23923 | ID86 | 652 | - |
| chrUn_gl000239 | 4778 | 4863 | ID87 | 823 | - |
| chr21_gl000210_random | 9750 | 9757 | ID88 | 313 | + |
| chrUn_gl000231 | 21112 | 21196 | ID89 | 880 | - |
| chrUn_gl000229 | 14957 | 14971 | ID90 | 424 | - |
| chrM | 4308 | 4327 | ID91 | 154 | + |
| chrUn_gl000226 | 8416 | 8437 | ID92 | 344 | + |
| chr18_gl000207_random | 1887 | 1985 | ID93 | 518 | - |
| chr1 | 189063477 | 189063558 | ID94 | 752 | + |
| chr2 | 111058172 | 111058198 | ID95 | 911 | - |
| chr3 | 38005235 | 38005296 | ID96 | 451 | + |
| chr4 | 160737381 | 160737470 | ID97 | 918 | - |
| chr5 | 158371323 | 158371383 | ID98 | 370 | + |
| chr6 | 25264328 | 25264386 | ID99 | 477 | - |
| chr7 | 117089472 | 117089550 | ID100 | 686 | + |
+-----------------------+-----------+-----------+-------+-----+---+
Login before adding your answer.
Traffic: 1708 users visited in the last hour
Nice solution, Pierre. Just curious, does mysql have a generator function (like Oracle and others) such that you could create an arbitrary number of random intervals (and thus not have to join to a table like kgXref with a limited number of rows)?
To generate a correct half-open BED file (i.e., to avoid the random possibility that the start and stop coordinates are equal), you might add a
1
to the stop coordinate, e.g.:@start+ROUND(RAND()*100)+1
. Otherwise, this will occasionally generate incorrect BED elements.@Aaron Quick answer: I don't know.
Thx Pierre, brilliant solution like usual