I uploaded about 100 genes of interest and asked the NimbleGen algorithm to design probes for them. The program returned a list of 12,000 "probes" covering 80% of our region of interest. Part of the output (in BED format) is shown below. When I calculated the lengths of the tiling regions, I found that they range from about 60 bases (shortest) to about 10,000 bases (longest). My questions are:
For the long tiling regions (longer than, say, 5,000 bases), is it correct to assume that every base is covered by at least one probe, whether or not the probes overlap?
Why are some smaller tiling regions (e.g. see chr6) kept separate even though they are immediately adjacent to one another (say, less than 20 bases apart)? For the few regions I checked, those 10-20 base "gaps" do not seem to contain any repeated sequences. Is there any reason why these immediately adjacent regions are kept as distinct regions rather than consolidated into longer tiling regions?

chr9 88871247 88904560
chr15 75023162 75057715
chr19 49465156 49474739
chr22 39914287 39922198
chr1 45476925 45482110
chr10 127464902 127512525
chr20 6698338 6766164

track name=100000_XYZ_tiled_region description="100000_XYZ_P1_tiled_region"
chr1 45476880 45482139
chr9 88871198 88871815
chr9 88871917 88872988
chr9 88873272 88873957
chr9 88874062 88874813
chr9 88875022 88875564
chr9 88875565 88875782
chr20 6736541 6736791
chr20 6736792 6740517
chr20 6740706 6741800
chr20 6741803 6744423
chr20 6744558 6744663
chr20 6744668 6746245
chr20 6746248 6746637
chr20 6746638 6747943
chr20 6747943 6749797
chr20 6749818 6750266
chr20 6750283 6754517
chr20 6754518 6754891
chr20 6755188 6759761
chr20 6759803 6759868
chr20 6759888 6762971
chr20 6762978 6763479
chr20 6763483 6765147
chr20 6765153 6765505
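
For anyone who wants to reproduce the length and gap check, here is a minimal Python sketch of one way to do it. It is only an illustration, not the script behind the numbers quoted above. It assumes the file follows the standard BED convention (0-based start, exclusive end), so a region's length is `end - start` and the gap between two neighbouring regions is `next_start - prev_end`. The function names (`parse_bed`, `region_lengths`, `close_gaps`) and the 20-base threshold are made up for this example, and the sample uses a handful of the tiled regions listed above.

```python
from collections import defaultdict

# A few of the tiled regions from the output above, used as sample input.
SAMPLE = """track name=100000_XYZ_tiled_region description="100000_XYZ_P1_tiled_region"
chr9 88875022 88875564
chr9 88875565 88875782
chr20 6746638 6747943
chr20 6747943 6749797
chr20 6749818 6750266
"""


def parse_bed(text):
    """Return (chrom, start, end) tuples, skipping track/browser/comment lines."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def region_lengths(regions):
    """Length of each region, assuming 0-based half-open BED coordinates."""
    return [end - start for _, start, end in regions]


def close_gaps(regions, max_gap=20):
    """List (chrom, prev_end, next_start, gap) for neighbours less than max_gap bases apart."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    gaps = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # Compare each region with the next one on the same chromosome.
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            gap = next_start - prev_end
            if gap < max_gap:
                gaps.append((chrom, prev_end, next_start, gap))
    return gaps


regions = parse_bed(SAMPLE)
print(region_lengths(regions))
for chrom, prev_end, next_start, gap in close_gaps(regions):
    print(f"{chrom}: {prev_end} -> {next_start} (gap of {gap} bases)")
```

Under the half-open assumption, two of the sample pairs come out 0 and 1 base apart, which is the kind of "immediately adjacent but separate" region the second question is about. If the file actually uses 1-based inclusive coordinates, the lengths and gaps would each shift by one.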