I have defined 44 linkage blocks as "interesting" through a GWAS - NGS cohort.
I have run certain tests on those, but I wish to understand how typical or atypical those results are by comparing the 44 loci to 44 similarly sized loci that are "matched" according to various criteria:
1. Matched based on the number of genes and total gene content
2. Matched based on the number of regulatory elements (e.g. Conserved TFBS)
Is there software that does this? If not, how do people do it?
Thank you.
Alex - thank you very much for this helpful feedback. The data are not currently in BED format, although I suppose I could put them in that format. Actually, the format I would use would likely be a VCF file.
However, I suppose what I was asking for is a tool that would do it without any input at all. A simple example: if I specify a given region that is 1 Mb in length, it might contain, say, 6 genes. The tool I am seeking would simply find another 1 Mb region with 6 genes in it.
You could construct one by building sliding 1 Mb windows across your genome, piping them as reference elements into bedmap. As an example, this awk script makes disjoint 1 Mb windows from 1 Mb to 11 Mb over chr1 and looks within each window to count the number of genes. If the number of genes is equal to 6, then it prints the result:

This is a very rudimentary and incomplete example (not least because the windows are disjoint and do not span the chromosome), but hopefully it demonstrates the principle. You could modify this approach to make a sliding window that moves over each chromosome, for instance, testing the window for your condition-of-interest.
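
To make the sliding-window idea concrete, here is a minimal pure-Python sketch. It is not the awk/bedmap pipeline described above, just the same principle under stated assumptions: gene coordinates for one chromosome are already loaded as half-open (start, end) pairs, and the function names (count_overlaps, matched_windows), the toy coordinates, and the window and step sizes are all hypothetical choices for illustration.

```python
from bisect import bisect_left, bisect_right


def count_overlaps(starts_sorted, ends_sorted, win_start, win_end):
    """Count half-open intervals [s, e) overlapping the window [win_start, win_end)."""
    # Intervals that begin before the window ends ...
    n_started = bisect_left(starts_sorted, win_end)
    # ... minus those that already finished at or before the window start.
    n_ended = bisect_right(ends_sorted, win_start)
    return n_started - n_ended


def matched_windows(genes, chrom_length, window, step, target):
    """Slide a window along one chromosome and keep windows with `target` genes.

    genes        -- list of (start, end) half-open gene coordinates (assumed input)
    chrom_length -- chromosome length in bases
    window, step -- window size and slide increment in bases
    target       -- gene count to match (e.g. the count in the locus of interest)
    """
    starts = sorted(s for s, _ in genes)
    ends = sorted(e for _, e in genes)
    hits = []
    for win_start in range(0, chrom_length - window + 1, step):
        win_end = win_start + window
        if count_overlaps(starts, ends, win_start, win_end) == target:
            hits.append((win_start, win_end))
    return hits


# Toy data (hypothetical): seven "genes" on a 50-base "chromosome".
toy_genes = [(1, 3), (4, 6), (12, 14), (22, 24), (26, 28), (29, 31), (41, 43)]
print(matched_windows(toy_genes, chrom_length=50, window=10, step=5, target=2))
```

For real data you would use something like window=1_000_000 with a smaller step, repeat per chromosome, and drop any window that overlaps the original locus. The same function could be run on a second feature set (for instance, conserved TFBS coordinates) and the two hit lists intersected to match on both criteria.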