If you can put your data into UCSC BED format, the bedops
tool in the BEDOPS suite will find overlaps between two or more BED files. Note that bedops expects sorted input, which the suite's sort-bed tool can produce.
Several set operations are available, but the --not-element-of
operator is probably the most useful one here. For your application, you might do the following:
$ bedops --not-element-of -1 Reads.bed Genes.bed
This reports all elements of Reads.bed
that do not overlap ranges in Genes.bed
by one or more bases. You can also specify custom overlap criteria, either in bases or as a percentage of element length.
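To make the overlap rule concrete, here is a minimal sketch on made-up data. It does not call bedops at all: it is a plain awk stand-in that only mimics the "no overlap of one or more bases" semantics on tiny inputs, assuming 0-based, half-open BED coordinates. The file contents and the `workdir` and `result` names are hypothetical.

```shell
# Tiny hypothetical inputs (tab-separated, 0-based half-open BED coordinates).
workdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t400\t500\nchr2\t10\t20\n' > "$workdir/Reads.bed"
printf 'chr1\t150\t300\nchr1\t500\t600\n' > "$workdir/Genes.bed"

# Plain-awk illustration of the semantics only (not bedops itself):
# keep a read unless some gene on the same chromosome shares >= 1 base with it.
result=$(awk -F'\t' '
    # First file (Genes.bed): remember every gene interval.
    NR == FNR { n++; chrom[n] = $1; gstart[n] = $2 + 0; gstop[n] = $3 + 0; next }
    # Second file (Reads.bed): test each read against every stored gene.
    {
        hit = 0
        for (i = 1; i <= n; i++)
            if (chrom[i] == $1 && gstart[i] < ($3 + 0) && ($2 + 0) < gstop[i]) { hit = 1; break }
        if (!hit) print
    }
' "$workdir/Genes.bed" "$workdir/Reads.bed")

printf '%s\n' "$result"
rm -r "$workdir"
```

The read at chr1:100-200 is dropped because it shares bases 150-199 with the first gene. The read at chr1:400-500 is kept, since it only abuts the gene starting at 500, and the chr2 read is kept because no gene lies on chr2. The real bedops command should be preferred for actual data; the exact spelling of the overlap argument may differ between BEDOPS versions, so it is worth checking `bedops --help`.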
A couple of advantages of bedops
are that it supports multiple BED inputs and standard input. If you keep genes in separate files, you can pass them all at once, without building an intermediate union first:
$ bedops --not-element-of -1 Reads.bed UCSCGenes.bed GENCODEGenes.bed RefSeqGenes.bed ...
Because bedops reads standard input, you can drop it into the middle of a longer processing pipeline and avoid the cost of writing and re-reading intermediate files.
For example, a simple pipeline like this:
$ readGenerator foo bar baz | bedops --not-element-of -1 - Genes.bed > Answer.bed
is generally going to run somewhat faster than:
$ readGenerator foo bar baz > TempReads.bed
$ bedops --not-element-of -1 TempReads.bed Genes.bed > Answer.bed
$ rm TempReads.bed
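Since bedops expects sorted input, a streaming pipeline usually needs a sort stage as well. The sketch below shows that stage on made-up data, with no temporary file. `read_generator_sketch` is a hypothetical stand-in for readGenerator, and `LC_ALL=C sort` with these keys is assumed to approximate what sort-bed would produce; with BEDOPS installed you would use sort-bed itself.

```shell
# Hypothetical stand-in for readGenerator: emits unsorted BED lines.
read_generator_sketch() {
    printf 'chr2\t10\t20\nchr1\t400\t500\nchr1\t100\t200\n'
}

# Sort inside the stream, so nothing is written to disk between stages.
# Assumption: this ordering (chromosome as bytes, then numeric start and end)
# approximates sort-bed's output.
sorted=$(read_generator_sketch | LC_ALL=C sort -k1,1 -k2,2n -k3,3n)
printf '%s\n' "$sorted"

# With BEDOPS installed, the same stream could feed bedops directly, e.g.:
#   readGenerator foo bar baz | sort-bed - | bedops --not-element-of -1 - Genes.bed > Answer.bed
```

If readGenerator already emits sorted BED, the sort stage can be left out.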
Have you heard of bedtools?
Thanks for the quick response. It looks like this is possible with bedtools. I will be using it for the first time! Any quick directions?