You can use the bedmap tool in the BEDOPS suite to quickly and efficiently generate an answer.
Let's say you have your features in a file called Features.bed and your genes in a file called Genes.bed. Also, the ID column of Genes.bed contains the name of the gene, following UCSC BED format conventions.
To first extend the coordinates of Features.bed by, say, 100 bases, use the --range operator with bedmap. Input that looks like this:
Chr1 2292052 2292086 AE016830.1 NA -
Chr1 2771733 2771767 AE016830.1 NA -
...
will look like this when operated on (i.e., when searching for overlapping elements from Genes.bed) with a range adjustment of --range 100:
Chr1 2291952 2292186 AE016830.1 NA -
Chr1 2771633 2771867 AE016830.1 NA -
...
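The arithmetic behind that adjustment can be sketched with awk, independently of BEDOPS. This is only an illustration of symmetric padding, not how bedmap implements it: the pad_bed function name is hypothetical, and clamping the start coordinate at zero is an assumption made here so that coordinates stay valid.

```shell
#!/bin/sh
# Hypothetical helper (not part of BEDOPS): pad each BED interval by N bases
# on both sides. Reads whitespace-delimited BED on stdin and writes
# tab-delimited BED on stdout.
pad_bed() {
    awk -v pad="$1" 'BEGIN { OFS = "\t" }
    {
        start = $2 - pad
        if (start < 0) start = 0   # assumption: never go below coordinate 0
        $2 = start
        $3 = $3 + pad
        print
    }'
}

# Pad the first example feature by 100 bases.
printf 'Chr1\t2292052\t2292086\tAE016830.1\tNA\t-\n' | pad_bed 100
```

Subtracting 100 from the start (2292052) and adding 100 to the end (2292086) gives the 2291952 and 2292186 coordinates shown above.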
If you want to specify a non-symmetric padding, use --range L:R, replacing L and R with desired "left" and "right" integer values. You can use negative values to "shrink" or shift elements, as well. See the documentation for more details.
To find the genes contained within these padded ranges, use the following bedmap command to redirect output to a file called Answer.bed:
$ bedmap --echo --echo-map-id --range 100 Features.bed Genes.bed > Answer.bed
Output will look something like:
Chr1 2292052 2292086 AE016830.1 NA -|gene-4204;gene-19383
Chr1 2771733 2771767 AE016830.1 NA -|gene-20043;gene-20199
...
The pipe character (|) delimits unpadded features from genes which overlap padded features. The default overlap criterion between padded feature and gene is one or more bases. This overlap can be adjusted; see the documentation for more details.
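If you need one feature-gene pair per line for downstream work, output in this shape can be unpacked by splitting on the pipe and then on the semicolons. The following is a minimal sketch using awk; the split_pairs function name is hypothetical, and it assumes gene IDs contain neither a pipe nor a semicolon.

```shell
#!/bin/sh
# Hypothetical helper (not part of BEDOPS): expand "feature|id1;id2" lines
# into one tab-separated "feature<TAB>id" line per gene ID.
# Features with nothing after the pipe produce no output lines.
split_pairs() {
    awk -F'[|]' 'BEGIN { OFS = "\t" }
    {
        n = split($2, ids, ";")
        for (i = 1; i <= n; i++) print $1, ids[i]
    }'
}

# Unpack the first example line from the output above.
printf 'Chr1 2292052 2292086 AE016830.1 NA -|gene-4204;gene-19383\n' | split_pairs
```

In practice you would run this as split_pairs < Answer.bed. Dropping features that have no overlapping genes is a choice made in this sketch; adjust the loop if you want to keep them.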
Note: If you need the feature output to reflect padding, use bedops --range on the features and then pipe the result to bedmap, e.g.:
$ bedops --range 100 --everything Features.bed | bedmap --echo --echo-map-id - Genes.bed > Answer.bed
Output in this case would show adjusted reference coordinates:
Chr1 2291952 2292186 AE016830.1 NA -|gene-4204;gene-19383
Chr1 2771633 2771867 AE016830.1 NA -|gene-20043;gene-20199
...
Please see the documentation for an explanation of the --range operator with bedops.
wow, great list of answers!