I have a list of Active Enhancer peaks (bed file) and initially I wanted it assign them to closest target gene. I used Bedops's closest-features to do it. Active Enhancers:
chr1 34737 34937 7_Weak_Enhancer 0 . 34737 34937 255,252,4
chr1 35737 35937 7_Weak_Enhancer 0 . 35737 35937 255,252,4
chr1 36137 36337 7_Weak_Enhancer 0 . 36137 36337 255,252,4
Active Enhancer Annotated to Nearest Gene:
chr1 34737 34937 7_Weak_Enhancer 0 . 34737 34937 255,252,4|chr1 24610 44610 NR_026820 FAM138F -
chr1 35737 35937 7_Weak_Enhancer 0 . 35737 35937 255,252,4|chr1 24610 44610 NR_026820 FAM138F -
chr1 36137 36337 7_Weak_Enhancer 0 . 36137 36337 255,252,4|chr1 24610 44610 NR_026820 FAM138F -
Bedops command used was:
closest-features --closest sortFinalEnhancer.txt sorted10kGTF.bed > ActiveEnhancerCatalog.bed
Now I want to assign each Enhancer to all the genes within 20 kb range (not the nearest gene within 20 kb range but all of them which are in 20 kb range). For example, the output I want will look like this (for first enhancer peak):
chr1 34737 34937 7_Weak_Enhancer 0 . 34737 34937 255,252,4|chr1 24610 44610 NR_026820 FAM138F -
chr1 34737 34937 7_Weak_Enhancer 0 . 34737 34937 255,252,4|chr1 24610 44610 NR_026820 FAM138F -
chr1 34737 34937 7_Weak_Enhancer 0 . 34737 34937 255,252,4|chr1 24610 44610 NR_026820 FAM138F -
So the 3 different genes which were mapped 1-1 before (because of nearest gene criteria within 10 kb region) are now mapped against 1 enhancer peak because they all fall with 20 kb range.
Can anyone guide me if there is any Bedops option or any other possible option to do it? Thank you.
Thanks a lot Alex for replying. I have a little problem, I already tried this way as you suggested in your answer in old question but the problem is I get all the overlaps in one line separated by
;
. While I want the answer in the format as mentioned above in the question. Currently what I'm getting for my sample input file is:chr1 500 1000 Enh1 3.45 5.67|chr1 1500 2500 NR_9088 ATF2 -;chr1 11000 12000 NR_879 ATF1 +;chr1 19000 21000 NR_876 ATF1 +
while what I want is:You can pipe to
awk
to reformat the output, e.g.:This could blow up the file, if you have a lot of overlaps. I suggest using standard input and output streams where you can, to minimize file sizes or the writing of intermediate files.
Yes, it works. A little problem left is: I'm trying to use
--delim '\t'
after the--echo-map
but it is not working. can you guide me if I'm doing something wrong here? This is what I am trying:bedmap --range 20000 --echo --echo-map --delim '\t' enhancers.bed genes.bed | awk -F'|' '{ reference = $1; mappings = $2; n = split(mappings, m, ";"); for(i = 1; i <= n; i++) { print reference"|"m[i]; } }'
I can do it by using
sed -i 's/|/\t/g' answer.bed
but I was wondering why the--delim "\t"
is not working !!If you switch the
--delim
character from the default pipe character (|
) to a tab character (\t
) then you will no longer be able to use|
inawk
as a field separator between the reference element (reported by--echo
) and the mapped elements (reported by--echo-map
).You could use
sed
at the very end, but you probably want to keep that|
character around until then, so that you have a delimiter between reference and mapped elements that is something other than a tab character. This is because the delimiter between fields inside a reference element and inside one or more mapped elements is also a tab character, and so that makes separation more difficult.