Merge overlapping and adjacent features in a BED file that share the same label in the name (4th) column
4.1 years ago
Denis

Hi there!

My BED file looks like:

chr1   10   20   A
chr1   15   20   B
chr1   19   30   A
chr1   10   20   C
chr1   21   30   C

I'd like to merge overlapping or adjacent features (i.e. separated by a gap of at most 1 bp) that share the same label in the name (4th) column, to get this result:

chr1   10   30   A
chr1   15   20   B
chr1   10   30   C
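In BED's zero-based, half-open coordinates, the gap between two consecutive features on the same chromosome is simply `next start - previous end`: a negative value means they overlap, 0 means they are book-ended, and 1 means exactly one uncovered base lies between them (as with the two C features, 10-20 and 21-30). A minimal sketch that prints this gap for consecutive same-label features, assuming a tab-delimited file; the function name `bed_gaps` and the file name `input.bed` are hypothetical:

```shell
# Hypothetical helper: for consecutive features with the same label and
# chromosome (sorted by start), print the label, the previous feature's end,
# the next feature's start and the gap between them (start - end).
# gap < 0: overlap; gap = 0: book-ended; gap = 1: one uncovered base.
bed_gaps() {
    sort -t "$(printf '\t')" -k4,4 -k1,1 -k2,2n "$1" |
    awk -F '\t' -v OFS='\t' '
        $4 == label && $1 == chrom { print $4, prev_end, $2, $2 - prev_end }
        { label = $4; chrom = $1; prev_end = $3 }
    '
}

# The question's data, written tab-delimited (file name is an assumption).
printf 'chr1\t10\t20\tA\nchr1\t15\t20\tB\nchr1\t19\t30\tA\nchr1\t10\t20\tC\nchr1\t21\t30\tC\n' > input.bed
bed_gaps input.bed
```

On the example data this prints two lines: `A 20 19 -1` (an overlap) and `C 20 21 1` (the 1 bp gap, which `bedtools merge` with its default `-d 0` would leave unmerged).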

I've found the bedtools merge utility, but it does not take the label into account when merging features.

Thanks!


Split by label, then reduce (merge) the features within each group.
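The split-then-reduce idea does not strictly need a dedicated tool. A minimal sketch with only `sort` and `awk`, assuming tab-delimited input with the label in column 4: sorting makes each label's features on a chromosome contiguous and ordered by start, and a single sweep then grows the current interval while the next feature starts at most `maxgap` bases after its end. The function name `merge_by_label`, the file name `input.bed` and the default `maxgap` of 1 (matching the 1 bp adjacency asked for) are assumptions:

```shell
# Hypothetical helper: merge features that share a chromosome and a label
# (column 4) when they overlap or are separated by at most maxgap bases.
# Usage: merge_by_label file.bed [maxgap]   (maxgap defaults to 1)
merge_by_label() {
    tab=$(printf '\t')
    # "Split": sort so each label's features on a chromosome are contiguous
    # and ordered by start coordinate.
    sort -t "$tab" -k4,4 -k1,1 -k2,2n "$1" |
    # "Reduce": sweep through each group, growing the current interval.
    awk -F '\t' -v OFS='\t' -v maxgap="${2:-1}" '
        NR == 1 || $4 != label || $1 != chrom || $2 - end > maxgap {
            # Label or chromosome changed, or the gap is too large:
            # emit the finished interval and start a new one.
            if (NR > 1) print chrom, start, end, label
            chrom = $1; start = $2 + 0; end = $3 + 0; label = $4
            next
        }
        # Overlapping or close enough: extend (the feature may be nested).
        $3 + 0 > end { end = $3 + 0 }
        END { if (NR > 0) print chrom, start, end, label }
    ' |
    # Restore genomic order (ties broken by label).
    sort -t "$tab" -k1,1 -k2,2n -k4,4
}

# The question's data, written tab-delimited (file name is an assumption).
printf 'chr1\t10\t20\tA\nchr1\t15\t20\tB\nchr1\t19\t30\tA\nchr1\t10\t20\tC\nchr1\t21\t30\tC\n' > input.bed
merge_by_label input.bed 1
```

On the example this prints `chr1 10 30 A`, `chr1 10 30 C` and `chr1 15 20 B`. Tracking the running maximum end handles nested features; with `maxgap` set to 0 only overlapping and book-ended features are merged, which mirrors the default behaviour of `bedtools merge`.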


I'm wondering which tool I can use to do that.

4.1 years ago
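# For each distinct label, merge its features (-d 1 also joins features
# separated by a 1-bp gap) and keep the label as column 4 (-c 4 -o distinct)
cut -f 4 input.bed | sort -u | while IFS= read -r C
do
     awk -F '\t' -v C="$C" '$4 == C' input.bed | sort -t $'\t' -k1,1 -k2,2n | bedtools merge -i - -d 1 -c 4 -o distinct
done | sort -t $'\t' -k1,1 -k2,2n > result.bed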
4.1 years ago

BEDOPS bedmap + bash + awk:

$ bedmap --echo-map-range --echo-map-id-uniq --delim '\t' \
    <(awk -v FS="\t" -v OFS="\t" '{ id=$4; $4=$1; $1=id; print $0; }' in.bed | sort-bed - | bedops --range 1 --merge -) \
    <(awk -v FS="\t" -v OFS="\t" '{ id=$4; $4=$1; $1=id; print $0; }' in.bed | sort-bed -) \
  | awk -v FS="\t" -v OFS="\t" '{ chrom=$4; $4=$1; $1=chrom; print $0; }' \
  | sort-bed -
chr1    10  30  A
chr1    10  30  C
chr1    15  20  B
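The pipeline relies on swapping columns 1 and 4, so that the label plays the role of the chromosome and `bedops --merge` can only combine features that share a label; the last `awk` swaps the columns back. A minimal sketch of that swap on its own (the function name `swap_chrom_label` and the file name `swap_demo.bed` are hypothetical), showing that applying it twice restores the original lines. One caveat, as far as I can tell: the original chromosome only rides along in column 4 during the merge, so same-label features on different chromosomes with overlapping coordinates would also be combined; with the single-chromosome example here that does not matter.

```shell
# Hypothetical helper: exchange column 1 (chromosome) and column 4 (label)
# of a tab-delimited BED stream, as the awk steps in the pipeline do.
swap_chrom_label() {
    awk -F '\t' -v OFS='\t' '{ tmp = $1; $1 = $4; $4 = tmp; print }'
}

printf 'chr1\t10\t20\tA\nchr1\t21\t30\tC\n' > swap_demo.bed

# After one swap, the label is in column 1.
swap_chrom_label < swap_demo.bed
# Swapping twice gives back the original lines.
swap_chrom_label < swap_demo.bed | swap_chrom_label | cmp -s - swap_demo.bed && echo "round trip OK"
```

The first call prints `A 10 20 chr1` and `C 21 30 chr1`, followed by `round trip OK`.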

Thanks for your quick replies! Since I already have bedtools installed on my PC, I used the solution suggested by Pierre Lindenbaum.
