Question

Tool to calculate extreme most positions in a bed file for a given window

10.9 years ago

Aishwarya Kulkarni ▴ 90

Hi, I want to identify the most extreme positions in a BED file relative to a specific position within a window.

E.g. if the base position is chrX 154029186 154029187, and the following are the overlapping positions from another BED file within the specified window, then the tool should output

chrX    154029165    154029172

and

chrX    154028990    154028999

since they are the most extreme positions among these overlaps in the window:

chrX     154029186     154029187     chrX     154029165     154029172
chrX     154029186     154029187     chrX     154028981     154028992
chrX     154029186     154029187     chrX     154028991     154029002
chrX     154029186     154029187     chrX     154028981     154028990
chrX     154029186     154029187     chrX     154028991     154029000
chrX     154029186     154029187     chrX     154028982     154028991
chrX     154029186     154029187     chrX     154028990     154028999

updated 3.7 years ago by Ram 45k • written 10.9 years ago by Aishwarya Kulkarni ▴ 90

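If I understood correctly, you have two BED files (e.g. "A.bed" and "B.bed") and you want to print only those coordinates of A.bed that don't overlap with B.bed.

If so, install bedtools and run the following. The -v flag reports only the A.bed entries with no overlap in B.bed; without it, intersectBed reports the overlapping intervals instead.

intersectBed -a A.bed -b B.bed -v > output.bed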

updated 3.7 years ago by Ram 45k • written 10.9 years ago by Manvendra Singh ★ 2.2k

Your best bet is to script something, either with pybedtools or in R (with GenomicRanges). If the BED files are large, the former is probably more efficient.

updated 3.7 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

Assuming your BED files are sorted correctly, you can pipe your intersections into

awk '{if (NR == 1) extreme1 = $0} END {print extreme1"\n"$0}'

but there should be a better way of solving this.
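A minimal sketch of that pipeline on made-up rows, assuming the intersection output carries the overlapping interval in columns 4-6 as in the question. The function name and the sample coordinates are hypothetical. It sorts on the overlapping start (column 5), which is one reading of "sorted correctly", and then keeps the first and last rows:

```shell
# Hypothetical helper: read intersection rows on stdin, sort them numerically
# by the overlapping start (column 5), and print the first and last rows.
first_and_last_by_start() {
  sort -k5,5n |
    awk 'NR == 1 { first = $0 } { last = $0 } END { if (NR > 0) { print first; print last } }'
}

# Made-up rows: target interval in columns 1-3, overlapping interval in columns 4-6.
printf '%s\n' \
  'chr1 500 501 chr1 300 310' \
  'chr1 500 501 chr1 100 110' \
  'chr1 500 501 chr1 200 210' | first_and_last_by_start
# prints the 100-110 row, then the 300-310 row
```

Note that the last row after sorting by start is not necessarily the row with the largest end, so this only works when the intervals do not nest.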

updated 3.7 years ago by Ram 45k • written 10.9 years ago by PoGibas 5.1k

Ram · Answer 1 · 2014-11-05

I think for a simple min-max search (it seems you want to find the row with the smallest start and the one with the largest end) a one-liner in awk would work well:

awk 'BEGIN { min = 1E10 } $2 < min { min = $2; min_row = $0 } $3 > max { max = $3; max_row = $0 } END { print min_row; print max_row }' data.bed

but it could be that I misunderstood what you want.

Advice: when you create an example, keep it simple. For instance, use short, readable numbers such as 100, 200, etc. rather than large values that are difficult to parse and compare.
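Following that advice, here is a rough sketch of the same min/max idea on small, made-up numbers. It assumes the six-column layout from the question, so the overlapping start and end sit in columns 5 and 6 rather than 2 and 3. The function name is only for illustration:

```shell
# Hypothetical helper: print the row with the smallest overlapping start
# (column 5), then the row with the largest overlapping end (column 6).
min_max_rows() {
  awk '
    { s = $5 + 0; e = $6 + 0 }                      # force numeric comparison
    NR == 1 || s < min { min = s; min_row = $0 }    # smallest start seen so far
    NR == 1 || e > max { max = e; max_row = $0 }    # largest end seen so far
    END { if (NR > 0) { print min_row; print max_row } }
  '
}

# Made-up rows: target interval in columns 1-3, overlapping interval in columns 4-6.
printf '%s\n' \
  'chr1 500 501 chr1 300 310' \
  'chr1 500 501 chr1 100 110' \
  'chr1 500 501 chr1 200 400' | min_max_rows
# prints the 100-110 row, then the 200-400 row
```

Seeding min and max from the first row avoids picking an arbitrary starting value such as 1E10, and the input does not need to be sorted.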

Ram · Answer 2 · 2014-11-05

10.9 years ago

Alex Reynolds 36k

I don't really understand the format of your dataset. However, an alternative is to calculate the distances between target and query elements with awk and write that value to an additional column. Use GNU sort to sort that column and then take the head or tail, depending on whether the minimum or maximum value is needed.
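One possible reading of this, sketched on made-up six-column rows like those in the question. The distance definition (the gap between the target in columns 2-3 and the query in columns 5-6, zero when they overlap) is my assumption, as are the function names:

```shell
# Hypothetical helper: append the target-to-query gap as a 7th column.
add_distance() {
  awk '{
    ts = $2 + 0; te = $3 + 0                  # target start and end
    qs = $5 + 0; qe = $6 + 0                  # query start and end
    if (qe <= ts) { d = ts - qe }             # query lies before the target
    else if (qs >= te) { d = qs - te }        # query lies after the target
    else { d = 0 }                            # intervals overlap
    print $0, d
  }'
}

# Sort numerically on the distance column, then take the head or the tail.
nearest()  { add_distance | sort -k7,7n | head -n 1; }
farthest() { add_distance | sort -k7,7n | tail -n 1; }

# Made-up rows: target interval in columns 1-3, query interval in columns 4-6.
rows='chr1 500 501 chr1 300 310
chr1 500 501 chr1 100 110
chr1 500 501 chr1 450 460
chr1 500 501 chr1 600 610'

printf '%s\n' "$rows" | nearest    # prints: chr1 500 501 chr1 450 460 40
printf '%s\n' "$rows" | farthest   # prints: chr1 500 501 chr1 100 110 390
```

If several targets share one file, the rows would first need to be grouped per target, which this sketch does not attempt.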

updated 3.7 years ago by Ram 45k • written 10.9 years ago by Alex Reynolds 36k