Hi there, I am currently trying to combine sorted .bed files into a matrix. I am interested in every unique site and do not want to merge locations. Using various commands (--intersect, --merge, --union) in bedops has resulted in the merging of sites. The files have 5 columns> chr/ startbp /endbp /coverage /%methylation
Example- head 1.txt.sorted (first 10 lines)
chr1 10472 10473 4 0
chr1 10479 10480 1 0
chr1 10481 10482 4 0
chr1 10482 10483 1 0
chr1 10485 10486 4 0
chr1 10487 10488 1 0
chr1 10491 10492 1 0
chr1 10495 10496 1 0
chr1 10498 10499 4 0
chr1 10501 10502 1 0
Example- head 2.txt.sorted (first 10 lines)
chr1 10472 10473 0 0
chr1 10479 10480 0 0
chr1 10481 10482 0 0
chr1 10482 10483 0 0
chr1 10485 10486 0 0
chr1 10487 10488 2 0
chr1 10491 10492 2 0
chr1 10495 10496 5 0
chr1 10498 10499 0 0
chr1 10501 10502 7 0
merge .bed 1& 2 Result (incorrect- as doesn't have a line for 10482)
chr1 10472 10473
chr1 10479 10480
chr1 10481 10483
chr1 10485 10486
chr1 10487 10488
chr1 10491 10492
chr1 10495 10496
chr1 10498 10499
Example of some commands I have tried:
Eg. 1 below --> this merges the bed files (-n), then removes sites that are not common (-u), pastes the 4/5th columns into the matrix .....#issue arises in the merge .bed - as sites that are 1bp are merged together?
bedops -u *.txt.sorted > union.bed
bedops -m *.txt.sorted > merge.bed
for fn in `ls *.txt.sorted`; do echo $fn; bedops -n 1 merge.bed $fn | awk '{ print $0"\tNA\tNA" }' | bedops -u - $fn | cut -f 4-5 > $fn.map; done
paste merge.bed *.txt.sorted.map > merged_noncpg_encode.mtx
Eg. 2
bedops --intersect --range 0 *.bed.txt.sorted > Int.mtx
Would be great if someone could provide some guidance as to how I can avoid merging of sites such as those that are 1bp apart. Or send me to a similar Q&A (all my current searches have been unsuccessful)
Thanks so much!
I want to help but I am lost about what you're trying to do. Can you provide a simpler example (and use indents to format code)? What are you really trying to do? What does the output need to look like?
If you just want single-base elements from
N
BED files:You can then map score or ID columns back to the union with
bedmap
, e.g.:Or:
Thanks for the reply Alex. Sorry for the complicated example. Appreciate your reply- which works. I also used the partition command.
Using only those 10 lines, what should your output look like? Do you need this to work with
bedops
or can you use a combination of standard unix commands?