Hello, I have two files, each with a single column. For example:
File 1 (600 rows), like:
chr1:100109327-100162088
chr1:103728175-103796914
chr2:103908343-104007674
chr5:104660686-104731626
chr11:216500833-216575408
chr12:218693122-218856909
chr13:220346086-220550182
chr17:21377148-21496334
chr17:21808582-21902643
chr18:67771953-67842433
chr19:49677672-49745965
chr25:2341943-2453481
chr26:25210373-26275666
File 2 (400 rows), like:
chr1:100119327-101162999
chr1:103728175-103796914
chr2:103908343-104777674
chr5:104660686-104731626
chr11:216500833-216575408
chr12:218693122-218856909
chr13:220356089-220555555
chr17:21377148-21496334
chr17:21808582-21902643
chr18:67772354-67942544
chr19:49877683-49945922
chr25:2341943-2453481
chr26:55210373-56275666
I want to merge the overlapping regions from both files. Some regions are identical in both files, so only one copy is needed; other regions do not overlap at all, and both of them should be kept. Please help me. Thanks.
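For readers without BEDtools at hand, here is a minimal pure-Python sketch of the merge being asked for. It is not the code from the original answer. It assumes every line looks like `chr:start-end`, and it treats regions that merely touch (start equal to the previous end) as overlapping, which may or may not match your coordinate convention. The names `parse_region` and `merge_regions` are made up for this illustration.

```python
import re
from collections import defaultdict

REGION = re.compile(r"^(\S+):(\d+)-(\d+)$")


def parse_region(text):
    """Split 'chr1:100-200' into ('chr1', 100, 200)."""
    match = REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def merge_regions(regions):
    """Merge overlapping 'chr:start-end' strings, one chromosome at a time."""
    by_chrom = defaultdict(list)
    for text in regions:
        chrom, start, end = parse_region(text)
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):  # plain string order: chr1, chr11, chr2, ...
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Assumption: touching or overlapping regions are merged.
                current_end = max(current_end, end)
            else:
                merged.append(f"{chrom}:{current_start}-{current_end}")
                current_start, current_end = start, end
        merged.append(f"{chrom}:{current_start}-{current_end}")
    return merged


# A few rows taken from the two example files above.
file1 = ["chr1:100109327-100162088", "chr1:103728175-103796914", "chr19:49677672-49745965"]
file2 = ["chr1:100119327-101162999", "chr1:103728175-103796914", "chr19:49877683-49945922"]

for region in merge_regions(file1 + file2):
    print(region)
```

With these rows, the two overlapping chr1 regions collapse into one, the duplicated chr1 region is reported once, and the two chr19 regions stay separate because they do not overlap.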
Thank you for your time; your suggestion works. What if each file has 3 columns? I actually want to merge based on the first column (the regions), for example:
File 1:
File 2:
result:
The full format of your data should have been described from the start.
The simplest solution I can think of is to: 1) merge your 2nd and 3rd columns into a new one with perl, python or awk; 2) convert the stream to bed with sed (as above); 3) sortBed; 4) mergeBed -c 4 -o collapse. This way, you will have the same result as above, but with a 4th column consisting of all genes and animals. Now, use perl, python or awk again to unroll this 4th column into two columns. I am away from my computer, so I can't provide code examples right now.
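To make the idea in those steps concrete, here is a rough Python sketch that does the merge and the "collapse" of the extra columns in one pass, without BEDtools. It is an illustration, not the pipeline itself: the row values (`geneA`, `cow`, and so on) are invented, the column meaning (region, gene, animal) is assumed from the description above, and `format_row` and `merge_with_labels` are hypothetical names. Touching regions are again treated as overlapping.

```python
from collections import defaultdict


def format_row(chrom, current):
    """Turn [start, end, genes, animals] into a 3-column output row."""
    start, end, genes, animals = current
    return (f"{chrom}:{start}-{end}", ",".join(genes), ",".join(animals))


def merge_with_labels(rows):
    """Merge overlapping regions and join their gene/animal values with commas.

    rows: iterable of (region, gene, animal) tuples, region as 'chr:start-end'.
    Duplicate values are kept, as a plain 'collapse' would do.
    """
    by_chrom = defaultdict(list)
    for region, gene, animal in rows:
        chrom, span = region.split(":")
        start, end = (int(x) for x in span.split("-"))
        by_chrom[chrom].append((start, end, gene, animal))

    merged = []
    for chrom in sorted(by_chrom):
        current = None  # [start, end, genes, animals] of the open interval
        for start, end, gene, animal in sorted(by_chrom[chrom]):
            if current is not None and start <= current[1]:
                current[1] = max(current[1], end)
                current[2].append(gene)
                current[3].append(animal)
            else:
                if current is not None:
                    merged.append(format_row(chrom, current))
                current = [start, end, [gene], [animal]]
        merged.append(format_row(chrom, current))
    return merged


# Invented example rows (region, gene, animal).
rows = [
    ("chr1:100-200", "geneA", "cow"),
    ("chr1:150-300", "geneB", "sheep"),
    ("chr2:10-20", "geneC", "cow"),
]

for row in merge_with_labels(rows):
    print("\t".join(row))
```

Keeping genes and animals in separate lists avoids the final "unroll" step, at the cost of not reusing mergeBed for the interval logic.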
edit: I have updated my answer to cover your extended question. Honestly, you will be hard pressed to find more elegant code. I am amazed at myself for the combination of awk and perl in the same stream of commands.
Dear h.mon, hello. I'm very grateful for your time. I tried the last code, but it does not work. It gives these errors:
Try replacing sortBed (BEDtools) with sort-bed (BEDOPS) or sort -k1,1 -k2,2n -k3,3n (GNU coreutils).
I just noticed that my suggestion creates bed files with spaces in the fourth column, which is not a good idea. It worked with the small example you provided, but may cause problems. It would be a good idea to replace those spaces with another character, then replace them back for the final output.
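As a small illustration of both points, the sketch below protects spaces in the 4th column with a placeholder, sorts the records by chromosome, start and end (roughly what `sort -k1,1 -k2,2n -k3,3n` does, ignoring locale differences), and restores the spaces at the end. The placeholder `|`, the record values, and the function names `protect` and `restore` are assumptions for this example; pick any character that never occurs in your data.

```python
PLACEHOLDER = "|"  # assumed choice; must not occur anywhere in the labels


def protect(label):
    """Replace spaces so the label survives whitespace-splitting tools."""
    if PLACEHOLDER in label:
        raise ValueError(f"placeholder already present in {label!r}")
    return label.replace(" ", PLACEHOLDER)


def restore(label):
    """Undo protect() for the final output."""
    return label.replace(PLACEHOLDER, " ")


# Invented records: (chrom, start, end, "gene animal").
records = [
    ("chr2", 50, 80, "geneC cow"),
    ("chr1", 150, 300, "geneB sheep"),
    ("chr1", 100, 200, "geneA cow"),
]

protected = [(c, s, e, protect(label)) for c, s, e, label in records]

# Chromosome as text, then start and end as numbers.
ordered = sorted(protected, key=lambda r: (r[0], r[1], r[2]))

for chrom, start, end, label in ordered:
    print(chrom, start, end, restore(label), sep="\t")
```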
What is the output of:
Hi, I did, but there is an error again:
First run: replaced sortBed with sort-bed
Second run: replaced sortBed with sort -k1,1 -k2,2n -k3,3n
Third run: the output is empty
Sorry, I am out of ideas and I don't have access to your data to troubleshoot it. All I can say is it worked for the example you provided. It is possible (but unlikely) that there is a bug in BEDtools - I think bad data being fed into BEDtools is a more likely cause.