bedtools sort and merge
4.0 years ago
hosin • 0

Hello there

I want to find the common regions (overlaps) between the coordinates below, which come from two big files.

file1:                                      file2:
chr1:251790423-251855075    chr1:251391746-251411804
chr1:259520908-259580523    chr1:259687605-259759271
chr1:261294390-261396569    chr1:259815659-259854201
chr2:108327854-108382699    chr2:108327854-108388888
chr20:28151226-28420685     chr20:28141234-28520687
chr3:15673814-15987811      chr3:15673814-15997815
chr10:70552773-71399757     chr10:70782782-71499757

I have tried:

cat file1 file2 | sed "s/[-:]/\t/g" | bedtools sort | bedtools merge > result

But this only merges them. I would be grateful for any suggestions on how to find only the common regions (without the extra lengths) between the two files.

genome • 2.1k views
4.0 years ago

If you use the bash shell, you can use the BEDOPS bedops tool with process substitution to build an efficient one-line solution:

$ bedops --intersect <(sed "s/[-:]/\t/g" file1 | sort-bed -) <(sed "s/[-:]/\t/g" file2 | sort-bed -) > answer.bed

The result will be in answer.bed.

If you use zsh (as some on macOS do), the syntax for process substitution may differ slightly, but the idea is the same. (On macOS, though, the sed command would probably need to change as well, because the BSD sed shipped there does not interpret \t in the replacement text.)
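One hedged way around the sed portability issue is to do the conversion with tr, which interprets \t itself and so should behave the same with GNU and BSD tools. This is only a sketch: the file names regions.txt and regions.bed and the scratch directory are made up for illustration, and it assumes chromosome names contain no ':' or '-' characters.

```shell
# Scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample input in chr:start-end form, as in the question.
printf 'chr2:108327854-108382699\nchr3:15673814-15987811\n' > "$workdir/regions.txt"

# tr maps ':' and '-' to tabs; the \t escape is handled by tr, not by the shell.
tr ':-' '\t\t' < "$workdir/regions.txt" > "$workdir/regions.bed"

cat "$workdir/regions.bed"
```

The tr command can stand in for the sed call inside each process substitution above.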

Bonus BEDOPS: one advantage is that you can specify an arbitrary number of inputs if you have more than two files:

$ bedops --intersect <(sed "s/[-:]/\t/g" file1 | sort-bed -) <(sed "s/[-:]/\t/g" file2 | sort-bed -) ... <(sed "s/[-:]/\t/g" fileN | sort-bed -) > answer.bed

You can specify as many as you like, up to your operating system's file handle limit (usually 1021, but that can be adjusted).
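To see the limit that applies in your current shell, ulimit -n prints the soft limit on open file descriptors. A small sketch follows; the variable name fd_limit is only illustrative, and subtracting three for stdin, stdout and stderr is my assumption about how the number of usable inputs relates to the limit.

```shell
# Soft limit on open file descriptors for this shell session.
fd_limit=$(ulimit -n)
echo "open file limit: $fd_limit"

# Rough number of inputs available, assuming stdin, stdout and stderr
# each occupy one descriptor (an assumption, not a BEDOPS guarantee).
if [ "$fd_limit" != "unlimited" ]; then
    echo "approximate input budget: $((fd_limit - 3))"
fi
```

The soft limit can usually be raised for the current session with ulimit -n followed by a larger value, up to the hard limit reported by ulimit -Hn.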

Thanks for sending me this information. I have installed bedops as a "bin" file and ran the command both in the bin directory and in another directory, but I still get these errors:

bedops --intersect <(sed "s/[-:]/\t/g" ourstudy | sort-bed -) <(sed "s/[-:]/\t/g" mastudy | sort-bed -) > answer.bed
bash: sort-bed: command not found...
bash: sort-bed: command not found...
bash: bedops: command not found...
  

Do you know what the problem is?

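You need to either copy the binaries to /usr/local/bin or add the directory containing the binaries to your PATH environment variable. Running the command from inside the bin directory is not enough, because the shell does not search the current directory unless it is listed in PATH.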
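A minimal sketch of the second option, assuming the binaries were unpacked into $HOME/bedops/bin. That location is hypothetical; substitute wherever your bin directory actually is.

```shell
# Hypothetical location of the unpacked BEDOPS binaries; adjust to your setup.
BEDOPS_BIN="$HOME/bedops/bin"

# Prepend the directory to PATH for the current shell session.
export PATH="$BEDOPS_BIN:$PATH"

# Check whether the shell can now find the tools; prints a path for each one found.
for tool in bedops sort-bed; do
    command -v "$tool" || echo "$tool not found in PATH"
done
```

This only affects the current session. To keep the setting, the export line can be added to a shell startup file such as ~/.bashrc.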

See: https://bedops.readthedocs.io/en/latest/content/installation.html#linux

4.0 years ago
h.mon 35k

Have a look at bedtools (or bedops) intersect. Your files are not BED files, so you will need to massage them before proceeding.

edit: you are already massaging your files, I overlooked the sed "s/[-:]/\t/g" part.

4.0 years ago
MatthewP ★ 1.4k

What the bedtools merge command does is merge overlapping intervals, so in your case you get the union of the intervals from both files. To get only the common regions, use bedtools intersect instead, but you need to convert your files to BED format first.
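A hedged sketch of that approach on two regions per file taken from the question. The scratch directory and file names are placeholders, and the intersect step only runs if bedtools is installed. With just -a and -b and no other options, bedtools intersect reports the overlapping portion of each pair of intersecting intervals.

```shell
# Scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Two sample regions per file, copied from the question.
printf 'chr2:108327854-108382699\nchr10:70552773-71399757\n' > "$workdir/file1"
printf 'chr2:108327854-108388888\nchr10:70782782-71499757\n' > "$workdir/file2"

# Convert chr:start-end to tab-separated BED columns.
tr ':-' '\t\t' < "$workdir/file1" > "$workdir/file1.bed"
tr ':-' '\t\t' < "$workdir/file2" > "$workdir/file2.bed"

# Report only the overlapping parts of the intervals (requires bedtools).
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a "$workdir/file1.bed" -b "$workdir/file2.bed"
else
    echo "bedtools not found; install it to run the intersect step" >&2
fi
```

For these inputs the common regions should be chr2 108327854-108382699 and chr10 70782782-71399757, that is, the larger of the two starts and the smaller of the two ends on each chromosome.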
