Hi, I have a BEDPE file (describing loops across the genome; the distance between the two anchors may be quite long) and a BED file.
The data look like this:
BEDPE (the first 6 columns describe the loop: chromosome1 start1 end1 chromosome2 start2 end2; the remaining columns are attributes I need to keep):
chr1 1050000 1060000 chr1 1180000 1190000 0,255,255 241 107.673 11 8.802 143.514 120.144 1.09607073802e-16 9.5834568345e-17 2.5647576134 8e-07 1.6487531336e-16 2 1060000 1180000 7071.06781187
BED:
chr1 10000 10271 CTCF 1000 . 10000 10271 10,190,254
I want to find overlaps between the anchor regions in the BEDPE file and the regions in the BED file. How can I do this with bedtools?
BTW, is there a proper way to sort the BEDPE file? I tried the command "sort -k1,1 -k2,2n infile" that the bedtools documentation recommends. Is it suitable for a BEDPE file, or should I use "sort -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 -k6,6 infile"?
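One way to see the difference between the two sort commands is to run both on a small BEDPE file. The sketch below assumes a hypothetical three-loop file (first six columns only, tab-delimited; the file names are made up). Without the n modifier, sort compares every key as text, so a start of 1050000 is placed before 200000 on the same chromosome.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # byte-wise collation, so chromosome names order the same everywhere

# Hypothetical BEDPE with three loops (first six columns only), tab-delimited.
printf 'chr1\t1050000\t1060000\tchr1\t1180000\t1190000\n' > loops.bedpe
printf 'chr2\t50000\t60000\tchr2\t90000\t100000\n' >> loops.bedpe
printf 'chr1\t200000\t210000\tchr1\t350000\t360000\n' >> loops.bedpe

# Chromosome compared as text, start coordinate compared as a number.
sort -k1,1 -k2,2n loops.bedpe > loops.sorted.bedpe

# No "n": every key is compared as text, so "1050000" sorts before "200000".
sort -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 -k6,6 loops.bedpe > loops.lexical.bedpe

# Show chromosome and start of each record in both orderings.
cut -f1-2 loops.sorted.bedpe
cut -f1-2 loops.lexical.bedpe
```

"sort -k1,1 -k2,2n" orders the records by their first anchor only. If the BEDPE is later split into one BED file per anchor, the second-anchor file is generally not in order any more, so each split file would need its own "sort -k1,1 -k2,2n" before using -sorted.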
Yep, I got similar results. I used this command:
bedtools intersect -wa -wb -a bedpe -b bed -sorted
But it seems the overlap is only computed between the first 3 columns of the -a file and the -b file. Is there a way to also find overlaps between columns 4-6 of the -a file and the first 3 columns of the -b file at the same time?
bedtools intersect only ever compares the first 3 columns of each tab-delimited file: the chromosome, start and end coordinates. The other columns you see in the output are simply the data fields of the corresponding input files, which are printed according to options like -wa, -wb and -wao. If you want to work with other columns of a BED file, you have to build a new BED file from the columns you need and use that in your downstream operations.
Break up your BEDPE file using cut. I would also annotate each line of the BEDPE so you can match the two anchors back to the same BEDPE record afterwards. If your BEDPE file is
chr1 100 200 chr1 500 600 ...
I would break it up like
chr1 100 200 POS1
and the other file
chr1 500 600 POS1
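The split-and-tag step above can be sketched in one pass with awk instead of cut, because cut alone cannot add the tag column. This is a minimal sketch assuming a tab-delimited BEDPE; the file names loops.bedpe, anchor1.bed and anchor2.bed and the second loop are hypothetical, and the "POS<line number>" tag follows the POS1 labels used above.

```shell
#!/usr/bin/env bash
# Toy BEDPE: the example record above plus one hypothetical extra loop.
printf 'chr1\t100\t200\tchr1\t500\t600\nchr1\t300\t400\tchr1\t900\t1000\n' > loops.bedpe

# Split each record into two BED lines that share the tag "POS<line number>":
# columns 1-3 (first anchor) go to anchor1.bed, columns 4-6 to anchor2.bed.
awk 'BEGIN { FS = OFS = "\t" }
     {
       id = "POS" NR
       print $1, $2, $3, id > "anchor1.bed"
       print $4, $5, $6, id > "anchor2.bed"
     }' loops.bedpe
```

Plain "cut -f1-3 loops.bedpe" and "cut -f4-6 loops.bedpe" give the same two files without the tag column.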
Then run intersectBed on each BED file you generated.
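A sketch of that last step, assuming the tagged anchor files described above. The file names and coordinates are hypothetical toy values, and ctcf.bed stands in for the BED file from the question. Each anchor file is intersected separately, and the POS tag in column 4 is then used to find loops whose two anchors both overlap a BED feature. In this toy data only POS1 qualifies.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # consistent collation for sort and comm

# Toy inputs in the tagged format (hypothetical coordinates).
printf 'chr1\t100\t200\tPOS1\nchr1\t300\t400\tPOS2\n' > anchor1.bed
printf 'chr1\t500\t600\tPOS1\nchr1\t900\t1000\tPOS2\n' > anchor2.bed
printf 'chr1\t150\t160\tCTCF\nchr1\t350\t360\tCTCF\nchr1\t550\t560\tCTCF\n' > ctcf.bed

# -wa -wb prints each overlapping anchor line followed by the BED line it hits.
bedtools intersect -wa -wb -a anchor1.bed -b ctcf.bed > hits1.txt
bedtools intersect -wa -wb -a anchor2.bed -b ctcf.bed > hits2.txt

# Column 4 is the loop tag; keep the tags that appear in both hit files.
cut -f4 hits1.txt | sort -u > ids1.txt
cut -f4 hits2.txt | sort -u > ids2.txt
comm -12 ids1.txt ids2.txt > both_anchors.txt
cat both_anchors.txt
```

To use -sorted as in the command from the question, both the -a and -b files have to be sorted with "sort -k1,1 -k2,2n" first. bedtools also has a pairtobed subcommand that reads the BEDPE directly (for example "bedtools pairtobed -a loops.bedpe -b ctcf.bed"), and its -type option (either, both, and others) controls whether one or both ends must overlap. It may be worth checking "bedtools pairtobed -h" for your version before splitting files by hand.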