Overlapping peaks of both datasets.
First, make sure that your peak, tumour and normal BED files are sorted. If they are not, sort them with sort-bed, e.g.:
$ sort-bed tumour01.unknown_sort_state.bed > tumour01.bed
Repeat sorting for the remaining peak, tumour and normal BED files, as needed. You only have to sort once, at the beginning.
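If you have many files, a small loop saves retyping. This is only a sketch: the ".unsorted.bed" suffix and the sort_bed_file helper are made-up names for illustration, and the fallback to plain sort is an assumption that should be adequate for simple BED input when sort-bed is not on your PATH.

```shell
# Sort one BED file into another. Uses sort-bed when BEDOPS is installed;
# otherwise falls back to a plain coordinate sort (chromosome as text,
# then start and end as numbers).
sort_bed_file() {
    in="$1"
    out="$2"
    if command -v sort-bed >/dev/null 2>&1; then
        sort-bed "$in" > "$out"
    else
        LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$in" > "$out"
    fi
}

# Hypothetical naming scheme: every unsorted input ends in ".unsorted.bed".
for f in *.unsorted.bed; do
    [ -e "$f" ] || continue    # skip when the glob matches nothing
    sort_bed_file "$f" "${f%.unsorted.bed}.bed"
done
```

Adjust the glob and the output names to match your own files.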
Take the multiset union of your tumour BED files with bedops, and pipe that unioned set to a second bedops command, to find peaks that overlap elements from the combined tumour set (with --element-of 1, an overlap of one base or more is enough):
$ bedops --everything tumour01.bed tumour02.bed ... tumour13.bed | bedops --element-of 1 peaks.bed - > peaks_overlapping_tumour_sets.bed
Or all normal elements:
$ bedops --everything normal01.bed normal02.bed ... normal07.bed | bedops --element-of 1 peaks.bed - > peaks_overlapping_normal_sets.bed
Or elements from both categories:
$ bedops --everything tumour01.bed tumour02.bed ... tumour13.bed normal01.bed normal02.bed ... normal07.bed | bedops --element-of 1 peaks.bed - > peaks_overlapping_tumour_and_normal_sets.bed
If you're trying to do something else, please clarify the kind of set operation or association that you want to do.
For example, do you need to know which tumour or normal subset the elements overlapping a particular peak come from? The bedmap tool can help you here, but you first need to preprocess your tumour and normal element subsets. Feel free to follow up.
Overlaps ranging from unique (n=1) up to n=13 for tumour, or n=7 for normal.
You can use a generalization of this approach for finding elements common to all N subsets. For example, for N=13, where A.bed through N.bed are your 13 tumour element sets:
$ N=13
$ bedops --everything A.bed B.bed C.bed ... N.bed \
| bedmap --count --echo --delim '\t' - \
| uniq \
| awk -vN=${N} '$1==N' \
| cut -f2- \
> common_to_all_N_tumour_subsets.bed
You can modify this approach for N-1 (12) subsets, N-2 (11) subsets, and so on, by modifying the awk test:
$ N=13
$ bedops --everything A.bed B.bed C.bed ... N.bed \
| bedmap --count --echo --delim '\t' - \
| uniq \
| awk -vN=${N} '$1==(N-1)' \
| cut -f2- \
> common_to_N_minus_1_tumour_subsets.bed
You would repeat this for N=7 for your seven normal set files.
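If you want every level from n=1 up to n=N in one go, the two pipelines above can be wrapped in a loop. The following is a rough sketch, not a tested recipe: keep_count, split_by_count, counted.tmp and the common_to_<k>_subsets.bed names are all hypothetical, and it assumes the input files are already sorted.

```shell
# Keep rows whose leading count equals the requested value, then drop the
# count column. Input is the tab-delimited "count<TAB>element" stream
# produced by the bedmap --count --echo step.
keep_count() {
    awk -v want="$1" '$1 == want' | cut -f2-
}

# Hypothetical driver: the arguments are the sorted per-sample BED files.
# Writes common_to_<k>_subsets.bed for k = 1 .. number of files given.
split_by_count() {
    n=$#
    # Run the expensive union and count step once and reuse its output.
    bedops --everything "$@" \
        | bedmap --count --echo --delim '\t' - \
        | uniq > counted.tmp
    k=1
    while [ "$k" -le "$n" ]; do
        keep_count "$k" < counted.tmp > "common_to_${k}_subsets.bed"
        k=$((k + 1))
    done
    rm -f counted.tmp
}
```

You would call it as, e.g., split_by_count tumour01.bed tumour02.bed ... tumour13.bed. One thing to keep in mind: the count is the number of overlapping elements in the unioned set, so it matches the number of subsets only when each subset contributes at most one overlapping element at that position.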
Once you have files common_to_*.bed that you need, you can use bedops or bedmap with each of them to do overlap or association tests with peaks, e.g.:
$ bedmap --echo --echo-map peaks.bed common_to_all_N_tumour_subsets.bed > common_tumour_elements_that_overlap_each_peak.bed
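With bedmap's default delimiters, each output line of that command should look like "<peak>|<element>;<element>;...", with nothing after the "|" when no element overlaps the peak. As a sketch of how you might post-process that file (the two function names are made up for illustration):

```shell
# Keep only peaks that have at least one mapped element, i.e. lines where
# the part after "|" is not empty.
peaks_with_hits() {
    awk -F'|' '$2 != ""'
}

# Print each peak followed by a tab and the number of elements mapped to it.
hits_per_peak() {
    awk -F'|' '{ n = ($2 == "") ? 0 : split($2, parts, ";"); print $1 "\t" n }'
}
```

For example, peaks_with_hits < common_tumour_elements_that_overlap_each_peak.bed would drop the peaks with no overlapping common tumour element. This assumes you have not changed the delimiters with --delim or --multidelim.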
"2bp region of overlap which makes no sense" --> why does it make no sense? 1bp overlap is still an overlap if you do not set a minimum number of bp
Tool: bedtools multiinter (aka multiIntersectBed)
This is the help output of multiinter. Could you please tell me how to specify that? Thank you.