Entering edit mode
6.8 years ago
jomo018
▴
730
I am using bedtools fisher as follows:
bedtools fisher -a $fA -b $fB -g $fG > $fOut
Where fA consists of wide peaks and fB narrower peaks. The results are:
Number of query intervals: 741
Number of db intervals: 18079
Number of overlaps: 1481
Number of possible intervals (estimated): 146318
phyper(1481 - 1, 741, 146318 - 741, 18079, lower.tail=F)
Contingency Table Of Counts
_________________________________________
| in -b | not in -b |
in -a | 1481 | 0 |
not in -a | 16598 | 128239 |
_________________________________________
p-values for fisher's exact test
left right two-tail ratio
1 0 0 inf
The cell in -a in -b (1481)
is fine because several of fB peaks may fall within one fA peak. However, there are about 150 fA peaks which do not have any overlap with fB (as found using bedtools intersect
(counting unique ID only) and also verified in a browser).
So what is the meaning of cell in -a not in -b
showing zero?
Just to make absolutely sure, you should sort your BED files prior to running this command. BEDTools assumes that they are sorted:
upload image
Also be aware of the default parameters regarding overlap:
Thank you Kevin. The files are all sorted as required. I have posted the command so you can see I am just using default values. The problem is that any constraints on -f, -r, -s and -S will further reduce overlaps and increase non-overlapping peaks. But I already have 150 non-overlapping peaks (appearing in a but not in b) which are not reported in the top-right cell of the table report.
Can you paste some of these example regions (from the #150 non-overlappers) from the input BED files? Also which genome are you using as
$fG
?Genome is hg38.
Here is an example of non-overlappers:
From File a:
From file b:
And another example: From file a:
From file b:
Weird... and there's nothing odd about your genome file such that these particular intervals could be entirely overlooked?
The genome file is just a list of CHR sizes. Here is the section containing CHR1 through CHR3 which is relevant for the examples above:
Looks like it should be reported as a bug at the Google Groupsage: https://groups.google.com/forum/#!forum/bedtools-discuss
There may be some internal filter that's not reported in the manual. It should be calling those as unique to A, though.
Just to clarify,
bedtools intersect
does report correctly only overlapping peaks and ignores non-overlapping peaks like the ones above.Thanks - that's encouraging at least.
5 years later, I have the same issue.... with several files. This means that these pvalues aren't accurate. What other tool could we use to test the statistics ? Thanks
Please do not add answers unless you're answering the top level question. Use
Add Comment
orAdd Reply
as appropriate. I've moved your post to a comment this time.