Hi everybody, I have several samples for which I calculated the coverage over a specific data set. I did this using deepTools, so my final output looks something like this:
s1 1,923575916 1,92340092 1,918008392 1,915392666 ...
s2 1,993446863 1,931309701 1,925363124 1,935658019 ...
s3 2,03417052 2,042601134 2,029136552 1,996391637 ...
s4 2,107394697 2,107865284 2,093484711 2,070557165 ...
I would like to test whether the coverage differs between samples (with a p-value). My first thought was to apply a KS test, but I decided against it because it tests whether two samples come from the same statistical distribution, rather than whether the coverage levels themselves differ. Has anyone tried to address this question before? Thanks!
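For reference, here is a minimal sketch of what the two-sample KS statistic measures on output like the above. It uses only the four values per sample that are visible in the post (the rows are truncated, so this is not the real result), converts the decimal commas to points, and computes the largest gap between the two empirical CDFs. The names `parse_table` and `ks_statistic` are hypothetical helpers written for this illustration, not part of deepTools.

```python
from bisect import bisect_right

# The four rows from the post, limited to the values actually shown.
RAW = """\
s1 1,923575916 1,92340092 1,918008392 1,915392666
s2 1,993446863 1,931309701 1,925363124 1,935658019
s3 2,03417052 2,042601134 2,029136552 1,996391637
s4 2,107394697 2,107865284 2,093484711 2,070557165
"""


def parse_table(text):
    """Return {sample: [floats]} from whitespace-separated rows with decimal commas."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        table[fields[0]] = [float(v.replace(",", ".")) for v in fields[1:]]
    return table


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


coverage = parse_table(RAW)
d_s1_s2 = ks_statistic(coverage["s1"], coverage["s2"])
d_s1_s1 = ks_statistic(coverage["s1"], coverage["s1"])
```

On these truncated rows every s1 value lies below every s2 value, so the statistic reaches its maximum of 1. That shows the statistic reacts to a shift in level as well as to a change in shape, but it says nothing about whether the shift is reproducible across experiments.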
What kind of data is this? How did you produce the counts and what is the experimental setup? Are there replicates?
Hi, there are no replicates. They are ChIP-seq samples from different conditions.
Echoing ATpoint for a second: you are trying to test for differences between singletons, which isn't possible base by base. You need replicate measurements of coverage in order to do a KS test or something similar. A p-value isn't a very strong statistic here, since 'coverage' values (based on counts) aren't normally distributed. That said, what you want to look at is 'peak-finding' algorithms (like MACS) more than adjusted p-values at specific bases. Even then, it is highly recommended to have biological replicates of each condition under study, unless you already have a database of coverage values for this transcription factor from which to draw a distribution and calculate your p-values. Big companies get away with singletons in screening studies because they have those databases to draw from.
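To make the replicate point concrete, here is a small sketch using an exact permutation test on a single bin, chosen only because it does not assume normality. It is not what peak callers do internally. The function `exact_permutation_p` is a hypothetical helper, and the triplicate values are made up purely for illustration. Only the singleton values come from the post (the first value of s1 and of s2).

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    hits = total = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        # Small tolerance so float rounding does not drop exact ties.
        hits += diff >= observed - 1e-12
        total += 1
    return hits / total


# One value per condition: first bin of s1 vs first bin of s2, as posted.
p_singleton = exact_permutation_p([1.923575916], [1.993446863])

# Hypothetical triplicates (illustrative numbers, not real data).
p_triplicate = exact_permutation_p([1.92, 1.93, 1.91], [2.03, 2.04, 2.02])
```

With one value per condition there are only two possible labellings and both give the same absolute difference, so the p-value is always 1 no matter how far apart the values are. With three replicates per condition there are 20 labellings, so the smallest achievable two-sided p-value is 2/20 = 0.1, which is why more replicates (or a method that borrows information across the genome) are needed for anything stronger.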