8.8 years ago
scchess
Let's say I measure the overall average coverage per base for my genome to be 30x. Now, I have a region spanning 100 bases that I'd like to compare with the genome. Let's say the average coverage per base in this region is 40x. I want to ask: "Is my region statistically different from the genome in terms of coverage per base?"
How should I approach this problem? Can I do a t-test?
Do you have multiple samples? Otherwise, you are only comparing two numbers, and it is difficult to run any kind of statistical test on just two numbers.
I have the per-base genome coverage, like { 56, 67, 89 ... }, and the per-base coverage in my region, like {67, 89, 90 ... }. The genome and my region can differ in size. My question is whether the average coverage in my region is different from the average genome coverage.
So I do have the standard deviation and all the underlying data.
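Since the two sets differ in size and need not share a variance, one way to frame this is Welch's unequal-variance t-test. The following is only a minimal sketch using the Python standard library: `welch_t` is a hypothetical helper name, the coverage values are made up for illustration, and it treats every per-base value as an independent observation, which is an assumption that neighbouring bases may not satisfy.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) vs mean(b)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb                                  # squared standard error
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Made-up per-base coverage values, for illustration only
genome = [28, 31, 30, 27, 33, 29, 32, 30]
region = [39, 42, 38, 41, 40]

t, df = welch_t(region, genome)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 10.00, df = 10.2
```

The statistic is then compared against a t distribution with `df` degrees of freedom to get a p-value, for example with a statistics package or a t table.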
From the numbers you've provided, it seems each value is the coverage at a single base in the region? If that is the case, you can indeed try a t-test. However, it is always safer to plot the distribution of your data first. If the values are not normally distributed, you might need to use a test that does not assume normality.
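If the plot suggests the coverage values are not normally distributed, a permutation test on the difference in means is one possible alternative, because it makes no normality assumption. This is a minimal sketch rather than a recommended pipeline: `permutation_pvalue` is a hypothetical helper, the data are made up, and it still assumes the per-base values are exchangeable under the null hypothesis.

```python
import random


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    na = len(a)
    pooled = list(a) + list(b)
    observed = abs(sum(a) / na - sum(b) / len(b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # randomly reassign values to the two groups
        left, right = pooled[:na], pooled[na:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


# Made-up per-base coverage values, for illustration only
genome = [28, 31, 30, 27, 33, 29, 32, 30]
region = [39, 42, 38, 41, 40]

p = permutation_pvalue(region, genome)
print(f"permutation p-value = {p:.4f}")
```

A small p-value here means that random relabelings of the bases rarely produce a mean difference as large as the observed one.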