Can we do a statistical test of a region against the genome?

8.7 years ago

scchess ▴ 640

Let's say the overall average coverage per base for my genome is 30x. I also have a region of interest spanning 100 bases, and the average coverage per base in this region is 40x. My question is: "Is the coverage per base in my region statistically different from that of the genome?"

How should I approach this problem? Can I do a t-test?

dna genome • 1.8k views


Do you have multiple samples? Otherwise you are only comparing two numbers, and it is difficult to run any kind of statistical test on that.

8.7 years ago by Sam ★ 4.8k

I have the per-base coverage for the genome, like {56, 67, 89, ...}, and the per-base coverage for my region, like {67, 89, 90, ...}. The genome and my region can differ in size. My question is whether the average coverage in my region is different from the average coverage of the genome.

8.7 years ago by scchess ▴ 640

So I do have the standard deviation and all the underlying per-base data.

8.7 years ago by scchess ▴ 640

From the numbers you've provided, it seems the coverage is the average coverage of the region? If that is the case, you could indeed try a t-test on it. However, it is always safer to plot the distribution of your data first. If the data are not normally distributed, you might need to use a different test.
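As a rough illustration of both options, here is a minimal Python sketch using only the standard library. It computes Welch's t statistic (the unequal-variance form of the t-test) and, as one possible "something else", an empirical p-value from random genomic windows of the same length as the region. The window approach is a swapped-in technique, not something from this thread. It may be worth considering because per-base coverage at neighbouring positions is usually correlated, which a plain t-test does not account for. The function names `welch_t` and `window_p_value` are hypothetical, and the coverage values are simulated toy data, not real measurements.

```python
import random
import statistics
from math import sqrt


def welch_t(region, genome):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    m1, m2 = statistics.fmean(region), statistics.fmean(genome)
    v1, v2 = statistics.variance(region), statistics.variance(genome)
    n1, n2 = len(region), len(genome)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference
    t_stat = (m1 - m2) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t_stat, dof


def window_p_value(region, genome, n_windows=2000, seed=0):
    """Empirical two-sided p-value from random contiguous genomic windows.

    Counts how often a random window with the same length as the region
    has a mean at least as far from the genome mean as the region's mean.
    """
    rng = random.Random(seed)
    width = len(region)
    genome_mean = statistics.fmean(genome)
    observed = abs(statistics.fmean(region) - genome_mean)
    hits = 0
    for _ in range(n_windows):
        start = rng.randrange(len(genome) - width + 1)
        window_mean = statistics.fmean(genome[start:start + width])
        if abs(window_mean - genome_mean) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_windows + 1)


# Simulated toy data (assumption): genome around 30x, region around 40x
rng = random.Random(42)
genome = [rng.gauss(30, 5) for _ in range(20000)]
region = [rng.gauss(40, 5) for _ in range(100)]

t, df = welch_t(region, genome)
p = window_p_value(region, genome)
print(f"Welch t = {t:.2f}, df = {df:.1f}, empirical window p = {p:.4f}")
```

To turn the t statistic into a p-value you would look it up in a t distribution with `df` degrees of freedom, for example with a statistics package. The window-based p-value needs no distributional assumption, but its resolution is limited by `n_windows`.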

8.7 years ago by Sam ★ 4.8k
