statistics data, observed vs expected values
6.5 years ago

I have a data.frame with the observed and expected frequencies of a k-mer in multiple genomes, and I would like to obtain a threshold value to classify the genomes according to their observed and expected values. I have been trying the chi-square test and the G-test, but I'm not sure these are the right tests.

I have also tried plotting log(observed - expected)^2 / expected as a function of log(observed / expected).

Could you recommend a statistical test for this task?

table

R • 1.5k views

Can you post an excerpt of your data.frame? Or all of it if not too big, possibly anonymizing the confidential information in it.


This is a sample of the data; I have approximately 3500 genomes.

https://i0000.clarodrive.com/s/oxfB4puIAmmrCrf


What do you mean by "to classify the genomes according to their observed and expected values"? The chi-squared test evaluates the difference between observed and expected frequencies under the model that generated the expected frequencies, so it is the correct test to use if that is what you intend to test.
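If that is what you are after, a minimal R sketch of a per-genome test might look like the one below. Everything in it is an assumption for illustration: the column names `genome`, `observed` and `expected`, the toy values, and the 0.05 cutoff are all made up, since the real table is not shown here. It also assumes the columns hold counts rather than proportions, and that the k-mer is rare enough that the single-cell statistic (observed - expected)^2 / expected is approximately chi-squared with 1 degree of freedom. With about 3500 genomes the p-values need a multiple-testing correction, so the sketch uses Benjamini-Hochberg via `p.adjust`.

```r
# Hypothetical toy data: one row per genome, k-mer counts (not proportions).
# Column names and values are invented for illustration only.
df <- data.frame(
  genome   = c("g1", "g2", "g3", "g4"),
  observed = c(120, 95, 310, 48),
  expected = c(100, 100, 200, 50)
)

# Direction and size of the deviation from the expectation.
df$log2_ratio <- log2(df$observed / df$expected)

# Single-cell chi-squared statistic per genome (assumes a rare k-mer,
# so the "all other k-mers" cell contributes a negligible amount).
df$chisq <- (df$observed - df$expected)^2 / df$expected

# Upper-tail p-value from the chi-squared distribution with 1 df.
df$p_value <- pchisq(df$chisq, df = 1, lower.tail = FALSE)

# Benjamini-Hochberg correction across all genomes.
df$p_adj <- p.adjust(df$p_value, method = "BH")

# Classify using an assumed 0.05 cutoff on the adjusted p-value.
df$class <- ifelse(
  df$p_adj >= 0.05, "as expected",
  ifelse(df$log2_ratio > 0, "over-represented", "under-represented")
)

print(df)
```

The threshold then lives on the adjusted p-value rather than on the raw statistic, and `log2_ratio` tells you in which direction a genome deviates. If you also have the total number of k-mers per genome, `chisq.test()` on the full two-cell table (this k-mer vs. all others) would be the more standard route.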
