6.5 years ago
ulises.rodriguez
I have a data.frame with the observed and expected frequencies of a k-mer in multiple genomes, and I would like to obtain a threshold value to classify the genomes according to their observed and expected values. I have been trying the Chi-square test and the G-test, but I'm not sure these tests are the right ones.
I have also tried plotting log(observed - expected)^2 / expected as a function of log(observed / expected).
Could you recommend a statistical test for this task?
Can you post an excerpt of your data.frame? Or all of it if not too big, possibly anonymizing the confidential information in it.
This is a sample of the data. I have approximately 3500 genomes.
https://i0000.clarodrive.com/s/oxfB4puIAmmrCrf
What do you mean by "classify the genomes according to their observed and expected values"? The Chi-squared test evaluates the difference between observed and expected frequencies under the model that generated the expected frequencies, so it is the correct test to use if that is what you intend to test.
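To make the comparison concrete, here is a minimal sketch of how a per-genome Chi-squared test and G-test could be computed and turned into a classification. It is written in plain Python rather than R so it runs without extra packages. Everything about the data layout is an assumption, since the real table is only available through the link: the genome names, the counts, and in particular the `total` column (the total number of k-mer positions per genome, needed to build the second "every other k-mer" category) are hypothetical. The Bonferroni correction is one possible way to account for testing many genomes, not something prescribed in this thread.

```python
import math

def chi2_and_g(observed, expected, total):
    """Two-category goodness-of-fit statistics (this k-mer vs. all others), df = 1.

    `total` is the assumed total number of k-mer positions in the genome.
    """
    obs = [observed, total - observed]
    exp = [expected, total - expected]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    # G statistic; categories with an observed count of 0 contribute nothing
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)
    return chi2, g

def p_value_df1(stat):
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical rows: (genome, observed count, expected count, total k-mers)
genomes = [
    ("genome_A", 120, 100.0, 1_000_000),
    ("genome_B", 100, 100.0, 1_000_000),
    ("genome_C", 40, 100.0, 1_000_000),
]

# Bonferroni-corrected significance level (assumed choice of correction)
alpha = 0.05 / len(genomes)

results = {}
for name, o, e, n in genomes:
    chi2, g = chi2_and_g(o, e, n)
    p = p_value_df1(chi2)
    if p >= alpha:
        label = "as expected"
    elif o > e:
        label = "over-represented"
    else:
        label = "under-represented"
    results[name] = (chi2, g, p, label)
    print(f"{name}: chi2={chi2:.3f} G={g:.3f} p={p:.3g} -> {label}")
```

With this approach the "threshold" is the significance level applied to the p-value, not a cut-off chosen by eye on a plot. With about 3500 genomes, some form of multiple-testing correction is needed, otherwise many genomes will look significant by chance alone.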