In RStudio, I have loaded a chromosome sequence (chr21 of hg19) with Biostrings and computed the frequency of each nucleotide and of N:
library (Biostrings)
chr21 <- readDNAStringSet ('chr21.fa')[[1]]
freq <- letterFrequency (chr21, c('A', 'C', 'G', 'T', 'N'), as.prob = TRUE)
I have written a function to compute the probability of a word (given as a DNAString) from the nucleotide probabilities (Bernoulli model):
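Bernoulli <- function (word, freq) {
  # Multiply the probabilities of the successive letters of the word
  P <- 1
  for (i in seq_along (word))
    P <- P * freq[as.character (word[i])]
  as.numeric (P)  # drop the letter name carried over from freq
}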
I have computed the occurrences and frequency of every 7-nucleotide word in chr21:
sevenmer_occ = oligonucleotideFrequency (chr21, 7)
sevenmer_freq = oligonucleotideFrequency (chr21, 7, as.prob = TRUE)
Now I am asked to compute "the p-value of each 7-nucleotide word (oligomer) given the Bernoulli model".
I cannot work out how to do that. I think I should use the function pbinom for this, but I don't think I understand it well, as it mostly returns 0 or 1.
Can you help?
The question becomes what the p-value is meant to compare. The naive computation you show might or might not be meaningful for answering your actual biological question. So describe the question you're actually trying to answer and we can help you further.
Sorry, I posted my question by mistake before it was finished; I did not know people could read it while I was editing. It should be much clearer now!
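One possible reading of the assignment, offered only as a sketch: treat the observed count of each word as a binomial draw, with the total number of counted 7-mers as the number of trials and the word's Bernoulli probability as the success probability. The p-value is then the upper tail P(X >= observed count). The helper names word_prob and kmer_pvalues below are made up for illustration, the toy numbers are invented, and the model assumes independent trials, which overlapping windows only approximate.

```r
# Sketch only: word_prob() and kmer_pvalues() are hypothetical helper names.

# Bernoulli probability of each word: product of its letters' probabilities.
word_prob <- function(words, freq) {
  vapply(strsplit(words, ""),
         function(chars) prod(freq[chars]),
         numeric(1))
}

# Upper-tail binomial p-value, P(X >= observed count), for each word.
kmer_pvalues <- function(occ, freq, log.p = FALSE) {
  n <- sum(occ)                     # total number of words counted
  p <- word_prob(names(occ), freq)  # expected probability of each word
  # pbinom(q, ..., lower.tail = FALSE) is P(X > q), hence the "- 1"
  pbinom(occ - 1, size = n, prob = p, lower.tail = FALSE, log.p = log.p)
}

# Toy data standing in for sevenmer_occ and freq (made-up numbers)
toy_freq <- c(A = 0.3, C = 0.2, G = 0.2, T = 0.3)
toy_occ  <- c(AA = 12, AC = 5, CG = 0, TT = 9)
toy_pval <- kmer_pvalues(toy_occ, toy_freq)
```

On the real data this would be called as kmer_pvalues (sevenmer_occ, acgt, log.p = TRUE), where acgt is freq restricted to A, C, G and T and rescaled to sum to 1. As far as I know, oligonucleotideFrequency skips windows containing N, so the N frequency should not enter the model. With tens of millions of trials, even small departures from the model are many standard deviations wide, which is probably why pbinom returns mostly 0 or 1; log.p = TRUE keeps the tiny values distinguishable. Whether an upper-tail, lower-tail or two-sided test is wanted depends on the question being asked.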