Question

How to determine the average distance between known motif in a list of DNA sequences

1

Entering edit mode

8.7 years ago

kevinm ▴ 40

Hi everybody !

So there is my problem : I am searching for the average distance between a known motif inside sequence, and extend this to a list of sequences... The first part is done, the second part (extend to a list of sequences) is the problematic one ! So, here the way i am doing the first part :

source("motifOccurrence.R") #https://www.r-bloggers.com/calculate-the-average-distance-between-a-given-dna-motif-within-dna-sequences-in-r/

library("seqinr")

df <- readDNAStringSet("X.fasta")

df2 <- df[[1]]

motif <- c("T", "C", "C", "A")

coord <- coordMotif(df2, motif)

motidist <- computeDistance(coord)

motidist

[1] 152

It's appear that the first sequence of my fasta list have an average distance of 152 nucelotide between two TCCA motif. And, i don't know how automatize this to all my list in df...

Thanks by advance for the help.

Kévin

R sequence Motif Distance • 2.9k views

ADD COMMENT • link updated 8.6 years ago by Macherki M E ▴ 120 • written 8.7 years ago by kevinm ▴ 40

0

Entering edit mode

Look at apply() and its variants.

ADD REPLY • link 8.7 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Thanks, i will check it !

ADD REPLY • link 8.7 years ago by kevinm ▴ 40

score 2 · Accepted Answer · 2016-10-11

The Rho function in the seqinr package calculate the indipendence of occurence of a k-words using the expected word count. We can modify it to be run just for our target:

library("seqinr")
data(ec999)

exp_count<-function (sequence, wordsize = 2, alphabet = s2c("acgt")) 
{
    wordcount <- count(sequence, wordsize, freq = FALSE, alphabet = alphabet)
    uni <- count(sequence, 1, freq = TRUE, alphabet = alphabet)
    expected_wordfreq <- function(wordsize, uni) {
        if (wordsize == 1) 
            return(uni)
        else kronecker(uni, expected_wordfreq(wordsize - 1, uni))
    }
    expected_wordcount <- sum(wordcount) * expected_wordfreq(wordsize, 
        uni)

names(expected_wordcount)<-words(wordsize,alphabet = alphabet)
return(expected_wordcount)
}
#exp_count(ec999[[1]],3)