I've got a new issue...
I would like to compute the average distance between two distinct motifs in the same list of sequences as before. Do you have any clue how to manage this?
I have just done it for one motif, like this:
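source("motifOccurrence.R")  # provides coordMotif() and computeDistance()
motif <- c("T", "C", "A", "A")
# For each sequence s in df: locate the motif, then average the distances
motidist <- sapply(df, FUN = function(s, motif) {
  computeDistance(coordMotif(s, motif))
}, motif = motif)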
This R code gives me the average distance between occurrences of a defined motif within each sequence of my list. I would like to do the same, but with two motifs. Can someone help me?
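For readers without motifOccurrence.R at hand, here is a minimal base-R sketch of the one-motif case. It assumes (not checked against the blog post) that "average distance" means the mean gap between the start positions of consecutive, possibly overlapping, occurrences; motif_starts() and mean_motif_gap() are hypothetical helpers, not functions from motifOccurrence.R.

```r
# Hypothetical helper: 1-based start positions of every (possibly
# overlapping) occurrence of `motif` in the single string `s`.
motif_starts <- function(s, motif) {
  k <- nchar(motif)
  n <- nchar(s)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  # Compare every window of width k with the motif in one vectorised step
  which(substring(s, starts, starts + k - 1) == motif)
}

# Hypothetical helper: mean distance between consecutive occurrence starts.
# Assumption: returns 0 when the motif occurs fewer than two times.
mean_motif_gap <- function(s, motif) {
  pos <- motif_starts(s, motif)
  if (length(pos) < 2) return(0)
  mean(diff(pos))
}

motif <- paste(c("T", "C", "A", "A"), collapse = "")  # "TCAA"
mean_motif_gap("TCAAGGTCAAGGGGTCAA", motif)            # returns 7
```

Applied with sapply() over a named character vector of sequences, this gives one value per sequence, in the same shape as motidist.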
To give more information about what I want to do:
I work with a FASTA file as input:
>1
GACTCTACTATAAACGGGAGATAGCAATCTAACGCAGTGCTTCAACTCCTCCTCCATCTGAACACCCTTCAACCTTTGATACTCAGACGTTTTAGGTCGG
>2
ACCACCCCTTTGTCCAGAAATAGGACTCTTGGGCCTGTTGCCTGAATAAAGTCCAACCACCACAACCACTACACTACCATATGTAAGCTTCACTGATGGT
>3
CACCACAAGTGCGCGCCACGACGTGCATAGCCTCTAGATCGGCAACTCAGGCGAGAAGTGTTTTATTTCGGTGTGGCCGGTCCTGGGCATTTTACGGAAA
>4
GTTAGTGTACAAGTCCGAATAGAGTCACGAAAGACCCACACAACCACGTAATGACCTCGCTGTAATGAGATCAGTTGGCTCATGAAGGAAGAACGTAATG
>5
TGAGCGTTCGCCAATAACCATCCCTCTCGTTCCTTGTAACTGTACTATGATAGCGGGCGCCCCCCTAATTAAATAGCGGACGCCCTGACCTATTGTATGA
>6
TGATATATCTACTCGATAAGGATATAGAGGTCTAATTGTTGAGAAGTGTACCACCTTAGAGCACGAGTTTAGGATACTTAGTAGGTTCTTGCGAAGGATA
etc.
Then the motidist object looks like this:
  1   2   3   4   5   6
152  94  36 138  92 113
The distances given by the function are for one motif. Now I would like to do the same, but for the distance between two motifs, like this:
atcgacatagacgactgatcgtcag MOTIF1 acggtagacagt MOTIF2 agcagatgacta # and this for every sequence in the file!
Thanks in advance.
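For the simple layout above (one MOTIF1 followed by one MOTIF2), the distance can be read as the number of bases between the end of MOTIF1 and the start of MOTIF2. A minimal sketch, assuming that definition and using base R's regexpr() with fixed = TRUE to find the first match of each motif; spacer_length() is a hypothetical name, and returning NA when a motif is missing is a design choice, not something stated in the thread.

```r
# Hypothetical helper: number of bases between the end of the first
# motif1 and the start of the first motif2 found downstream of it.
# Returns NA when motif1 is missing or no motif2 follows it.
spacer_length <- function(s, motif1, motif2) {
  p1 <- regexpr(motif1, s, fixed = TRUE)   # first start of motif1, or -1
  if (p1 == -1) return(NA_integer_)
  end1 <- p1 + nchar(motif1) - 1
  # Only look for motif2 downstream of motif1
  rest <- substring(s, end1 + 1)
  p2 <- regexpr(motif2, rest, fixed = TRUE)
  if (p2 == -1) return(NA_integer_)
  as.integer(p2 - 1)  # bases skipped before motif2 starts; drops attributes
}

spacer_length("GGTCAAACGTGCGCTT", "TCAA", "GCGC")  # returns 4
```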
Can't you just loop over the sequence, record the index of the motif when you find it, and subtract that index when you find the next one? Additional question: what should be done if you see:
1) atcgacatagacgactgatcgtcag MOTIF1 acggtagacagt MOTIF2 agcagatgacta MOTIF2 acgtcgtagctgatgctcggct (motif 2 twice, one after the other)
2) atcgacatagacgactgatcgtcag MOTIF1 acggtagacagt MOTIF2 agcagatgactaacgtgtgtgtg MOTIF1 acgtcgtagctgatgctcggct (motif 2 sandwiched between two motif 1, with spacers of different lengths)
3) atcgacatagacgactgatcgtcag MOTIF1 acggtagacagtagcagatgactaacgtcgtagctgatgctcggct (only motif 1, no motif 2)
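The loop suggested above could look like this in R: walk along the sequence once, remember where the latest motif 1 ended, and subtract that position whenever motif 2 starts. A minimal sketch; scan_distances() is a hypothetical name, and it records one distance for every motif 2 that has a motif 1 upstream, so it does not yet decide the three cases listed.

```r
# Hypothetical single-pass scan: for every motif2 occurrence, record the
# number of bases since the end of the most recent motif1 occurrence.
scan_distances <- function(s, motif1, motif2) {
  k1 <- nchar(motif1)
  k2 <- nchar(motif2)
  last_end1 <- NA_integer_   # end position of the latest motif1 seen so far
  out <- integer(0)
  for (i in seq_len(nchar(s))) {
    # substr() returns a shorter string near the end, so no match there
    if (substr(s, i, i + k2 - 1) == motif2 && !is.na(last_end1) && i > last_end1) {
      out <- c(out, i - last_end1 - 1L)
    }
    if (substr(s, i, i + k1 - 1) == motif1) {
      last_end1 <- i + k1 - 1L
    }
  }
  out
}

scan_distances("TCAAGGGCGCAAGCGC", "TCAA", "GCGC")  # returns c(2L, 8L)
```

Because every motif 2 hit is stored, case 1 yields two distances; the asker's answers decide which ones to keep.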
What do you mean by "loop over the sequence"?
As for the additional questions:
1) I would only like the distance between the closest motifs: motif 1 and the first motif 2.
2) In this case both pieces of information are interesting, but as a first step the average distance will be sufficient.
3) If there is no motif 2, return 0.
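One way to read these three answers as a single rule: for each motif 1, take the gap to the nearest motif 2 (upstream or downstream), then average those gaps; with no motif 2, return 0. Under that reading, case 1 keeps only the first motif 2 and case 2 averages the two gaps. A minimal sketch under those assumptions; starts_of() and avg_motif_pair_distance() are hypothetical names, and returning 0 when motif 1 is absent and counting overlapping hits as a gap of 0 are assumptions not stated in the thread.

```r
# Hypothetical helper: overlapping start positions via a sliding window.
starts_of <- function(s, motif) {
  k <- nchar(motif)
  n <- nchar(s)
  if (n < k) return(integer(0))
  st <- seq_len(n - k + 1)
  which(substring(s, st, st + k - 1) == motif)
}

# Average gap (in bases) between each motif1 and its nearest motif2.
avg_motif_pair_distance <- function(s, motif1, motif2) {
  p1 <- starts_of(s, motif1)
  p2 <- starts_of(s, motif2)
  # Rule 3: no motif2 gives 0 (assumption: no motif1 gives 0 too)
  if (length(p1) == 0 || length(p2) == 0) return(0)
  k1 <- nchar(motif1)
  k2 <- nchar(motif2)
  gaps <- vapply(p1, function(a) {
    # Bases between the two motifs, whichever comes first
    g <- ifelse(p2 > a, p2 - (a + k1), a - (p2 + k2))
    max(0, min(g))  # nearest motif2; overlaps clamped to 0
  }, numeric(1))
  mean(gaps)
}

avg_motif_pair_distance("TCAAGGGCGCAAGCGC", "TCAA", "GCGC")    # case 1: returns 2
avg_motif_pair_distance("TCAAGGCGCAAAAATCAA", "TCAA", "GCGC")  # case 2: returns 3
avg_motif_pair_distance("TCAAGGGG", "TCAA", "GCGC")            # case 3: returns 0
```

With df being a DNAStringSet, something like sapply(as.character(df), avg_motif_pair_distance, motif1 = "TCAA", motif2 = "GCGC") should give one value per sequence, assuming as.character() returns a character vector that keeps the sequence names.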
You should have a look at the re Python module (regular expressions) and just get the index of the position(s) where a motif is found. This should be pretty easy.
Can you detail your answer with a running example?
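The regular-expression idea suggested above also works in base R: gregexpr() returns the start index of every match. A minimal sketch, assuming overlapping hits should count and the motif contains only plain letters (no regex metacharacters); wrapping the motif in a Perl lookahead (?=...) makes each match zero-width, so overlapping occurrences are all reported. regex_starts() is a hypothetical name.

```r
# Hypothetical helper: start positions of all (overlapping) matches of a
# plain-letter motif, via a zero-width Perl lookahead.
regex_starts <- function(s, motif) {
  hits <- gregexpr(paste0("(?=", motif, ")"), s, perl = TRUE)[[1]]
  if (hits[1] == -1) return(integer(0))  # gregexpr signals "no match" with -1
  as.integer(hits)  # drop the match.length attributes
}

regex_starts("AATCAATCAA", "TCAA")  # returns c(3L, 7L)
```

Subtracting positions from two such calls (one per motif) gives the motif-to-motif distances.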
I've edited my post with more details.
Do you need it in R, or would any other language work?
R would be the easiest way for me, but I can handle Python or Perl if you have an idea in those languages!
What does source("motifOccurrence.R") do? What does df look like? A minimal working example would help.
For source("motifOccurrence.R"), see https://www.r-bloggers.com/calculate-the-average-distance-between-a-given-dna-motif-within-dna-sequences-in-r/
And df is just a DNAStringSet instance that looks like this: