6.0 years ago
Ashley
library("Biostrings")
I want to calculate the probability of the dinucleotides AA, TT, AT, and TA at each two-base position (positions 1-2, 3-4, and so on).
My DNA sequence is as follows:
DNA.set
  A DNAStringSet instance of length 5
    width seq                  names
[1]    20 TCCGTATTGGAAAGCTCGTC SEQ-1
[2]    20 TTAGACCACTCCGCATGTAG SEQ-2
[3]    20 CTGTGGTACGGCTCAAACGG SEQ-3
[4]    20 CTCCCGCCTATCTCCCTTCT SEQ-4
[5]    20 TCGCCTAGAAAAAGTTTCCT SEQ-5
I want to obtain the result as follows:
AA=0,0,0,0,1/5,2/5,0,1/5,0,0
TT=1/5,0,0,1/5,0,0,0,1/5,1/5,0
AT=0,0,0,0,0,0,0,1/5,0,0
TA=0,0,1/5,1/5,1/5,0,0,0,0,0
Any help would be greatly appreciated.
The dinucleotideFrequency function in Biostrings can give you those 2-mers. Then you can take the subset you want.
Thanks for your reply. But I want the frequency or probability of the A/T dinucleotides at each position, not the total count, so dinucleotideFrequency may not be suitable for me.
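One way to get per-position values, sketched in base R. It assumes that "each 2 location" means non-overlapping windows (positions 1-2, 3-4, ..., 19-20) and that all sequences have the same width. The names seqs, targets, starts and prob are made up for this illustration; with Biostrings loaded, as.character(DNA.set) should give the same character vector as seqs.

```r
# The five sequences from the question, as a plain named character vector.
# Assumption: with Biostrings, seqs <- as.character(DNA.set) is equivalent.
seqs <- c(
  "SEQ-1" = "TCCGTATTGGAAAGCTCGTC",
  "SEQ-2" = "TTAGACCACTCCGCATGTAG",
  "SEQ-3" = "CTGTGGTACGGCTCAAACGG",
  "SEQ-4" = "CTCCCGCCTATCTCCCTTCT",
  "SEQ-5" = "TCGCCTAGAAAAAGTTTCCT"
)

targets <- c("AA", "TT", "AT", "TA")

# Start of each non-overlapping window: 1, 3, 5, ..., 19
starts <- seq(1, nchar(seqs[[1]]) - 1, by = 2)

# For each window, extract the dinucleotide from every sequence and
# compute the fraction of sequences that carry each target there.
prob <- vapply(starts, function(s) {
  dimers <- substring(seqs, s, s + 1)
  vapply(targets, function(d) mean(dimers == d), numeric(1))
}, numeric(length(targets)))

# Rows are dinucleotides, columns are windows.
dimnames(prob) <- list(targets, paste0(starts, "-", starts + 1))

print(prob)
```

Each row of prob holds one dinucleotide and each column one window. For the five sequences above, TT appears in windows 1, 4, 8 and 9 and AT only in window 8, each in 1 of 5 sequences.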
Maybe consensusMatrix(dinucleotideFrequency(DNA.set))?
For our example,
Thanks for your kind help. But I think the result should have length(DNA.seq)/2=10 columns; however, it has 16. It also doesn't show which columns represent AA, AT, TA and TT. I am a newcomer to bioinformatics; could you help me figure it out? Thank you so much. With my best wishes.
I think the number of columns is always 16. For another example,
(original link: https://www.dropbox.com/s/h8de0hcc8vc193t/data.jpg?dl=0 )
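On the 16 columns: there are 4^2 = 16 possible dinucleotides over A, C, G and T, so a per-sequence dinucleotide count has 16 columns regardless of sequence length. The base-R sketch below imitates that layout (alphabetical column names AA, AC, ..., TT, counted over all overlapping start positions). It is an illustration of the idea, not the Biostrings implementation, and the names bases, dimers, count_dimers and counts are hypothetical.

```r
bases <- c("A", "C", "G", "T")

# All 4^2 = 16 possible dinucleotides in alphabetical order: AA, AC, ..., TT
dimers <- as.vector(t(outer(bases, bases, paste0)))

seqs <- c(
  "SEQ-1" = "TCCGTATTGGAAAGCTCGTC",
  "SEQ-2" = "TTAGACCACTCCGCATGTAG",
  "SEQ-3" = "CTGTGGTACGGCTCAAACGG",
  "SEQ-4" = "CTCCCGCCTATCTCCCTTCT",
  "SEQ-5" = "TCGCCTAGAAAAAGTTTCCT"
)

# Count each dinucleotide over every overlapping start position
# (1-2, 2-3, ..., 19-20) of a single sequence.
count_dimers <- function(s) {
  starts <- seq_len(nchar(s) - 1)
  found <- substring(s, starts, starts + 1)
  vapply(dimers, function(d) sum(found == d), integer(1))
}

# One row per sequence, one column per dinucleotide (5 x 16).
counts <- t(vapply(seqs, count_dimers, integer(length(dimers))))

print(colnames(counts))
print(counts[, c("AA", "AT", "TA", "TT")])
```

Selecting counts[, c("AA", "AT", "TA", "TT")] picks out the four columns of interest, but these are whole-sequence totals, which is why this layout cannot give one value per position.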