Calculating Genetic Distances Between Protein Sequences
13.3 years ago
Sanju ▴ 90

Hi all,

How can I calculate pairwise genetic distances between protein sequences in R? I have already calculated the pairwise sequence identities and stored them in an Excel file, which I imported into R, but I couldn't generate a distance matrix from this data. I used the dist.alignment function but got this error:

"Object of class 'alignment' expected"

Please help me solve this problem.

r • 13k views

Try editing your question to include the code you are using, example input, and also the output of sessionInfo(). Also, be sure to read the help for the dist.alignment function.

13.3 years ago
Yogesh Pandit ▴ 520

Get your sequences into one of the following formats: mase, clustal, msf, phylip or fasta. The sequences must already be aligned.

Then read the file into an object of class alignment:

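library(seqinr)
# The file must contain aligned sequences in one of the formats above
myseqs <- read.alignment("mySeq.fasta", format = "fasta")
# Pairwise distances based on sequence identity (returned as a "dist" object)
mat <- dist.alignment(myseqs, matrix = "identity")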

dist.alignment() calculates "pairwise Distances from Aligned Protein or DNA/RNA Sequences". The output (mat) looks like this:

       Langur    Baboon     Human       Rat       Cow
Baboon 0.3307189                                        
Human  0.3750000 0.3307189                              
Rat    0.5448624 0.5077524 0.5376453                    
Cow    0.4921255 0.5448624 0.5590170 0.6495191          
Horse  0.7071068 0.7071068 0.7015608 0.7015608 0.7342088
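To see what these numbers mean: according to the seqinr documentation, with matrix = "identity" each entry is the square root of (1 - fraction of identical positions), so 80% identity gives sqrt(1 - 0.8), about 0.447. The sketch below computes that value for a single pair of aligned sequences. The helper identity_fraction() and the two short sequences are hypothetical, and skipping every column where either sequence has a gap is a simplifying assumption; seqinr's own gap handling may differ.

```r
# Hypothetical helper: fraction of identical residues between two aligned
# sequences of equal length. Columns where either sequence has a gap ("-")
# are skipped, which is a simplifying assumption.
identity_fraction <- function(a, b) {
  x <- strsplit(toupper(a), "")[[1]]
  y <- strsplit(toupper(b), "")[[1]]
  if (length(x) != length(y)) stop("sequences must be aligned (equal length)")
  keep <- x != "-" & y != "-"
  mean(x[keep] == y[keep])
}

# Two short made-up aligned sequences, for illustration only
p <- identity_fraction("MKV-LAAG", "MKVQLSAG")  # 6 of 7 ungapped columns match
d <- sqrt(1 - p)                                 # square-root identity distance
round(d, 4)
```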

Just curious: are these distances likelihood-based estimates from a particular protein evolution model?


@y2p Actually, my file is not an alignment. It is an Excel file containing sequence identity data, for example:

    1       2       3       4       5       6       7
1                         
2   11.1                      
3   11.1    100.0                 
4   11.1    100.0   100.0             
5   18.2    11.1    11.1    11.1          
6   21.7    11.1    11.1    11.1    100.0     
7   22.2    11.1    11.1    11.1    44.4    44.4  
8   80.0    20.0    20.0    20.0    80.0    80.0    100.0

How can I generate a distance matrix from this Excel file based on the sequence identity data? Which function should I use?


@y2p Actually, my file is not an alignment; it is an Excel file containing percentage sequence identities. My aim is to generate a distance matrix from this data, with distance = 100 - sequence identity. Which function should I use for this? Do you have any idea?
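A minimal base-R sketch, assuming the identities have been read into a square numeric matrix laid out like the table above (lower triangle filled, diagonal and upper triangle empty). Here the values are typed in from the posted table; with the real file they would come from read.xls() instead. as.dist() keeps only the strict lower triangle, so the empty cells do not matter.

```r
# Percent identities from the posted table, entered column by column
# (assignment through lower.tri() fills the lower triangle in column-major order)
pid <- matrix(NA_real_, nrow = 8, ncol = 8, dimnames = list(1:8, 1:8))
pid[lower.tri(pid)] <- c(
  11.1, 11.1, 11.1, 18.2, 21.7, 22.2, 80.0,  # column 1, rows 2-8
  100.0, 100.0, 11.1, 11.1, 11.1, 20.0,      # column 2, rows 3-8
  100.0, 11.1, 11.1, 11.1, 20.0,             # column 3, rows 4-8
  11.1, 11.1, 11.1, 20.0,                    # column 4, rows 5-8
  100.0, 44.4, 80.0,                         # column 5, rows 6-8
  44.4, 80.0,                                # column 6, rows 7-8
  100.0                                      # column 7, row 8
)

# distance = 100 - sequence identity; as.dist() uses only the lower triangle
d <- as.dist(100 - pid)
d
```

The result is an ordinary dist object, so it can be passed directly to functions such as hclust(d).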


@y2p I have 300 sheets in the Excel file, and the sequence identity data are in the 9th column. I imported all the sequence identity values from the 300 sheets into R with this code:

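library(gdata)  # provides read.xls()
# Pre-allocate a list with one slot per sheet
myfile <- vector("list", 300)
for (i in 1:300) {
    # Keep only column 9 (sequence identity) of sheet i
    myfile[[i]] <- read.xls("C://Users//Desktop//mydata.xls", sheet = i, header = FALSE)[, 9]
}
myfile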

Next I need to apply the distance formula, distance = 100 - sequence identity, and create a matrix.

Please help me.
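A minimal sketch of that step, assuming each element of myfile is a vector of percent identities. The short list below is a hypothetical stand-in built from values in the earlier table; in practice you would use the list produced by the loop above. The post does not say how the 300 vectors map onto sequence pairs, so the final cbind() (one column per sheet, which requires all sheets to have the same number of rows) is only one possible layout.

```r
# Hypothetical stand-in for the list built by the read.xls() loop
myfile <- list(c(11.1, 100, 44.4), c(80, 20, 100))

# distance = 100 - sequence identity, applied to every sheet's column.
# Converting via as.character() first guards against columns imported as factors.
dists <- lapply(myfile, function(x) 100 - as.numeric(as.character(x)))

# Assumed layout: one column per sheet (all sheets must have the same length)
dist_mat <- do.call(cbind, dists)
colnames(dist_mat) <- paste0("sheet", seq_along(dists))
dist_mat
```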


@vitis No, these are simply the square roots of pairwise distances computed from a similarity or identity matrix.
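If you want distances from the Excel identities on the same square-root scale as the dist.alignment() output above, a minimal sketch is to take sqrt(1 - identity/100) instead of 100 - identity. It assumes the same lower-triangular percent-identity layout as the posted table, cut down here to sequences 1 to 3.

```r
# Percent identities for sequences 1-3 of the posted table (lower triangle only)
pid <- matrix(c(NA,   NA,  NA,
                11.1, NA,  NA,
                11.1, 100, NA),
              nrow = 3, byrow = TRUE, dimnames = list(1:3, 1:3))

# Square-root distance on the 0-1 scale: sqrt(1 - fraction identical)
d_sqrt <- as.dist(sqrt(1 - pid / 100))
d_sqrt
```

The square root is monotone, so pairs are ranked in the same order as with 100 - identity; the values themselves differ, which matters for methods that average distances, such as average-linkage clustering or neighbour joining.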

