Looping PairwiseAlignemnt {biostrings} for 25 proteins
2
1
Entering edit mode
6.1 years ago
tiriyon • 0

Hey I'm new here, and pretty new to R as well. I am trying to perform a PairwiseAlignment on 25 protein sequences from Uniprot. I have 25 Fasta files in this format:

>tr|A0A287AI92|A0A287AI92_PIG Carbonic anhydrase 1 OS=Sus scrofa OX=9823 GN=CA1 PE=1 SV=1MTSPAWGYDGEYGPEHWSKVYPIANGNNQSPIDIKTSETKHDTSLKPISV.....

I loaded the Fasta files to R using :

ProtSeq_1 <- read.fasta("C:/Users/tiriy/Documents/A0A287AI92.fasta")

Now I'm trying to order the Fasta files into data frame - which I miserably failed at doing and looping the Pairwise alignment with for loop like so:

comb<- combn(10,2)
MT1 <- a_data_frame
    for (i in 1:ncol(comb)) {
      x<-pairwiseAlignment( toString( MT1 [comb [1,i],1]),
     toString(MT1 [comb[2,i],1]),
    substitutionMatrix = "BLOSUM100", 
    gapOpening = -2, 
    gapExtension = -8, 
    scoreOnly = FALSE)
      fileA<-paste0("C:/Users/uri/Desktop/", i,"-","blusom",".txt")
      writePairwiseAlignments(x,file=fileA)}
  • How can I data frame all the sequences?
  • Or otherwise any other suggestion on looping the pairwise alignment on these fasta files?

Many thanks in advance! I hope I'd be able to give my aid back in the future

R sequence alignment • 2.4k views
ADD COMMENT
1
Entering edit mode
6.1 years ago
tiriyon • 0

So since I was able to solve all my problems alone, I thought it would be helpful if I posted my script here for future reference. (Note the final product is a data frame for all combinations of sequences pairwise alignments with sequence descriptions for both subject and pattern and their scores)

# Installing and loading packages pairwise alignment blosum50 and blosum100 ----

# Installing seqinR R package:
install.packages("seqinr")
# seqinR end

# Installing Bioconductor - Biostrings
chooseCRANmirror() # Getting inside CRAN mirrors
1 
# Choosing the first option
install.packages("BiocManager") # Installing BiocManager in order to install Biostrings
BiocManager::install("Biostrings") # Installing Biostrings via BiocManager
# End of Biostrings installation: Biostrings, BiocManager

# seqinr:
require(seqinr)
# seqinr end

# Biostrings:
require(Biostrings) # Loading Biostrings into library
#Biostrings end


# Step 1 - Listing all fasta files:----
all_fasta<-list.files("C:/Users/tiriy/Desktop/TheProject/ProjectR/fastas") 
# Making a vector to reffer to all fasta files in certain folder
# End of step 1

# Step 2 - Making a data frame out of the files ----
# using biostrings package - readAAStringSet
# And converting it into a dataframe:
A <- readAAStringSet(all_fasta) #Reading all fasta files at once {biostrings}
A1 <- names(A) # Setting vector to hold all the names from the fasta 
# files retrievd in vector A
A2 <- paste(A) # Setting a vector A2 with pasted values from vector A
ProtDF <- data.frame(A1,A2) # Using A1 as first olumn and A2 as second column
# End of step 2 

# Step 3 making a generation of combinations for pairwise alignment ----
D <- combn(25,2)
# End of easy peasy step 3

# Step 4 loading data:---- 
# 
# Loading Data:
data("BLOSUM50")
data("BLOSUM100")
# End of step 4

# Step 5 - assigning empty vectors for a for loop----
ScoreX <- c()
nameSeq1 <- c()
nameSeq2 <- c()
# End of step 5

# Step 6 THE LOOP ----
for (i in 1:300) { 
    dumpme<-pairwiseAlignment(toString(ProtDF[D[1,i],2]) # combining all pairwise alignment possible
                              ,toString(ProtDF[D[2,i],2]), # for 25 sequences = 300 combinations
                              substitutionMatrix = "BLOSUM50", # pairwise alignment settings
                              gapOpening = -2,
                              gapExtension = -8,
                              scoreOnly=FALSE)
   # filling vectors
  ScoreX[i]<-c(dumpme@score)   #score
  nameSeq1[i]<-c(as.character(ProtDF[D[1,i],1]))
  nameSeq2[i]<-c(as.character(ProtDF[D[2,i],1]))

}
#End of step 6 

# Step 7 creating a dataframe ----
BLOSUM50DF<- data.frame(nameSeq1,nameSeq2,ScoreX ) # DF consists of columns sequence 1 and 2 names as writen in fasta
colnames(BLOSUM50DF)<-c("Sequence 1","Sequence 2","Score") # score achieved in pairwise alignment calculation
write.csv(BLOSUM50DF,file="C:/Users/tiriy/Desktop/TheProject/BLOSUM50DF.csv") # writing to file (csv)
# End of step 7
ADD COMMENT
0
Entering edit mode
6.1 years ago
tiriyon • 0

I mannaged to solve the issue with multiple fasta files and not being able to sort them into data frame, here is how I solved it:

all_fasta<-list.files("C:/Users/tiriy/Desktop/TheProject/ProjectR/fastas") 
# Making a vector to reffer to all fasta files in certain folder

A <- readAAStringSet(all_fasta) #Reading all fasta files at once {biostrings}
A1 <- names(A) # Setting vector to hold all the names from the fasta 
# files retrievd in vector A
A2 <- paste(A) # Setting a vector A2 with pasted values from vector A
ProtDF <- data.frame(A1,A2) # Using A1 as first olumn and A2 as second column

{Thanks to Onyambu over at stackoverflow} Now I got a data frame out of the fasta files, though still can't loop the pairwise alignment, here's what I got so far:

    # Step 3 making a generation of combinations for pairwise alignment 
D <- combn(25,2)
# End of  step 3

# Step 4 a for loop pairwise alignment for BLOSUM50
# Loading Data:
data("BLOSUM50")
nrow(ProtDF)
for (i in 1:nrow(ProtDF)) {
  LeVariableDumpable<- 
  PairwiseAlignments(toString(ProtDF[D[1,i],2])
                      ,toString(ProtDF[D[1,i],2]),
                              substitutionMatrix = "BLOSUM50",
                              gapOpening = -2,
                              gapExtension = -8,)
  filenameNloc<-paste0("C:/50bussums/",i,"-","BUSSUm",".txt")
  writePairwiseAlignments(LeVariableDumpable,
                          file=filenameNloc)
}

Edit: Now the function writes the files, but what I tryed to do with combn didn't work. how can I correct this function so that it will cycle each sequences pair possible? BTW, the data frame consists of column 1: seq numbers, column 2 : sequence letters. 25 rows.

ADD COMMENT

Login before adding your answer.

Traffic: 1662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6