finding each pattern in a set of sequences
0
0
Entering edit mode
6.5 years ago
HK ▴ 40

Hey All,

I am trying to find a pattern in a set of sequences. What i am trying to do is that first take the 1st sequence of the pattern and match it with all subjects one by one and give an output (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) and so on i.e. in an iterative way. The input (pattern and subjects) are DNAstringSet instance (Biostrings).

I have used the function

mat=nucleotideSubstitutionMatrix(match=2,mismatch = -3,baseOnly = TRUE)
localAlign=pairwiseAlignment(pattern,subject,type="local",
                             substitutionMatrix=mat,
                             gapOpening=-5, gapExtension=-2)

But this way it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4

Example:

input:pattern

A DNAStringSet instance of length 734
      width seq                                                                         names               
  [1]  1000 GGTAAGAGTTTCTTAACAGATCTCAACATTTGCTATATAC...AGATTATTTGTCCTTTGAGATAAAATTACCAC P1 
  [2]  1000 TGTAAGTAATACTTAATGGTAATTTTTGTTTTCTCTTTCA...AGAAGCAAGGAGACCCGTTAGAGGAAGCATCC P2
  [3]  1000 GGTGAGTGTATGATTGATAACTAATCTCTTAGATTAACCA...CATGATATGAAATGGTTCCTAAAGATCCAGAC P3
  [4]  1000 GGTGAGCAAAATCAAGCAATGCATTGTTTGTTTTGGAGGG...CTATTTATGTACTACCTTTTTTTTTTAGAAAA P4

input: subject

A DNAStringSet instance of length 1000
       width seq                                                                        names               
   [1]  1000 GTAGGTACCTGGGAATTCACAAATTAAGACTTTTGAATA...TTCTTATTCAACCGTAGTAACATTAGATGAATA S1
   [2]  1000 GTGAGCGCTGCTGCCCAAGCCGCCTGGCTATGCTCGATT...AGATGGCCTTTTCTCTCAGCCCACTGTGACCTA S2
   [3]  1000 GTAAGTACAGGCTGAAAGTTACATGCTCTCCAAGGGTGA...ACATAGTAATGAATAGACTTTCAGACACAGCAT S3
   [4]  1000 GTAAGTTGCTTGTTTCTTAAATGTTAGGATCTATTACTT...AACAATATAGGTAAGTCTAGCCCTCAAGGCGCT S4
pattern subject R programming loop • 2.0k views
ADD COMMENT
0
Entering edit mode

I don't know if it's possible without a loop.

From the documentation : https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/pairwiseAlignment

subject : a character vector of length 1, an XString, or an XStringSet object of length 1.

Yours is 1000 :

A DNAStringSet instance of length 1000

ADD REPLY
0
Entering edit mode

@Bastien Herve Thanks for the reply, i have already used this function but as mentioned it actually matches p1 vs s1, p2 vs s2, p3 vs s3 nd p4 vs s4. whereas i need something different i.e (p1 vs s1,p1 vs s2, p1 vs s3, p1 vs s4), then take the 2nd pattern and match with all subjects (p2 vs s1, p2 vs s2, p2 vs s3,p2 vs s4) ans so on.

So i need help with the iteration (and i think this can only be done with the loop).

ADD REPLY
0
Entering edit mode

I think you can do the inverse process by looping over your subject :

localAlign=pairwiseAlignment(pattern,one_subject,type="local", substitutionMatrix=mat, gapOpening=-5,

You will get at first step : p1 vs s1, p2 vs s1...

You will get at second step : p1 vs s2, p2 vs s2...

For each step save your localAlign

At the end process what you saved

It's hard to reproduce without data so i didn't try it. At least it's working in my head :)

ADD REPLY
0
Entering edit mode

Well i knw i have to use the loop on the localAlignment function, but as i an newbie i R so have some problems in using loop or any alterantive way. The logic in mind is using nested loops

for(i in 1:length(pattern){
   for(j in 1:length(subjects){
        loaclAlignmentfunction }} # i.e first it take the p1 and come in the second loop and iterate in the subject loop till end, then goes to outer loop and picks p2 and comes in the seconf loop and iterates till the last p4 and so on.
ADD REPLY
0
Entering edit mode
localAlign_all <- c()
for (i in 1:length(subjects)) { 
    localAlign=pairwiseAlignment(pattern,subjects[i],type="local", substitutionMatrix=mat, gapOpening=-5,
    localAlign_all[i] <- localAlign
}

According to the description, you can use multiple patterns but only one subject

ADD REPLY

Login before adding your answer.

Traffic: 2623 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6