Question

Perl : String comparison problem

1


9.3 years ago

bioinfo14 ▴ 10

Hi all,

I have sequences stored in two arrays, @seq1 and @seq2. I need to compare the sequence in @seq2 with every entry in @seq1 and print the mismatched bases at each position.

These are my sequences:

@seq1 = "UAUGUACCGACCUUAUUCUCCU AUGUACCGACCUUAUUCUCCUG UGUACCGACCUUAUUCUCCUGU GUACCGACCUUAUUCUCCUGUG UACCGACCUUAUUCUCCUGUGA ACCGACCUUAUUCUCCUGUGAU CCGACCUUAUUCUCCUGUGAUC CGACCUUAUUCUCCUGUGAUCU GACCUUAUUCUCCUGUGAUCUA ACCUUAUUCUCCUGUGAUCUAC CCUUAUUCUCCUGUGAUCUACU CUUAUUCUCCUGUGAUCUACUA UUAUUCUCCUGUGAUCUACUAU UAUUCUCCUGUGAUCUACUAUA "

@seq2 = " UGGAGUGUGACAAUGGUGUUUG"

and this is my code:

foreach (0..length(@seq1))
{
  my $char = substr($seq2,$_,1);
  if($char ne substr($seq1, @_,1))
  {
    $result .="$char";
  }
  else
  {
    $result .="";
  }
}
print $result, "\n";

I'm getting errors. I would be grateful if anyone could help me complete this and fix my errors. Thank you.

perl • 3.9k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by bioinfo14 ▴ 10

1


Why is there a @_ in if($char ne substr($seq1, @_,1))? Also, storing a single sequence string in an array (@seq1) does not make sense. Your code has a couple of problems.
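To illustrate the two points above, here is a minimal sketch (the sub name bases is hypothetical, not from the original code). It keeps a single sequence in a scalar, reads sub arguments from @_, and uses $_ as the loop index:

```perl
use strict;
use warnings;

# Return the individual bases of one sequence.
sub bases {
    my ($seq) = @_;                      # @_ holds the arguments passed to the sub
    my @out;
    foreach (0 .. length($seq) - 1) {    # length() of a scalar counts characters
        push @out, substr($seq, $_, 1);  # $_ is the current loop index
    }
    return @out;
}

my $seq = 'UGGAGUGUGACAAUGGUGUUUG';      # one sequence: a scalar, not an array
my @b   = bases($seq);
print scalar(@b), " bases\n";            # scalar(@array) counts elements
```

For a list of sequences, an array is the right container, and scalar(@array) (not length) gives the number of entries.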

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by thackl ★ 3.0k

0


It's not clear, but it's actually a whitespace-separated list inside the quotes.
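If the quoted strings are indeed meant as whitespace-separated lists, a sketch like the following would turn them into proper Perl data. The variable names are illustrative, and only the first three sequences from the question are shown:

```perl
use strict;
use warnings;

my $list = "UAUGUACCGACCUUAUUCUCCU AUGUACCGACCUUAUUCUCCUG UGUACCGACCUUAUUCUCCUGU ";

# split with the special pattern ' ' splits on runs of whitespace
# and ignores leading whitespace; trailing empty fields are dropped.
my @seq1   = split ' ', $list;
my ($seq2) = split ' ', " UGGAGUGUGACAAUGGUGUUUG";

print scalar(@seq1), " sequences to compare against '$seq2'\n";
```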

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Daniel ★ 4.0k

0


Yeah, I figured as much, but it took me some time ;). This also means that @landesfeind's solutions won't work...

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by thackl ★ 3.0k

2


Sorry, I didn't get that, in particular because there is also whitespace when initializing @seq2 and because both @seq1 and @seq2 are later accessed using $ and substr(). I assumed the whitespace to be a mistake or a formatting error.

Since I was the one complaining, I would gladly adjust the code in my answer to match the desired output, if someone can specify what it should look like. Probably more like the following?

my @sequences = qw/UAUGUACCGACCUUAUUCUCCU
                   AUGUACCGACCUUAUUCUCCUG
                   UGUACCGACCUUAUUCUCCUGU
                   GUACCGACCUUAUUCUCCUGUG
                   UACCGACCUUAUUCUCCUGUGA
                   ACCGACCUUAUUCUCCUGUGAU
                   CCGACCUUAUUCUCCUGUGAUC
                   CGACCUUAUUCUCCUGUGAUCU
                   GACCUUAUUCUCCUGUGAUCUA
                   ACCUUAUUCUCCUGUGAUCUAC
                   CCUUAUUCUCCUGUGAUCUACU
                   CUUAUUCUCCUGUGAUCUACUA
                   UUAUUCUCCUGUGAUCUACUAU
                   UAUUCUCCUGUGAUCUACUAUA/;
my $sequence  =   'UGGAGUGUGACAAUGGUGUUUG';

foreach my $s (@sequences){
   my $result = '';
   foreach (0 .. (length($s) - 1)){
      if(substr($sequence, $_, 1) ne substr($s, $_, 1)){
         $result .= substr($sequence, $_, 1);
      }
      else {
         $result .= " ";
      }
   }
   print $result, "\n";
}

which prints

 GGAGUGU   AA GG G UUG
UG AGUGUGA AAUGGUGUU  
  G GU UGACA  GG GU UG
UGGAGUGUGAC A GGUG U  
 GGA UGUGACAAUGGUGU UG
UGGAGUG G CAA GG   UUG
UG  GUG GACAAUGGUGUU G
U GAG GUGA AAUG   U UG
UGGAG G GACAA  GUG U G
UGGAGUG GA AAUG UGU UG
UGGAG GUGA AA  G G UUG
UGG G G GACAAUGGUGUU G
 GGAGUGUGACAAUGG GU UG
 GGAG GUGACA  GGUG U G

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Manuel Landesfeind ★ 1.4k

0


thanks a lot @landesfeind

ADD REPLY • link 9.3 years ago by bioinfo14 ▴ 10

Ram · Answer 1 · 2015-07-28

I'm fully aware that the code below does not qualify as an appropriate teaching example; I'm simply a fan of a little bit magic when it comes to string comparison.

#!/usr/bin/env perl
use warnings;
use strict;

my $q = "UGGAGUGUGACAAUGGUGUUUG";
my @r = qw(
    UAUGUACCGACCUUAUUCUCCU
    AUGUACCGACCUUAUUCUCCUG
    UGUACCGACCUUAUUCUCCUGU
);

foreach my $r (@r) {
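    my $xb = $r ^ $q;    # bytewise XOR: "\0" where bases match, non-zero where they differ
    $xb =~ tr/\0/\377/c; # turn it into a bitmask: mismatch = "\377", match = "\0"
    my $xs = $r & $xb;   # keep the reference base at mismatches, "\0" elsewhere
    $xs =~ tr/\0/ /;     # show matching positions as spaces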

    print "ref: ", $r,"\n";
    print "qry: ", $q,"\n";
    print "mis: ", $xs,"\n\n";
}

produces:

ref: UAUGUACCGACCUUAUUCUCCU
qry: UGGAGUGUGACAAUGGUGUUUG
mis:  AUGUACC   CU AU C CCU

ref: AUGUACCGACCUUAUUCUCCUG
qry: UGGAGUGUGACAAUGGUGUUUG
mis: AU UACCGAC UUAUUCUCC  

ref: UGUACCGACCUUAUUCUCCUGU
qry: UGGAGUGUGACAAUGGUGUUUG
mis:   U CC ACCUU  UC CC GU
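The same XOR trick can also count mismatches instead of displaying them. The following is a minimal sketch (mismatch_count is a hypothetical helper name). It assumes equal-length byte strings and the default string behaviour of ^, i.e. no "use feature 'bitwise'":

```perl
use strict;
use warnings;

# Count the positions at which two equal-length strings differ.
sub mismatch_count {
    my ($ref, $qry) = @_;
    die "sequences differ in length\n" unless length($ref) == length($qry);
    my $xor = $ref ^ $qry;       # "\0" wherever the two bytes are equal
    return ($xor =~ tr/\0//c);   # count the bytes that are not "\0"
}

print mismatch_count('UAUGUACCGACCUUAUUCUCCU', 'UGGAGUGUGACAAUGGUGUUUG'), "\n";
```

For the first reference sequence this prints 15, which agrees with the 15 bases shown on its "mis:" line.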

Ram · Answer 2 · 2015-07-28

1


9.3 years ago

Manuel Landesfeind ★ 1.4k

[EDIT] Not a working solution - see comments above [/EDIT]

The following code snippet runs and gives you the desired result even though your approach is highly inefficient.

However, as @thackl mentioned, you made some ~~horrible~~ severe mistakes in your code! I do not want to be rude, but: read an introduction to Perl and learn some basics in programming before you proceed! Without that, your code will continue to be messy, fail, and you will waste time - yours and ours. Sorry.

#!/usr/bin/perl
use strict;
use warnings;

# Split the string sequence into an array
my @seq1 = split('', "UAUGUACCGACCUUAUUCUCCUAUGUACCGACCUUAUUCUCCUGUGU..."); 
my @seq2 = split('', "UGGAGUGUGACAAUGGUGUUUG");

# Determine the length of the shorter sequence
my $length;
if( scalar(@seq1) < scalar(@seq2) ){
   $length = scalar(@seq1);
}
else {
   $length = scalar(@seq2);
}

# Iterate the arrays and store results
my $result = '';
foreach (0 .. ($length - 1)){
   if($seq1[$_] ne $seq2[$_]){
      $result .= $seq2[$_];
   }
   else {
      $result .= " ";
   }
}
print $result, "\n";

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Manuel Landesfeind ★ 1.4k

1


I would recommend Modern Perl, O'Reilly's Learning Perl, or Beginning Perl by Curtis Poe for learning Perl. It is better to get a good introduction from a solid book than to try to learn from a 15-year-old web article, which is likely to lead to more frustration and problems (it teaches the wrong way to do things right from the beginning).

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by SES 8.6k

0


Although I support your advice on reading Perl intros, I wouldn't call the above mistakes "horrible". To me this looks like a classic "hands-on" attempt at learning Perl. However, given the technical nature of the question and the fact that the sequences need not be RNA/DNA/AA for the algorithm to apply, I believe the OP should try posting the question on a better-suited forum like PerlMonks. There, I believe she/he will get more "educative" advice on how to tackle the problem and on what a more efficient solution would be, especially since the OP has shown an attempt to resolve it.

mxs

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by mxs ▴ 530

0


Probably, "horrible" was too rude then, so I changed it. But I stand by the recommendation to learn the Perl basics. Coming from a computer science background, I think it is crucial to understand the very basics of the programming language in use, e.g., not to mix strings with arrays (at least in Perl), to check array lengths, etc.

+1 for asking for help in the Perl programming community (e.g. PerlMonks), even though I think they will be even more picky about the coding style ;-)

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Manuel Landesfeind ★ 1.4k

0


@landesfeind Thank you for your reply. I'm new to Perl and in the process of learning, so I tried using code from other websites for my problem, but it didn't work; that's why I posted my query here. This is also part of the learning process. Learning from an experienced person helps more than books. Also, your script gives the solution for the first index alone (i.e., the first 22 characters). I need to compare @seq2 against positions 1 to n in @seq1 (i.e., from 1 to 22, then from 2 to 23, then from 3 to 24, and so on), where 1, 2, 3, ... are character positions in @seq1. Thank you.
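One way to read this requirement is as a sliding-window comparison over a single long reference string. Below is a minimal sketch under that assumption. The sub name sliding_mismatches is hypothetical, and the 35-base $reference was reconstructed by overlapping the 14 sequences from the question, so it is an assumption rather than the OP's actual data:

```perl
use strict;
use warnings;

# Compare $query against every window of the same length in $reference.
# Returns one string per window: the query base where the two differ,
# a space where they match.
sub sliding_mismatches {
    my ($reference, $query) = @_;
    my $len = length($query);
    my @rows;
    foreach my $start (0 .. length($reference) - $len) {
        my $window = substr($reference, $start, $len);
        my $row = '';
        foreach my $i (0 .. $len - 1) {
            my $q = substr($query, $i, 1);
            $row .= ($q ne substr($window, $i, 1)) ? $q : ' ';
        }
        push @rows, $row;
    }
    return @rows;
}

# Assumed reference: its 22-base windows are the 14 sequences listed in the question.
my $reference = 'UAUGUACCGACCUUAUUCUCCUGUGAUCUACUAUA';
my $query     = 'UGGAGUGUGACAAUGGUGUUUG';

print "$_\n" for sliding_mismatches($reference, $query);
```

With these inputs the sketch yields 14 rows, one per window start, and the first row is " GGAGUGU   AA GG G UUG". If the reference is shorter than the query, the range is empty and no rows are returned.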

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by bioinfo14 ▴ 10