Question

BioPerl annotation question

0

Entering edit mode

10.2 years ago

tg ▴ 10

Dear Bioperl professionals am new to bioperl and i have managed to write this code from a beginners level. i want to add annotation value to my genbank file but the annotation is in a text file( tabular). e.g if the proteinid in the genbank matches the line of annotation in the text file it should append the value in the /notes of the CDS. please help me. Many thanks

#!/usr/bin/perl
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::Perl;

open(INFILE, "testdata"); # this file is in tabular format its my annotation file
@annovalue= $line = <INFILE>;  # I tried opening the file in bioperl couldn't so I used perl

$myfile = 'testgenbank.gbk'; # this is my genbank file I want to add annotation value to
my $seqio_obj = Bio::SeqIO ->new (-file => $myfile,-format=>'genbank');


while($annovalue ne "" )
{
    for  $feat_object ($seq_obj->get_SeqFeatures)
    {
    if ($feat_object->primary_tag eq "CDS")
    {
        if ($feat_object->has_tag('protein_id'))
        {
        for my $val ($feat_object->get_tag_values('protein_id'))
        {
            foreach $val(@feat_object)
            {
            if ($val=~ /$annovalue/)
            {  # here am saying if the protein id in annotation file matches the tabular file
                $feat_object -> $set_tag_values('notes'); # add the annotation to the notes in the CDS of that gene
                $seq_obj = $seqio_obj -> $next_seq();
            }
            }
        }
        }
    }
    }
print OUTFILE ($line);
}

annotation bioperl • 3.1k views

ADD COMMENT • link updated 3.7 years ago by Ram 44k • written 10.2 years ago by tg ▴ 10

0

Entering edit mode

I tried to formatted your code for readability, but it looks with some missing brackets.

ADD REPLY • link 10.2 years ago by JC 13k

0

Entering edit mode

where please?

ADD REPLY • link 10.2 years ago by tg ▴ 10

0

Entering edit mode

In your main loop, you open 7 brackets, you close only 4.

ADD REPLY • link 10.2 years ago by JC 13k

0

Entering edit mode

i have added the brackets yet still am certain sometime is still wrong with my syntax. its not working still

ADD REPLY • link 10.2 years ago by tg ▴ 10

0

Entering edit mode

In addition to the original missing brackets, the indentation is not consistent. Sometimes you indent the line after an opening bracket, but sometimes you don't. These may seem like small things but they make a huge difference in someone's ability to understand what the code is doing.

ADD REPLY • link 10.2 years ago by Daniel Standage 4.1k

0

Entering edit mode

Hi Daniel I have rewritten the code. I want to print the id of the fasta file which in my case is protein id and match it to the tag value which is proteinid of cds feature. if it matches then I want to add the sequence in fasta(wch in my case is protein function to the /notes tag.

#!/usr/bin/perl

use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

my $genbankfile ='testgenbank.gbk';
my $seqio_obj = Bio::SeqIO ->new (-file => $genbankfile,-format=>'genbank');
$seq_obj = $seqio_obj->next_seq;

my $domainfile ="domain.fasta";
my $seqin = Bio::SeqIO ->new (-file => $domainfile,-format=>'fasta');


while(my $seq = $seqin->next_seq) {
    if($seq ne "") {
        for my $feat_obj ($seq_obj->get_SeqFeatures) {
            print "primary tag: ", $feat_obj->primary_tag, "\n";
            for my $tag ($feat_obj->get_all_tags) {
                print "  tag: ", $tag, "\n";
                for my $value ($feat_obj->get_tag_values($tag)) {
                    print "    value: ", $value, "\n";
                    my ($id) = $seq =~ /^>*(\S+)/;
                    while (my $id =~/$value/){
                        print $seq->id(),"\n";
                    }
                }
            }
        }
    }
}

ADD REPLY • link updated 3.7 years ago by Ram 44k • written 10.2 years ago by tg ▴ 10

Ram · Answer 1 · 2014-10-09

Use strict

You should always use use strict; at the beginning of your perl scripts. The benefit of this is that errors get flagged and your program won't run unless you get things right. You can read up on this here.

Using 'strict' means, that you will have to define each variable using the work my to ensure that each variable name is unique. You will see examples of this in the edited code.

Open a file

The next thing I've noticed in your script is the way you open your file. Generally speaking, you should always include the mode you're using, i.e "<" for reading a file and ">" for writing to a file. Also, include the die command in case Perl cannot read/write the file.

open (my $INFILE, "<", "test data") or die;

Reading in your file

Once you've successfully opened your file, reading in your lines is rather simple. You would use a while loop to do so.

my @annovalue;
while ($INFILE) 
{                                 
        push (@annovalue, $_);
}

I've used the $_ variable, which is called the default value.

The while loop

Looking at your main code in the while loop, your program will not match anything as the variable $annovalue is not defined. Using use strict; and my will tell you that in the future. If you want to loop through every entry of an array, you will have to use a foreach loop.

open (my $OUTFILE, ">", "testData_out.txt");
foreach (@annovalue) 
{
         if ($_ ne "")
         {
               for my $feat_object ($seq_obj->get_SeqFeatures)
               {
                   if (($feat_object->primary_tag eq "CDS") && ($feat_object -> has_tag("protein_id")))
                   {
                         for my $val ($feat_object->get_tag_values("protein_id"))
                         {
                                foreach $val(@feat_object)
                                {
                                    if ($val=~ /$annovalue/)
                                    {
                                            $feat_object -> $set_tag_values("notes");
                                            $seq_obj = $seqio_obj -> $next_seq();
                                    }
                                }
                         }
                   }
               }
        }
print OUTFILE ($_);
}

I advise you to be constant with your style. You used my in your code sometimes, but it did not really have a purpose. I'm not entirely sure what's happening in your main body either, as it seems very confusing with all the if and for and foreach structures. Furthermore, you should be constant with your curly brackets. Either you open your block of statements with a curly bracket on the same line as the statement or structure or you start on the next line. Regardless of which you prefer, stick with it!

If you have two if statements referring to the same variable and following each other, you can use the && (and) operator to do so. There are more and you should look into this. They can be very handy and they make your code look tidier, more readable and you save a couple of lines.

Lastly, you print your results to a file which you haven't declared. The OUTFILE has not been opened and you need to do this, otherwise Perl will not know where to write to and discard your results.

open (my $OUTFILE, ">", "test data_out.txt") or die;

I hope my comments help.