How to import fasta file by using perl DBI.?
1
1
Entering edit mode
9.7 years ago
Krithi.S ▴ 10
use DBI;
use strict;
use warning;

my $textfile = "fasta.txt";
open (TEXT,"$textfile") or die "Could not open:$!";

my $dbh = DBI->connect("dbi:mysql:database=file",user,password,{RaiseError=>1}) or die "Could not connect $DBI:errstr";

while(<TEXT>){
    $first = <TEXT>;
    @second = <TEXT>;
    $dbh->do("INSERT INTO open values (?,?)", undef, $first,@second);
    }

$dbh->disconnect;
close TEXT;

This is My Perl code. It is not insert data in database. I created database and table. database name is file and table name is open, the table contain two column one is header and another one is sequence.

My expectation. I will create database have two column, one column contain the first line of fasta file and second column contain their sequence.

Header                                                                                        Sequence
>sp|P00547|KHSE_ECOLI Homoserine kinase OS=Escherichia coli (strain K12) GN=thrB PE=1 SV=2    MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEP
                                                                                              RENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLND

Please help me..

Database DBI perl • 3.1k views
ADD COMMENT
2
Entering edit mode
9.7 years ago

First your code should not run: you 'use strict' and declare variables without 'my'. Second, you're not using placeholders correctly. Each ? has to map to a scalar so (?,?) requires two scalars and it looks like you have a scalar ($first) and a list (@second). However, using $dbh->do() in this way is inefficient because you create and destroy a statement handle at each iteration. A more efficient way to use placeholders is with prepare:

my $query = qq(INSERT INTO open values (?,?));
my $sth = $dbh->prepare($query);
while(<TEXT>){
    my $first = <TEXT>;
    my $second = <TEXT>; # get the next line of text only
    $sth->execute($first,$second);
}

Finally, you also need to parse your fasta file properly.

ADD COMMENT
0
Entering edit mode

@Jean-Karim Heriche, thanking you for your nice answer. But I am not able to getting Header in my output in one column? Only my output is have a sequence, but its to split into three rows. I want to expect in that header column only the first line and that sequence column contain all sequence in fasta file. Please help me.

ADD REPLY
0
Entering edit mode

This is what I meant by parsing your fasta file correctly. I suggest you use BioPerl's Bio::SeqIO for this. However, if you need an alternative, here is a function that does the job:

=head2 get_fasta_record

 Arg: opened filehandle
 Description: Reads next record from a fasta file
 Returntype: arrayref to (header, sequence)

=cut

sub get_fasta_record {
  my $fh = shift;
  local $/ = ">";
  my ($header,$sequence);
  while(my $line =<$fh>) {
    chomp($line);
    $line=~s/\R/\n/g; # replace any line break characters with \n
    next if ($line=~/^\s*$/);
    my $newline_pos = index($line,"\n");
    $header = substr($line,0,$newline_pos);
    $sequence = substr($line,$newline_pos+1);
    $sequence =~s/\n//g;
    $sequence =~s/\s+//g;
    last;
  }
  if (defined($header)) {
    return [$header,$sequence];
  }
  else { return undef; }
}

Use like this:

open my $fh,"<",$ARGV[0] or die "\nERROR: Can't read file $ARGV[0]: $!";
while (my ($header,$sequence) = @{$seq_reader->get_fasta_record($fh)}) {
  # Do something here
}
close $fh;
ADD REPLY

Login before adding your answer.

Traffic: 1083 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6