Entering edit mode
6.1 years ago
Mimmi Ahlmén
▴
30
I have a file which looks like this:
>orf_1 364-2160
atggcaggagcccgtggtctttggcgtgcaacagggatgaaagatactgattttggtaaa
cccattattgctattgcgaattcttttacacaatttgtaccagggcatgtccatttaaaa
.....
>orf_2 2453-5197
atggataataaagccacccataaaaataatttaatcccacaggcccccccttcatctgct
cctcatcaacagcgcctcacacctatgatggagcaatatatagaaatcaaagcagtgaat
.....
and a script that is supposed to insert the file content into a table:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fasta;
use DBI; # The perl database module
my $NucleotideFasta = $ARGV[0];
my $ProteinFasta = $ARGV[1];
die "Give the name of existing nucleotide fasta file and a file name for protein translations to be saved." unless (defined($NucleotideFasta) && defined($ProteinFasta));
# Replace with your mysql details
my $database = "my_database.db"; #Change accordingly
# Create a connection to the database
my $dbh = DBI->connect("DBI:SQLite:$database","","") or die "Could not connect to the database $database.\n";
my $table_orfs = 'orfs'; #Change accordingly
# Insert genes, make sure to use the correct table names etc = same as when you created the database
my $sth_insert = $dbh->prepare("INSERT INTO $table_orfs (orf_name,sequence,length,protein) VALUES (?,?,?,?)");
#read in nucleotide sequences
my $seq_nucleotide = Bio::SeqIO->new( -format => 'fasta' , -file => $NucleotideFasta);
my $seq_protein = Bio::SeqIO->new( -format => 'fasta' , -file => ">$ProteinFasta");
while (my $seq = $seq_nucleotide->next_seq() ) {
#save protein translations to a file
my $protein=$seq->translate();
$seq_protein->write_seq($protein);
#prepare for database input
my $orf_id = $seq->display_id();
my $nucleotides = $seq->seq();
my $aminoacids = $protein->seq();
my $length = length($aminoacids);
#print "$orf_id $length\n$nucleotides\n$aminoacids\n";
$sth_insert->execute($orf_id,$nucleotides,$length,$aminoacids);
}
$dbh->disconnect();
My question is now, how can I create the table before using the script and which command should I use to run the script?
I know how to create a table, but in order for the script to work properly, what names in the tables should I use? Is it like this:
create table orfs( orf_name TEXT, sequence TEXT, length REAL, aminoacids TEXT );
When I do this I get:
DBD::SQLite::db prepare failed: table orfs has no column named orf_name at Gene2SQLite.pl line 21.
Can't call method "execute" on an undefined value at Gene2SQLite.pl line 37, <GEN0> line 1
. Maybe I misunderstood the task or am I supposed to create other names for the table?
what's the output of
?
CREATE TABLE orfs( orf_name TEXT, sequence TEXT, length REAL, protein TEXT );
When I look at the file my_database.db it looks like a mess.