I know a similar question was asked previously, but the "vanilla" Perl code (which works beautifully if you want to extract ONE sequence) doesn't extract what I need: about 100 FASTA sequences, selected by ID, from a big file containing 3000 FASTA sequences concatenated together. Can anyone help? Perhaps it is a simple modification? Here is the code that was posted. I just don't know how to change it so I can input more than one ID (it currently fails if you pass more than one). Many thanks to you knowledgeable folks out there!
(usage: extractSeqByID.pl SEQ123 < huge.fsa > my.fsa)
use warnings;
use strict;
my $lookup = shift @ARGV;  # ID to extract
local $/ = "\n>";          # read by FASTA record
while (my $seq = <>) {
    chomp $seq;
    my ($id) = $seq =~ /^>*(\S+)/;  # parse ID as first word in FASTA header
    if ($id eq $lookup) {
        $seq =~ s/^>*.+\n//;  # remove FASTA header
        $seq =~ s/\n//g;      # remove endlines
        print "$seq\n";
        last;
    }
}
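The usual change is to load all the wanted IDs into a hash (a set) and test each record's ID against it, instead of comparing with a single `$lookup` and stopping at the first hit with `last`. (A likely reason the original fails with several IDs: after `shift`, the remaining arguments stay in `@ARGV`, and `<>` then tries to open them as input files.) Below is a minimal sketch of the same idea in Python. It assumes the wanted IDs are listed one per line in a text file and that the ID is the first word of each header, as in the Perl code. The script name `extract_ids.py` and the file `ids.txt` are hypothetical. Unlike the Perl version, it prints each hit as a FASTA record (header plus sequence) so the extracted sequences stay distinguishable.

```python
import sys
from typing import Dict, Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the ID is the first word of the header."""
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id = (line[1:].split() or [""])[0]
            chunks = []
        elif seq_id is not None and line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def extract_by_ids(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Return {id: sequence} for records whose ID is in `wanted` (first hit wins)."""
    found: Dict[str, str] = {}
    for seq_id, seq in read_fasta(lines):
        if seq_id in wanted and seq_id not in found:  # set lookup replaces `eq`
            found[seq_id] = seq
            if len(found) == len(wanted):  # every ID found: stop reading
                break
    return found


def main(id_file: str) -> None:
    # Read the wanted IDs, one per line, ignoring blank lines
    with open(id_file) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    for seq_id, seq in extract_by_ids(sys.stdin, wanted).items():
        print(f">{seq_id}\n{seq}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: extract_ids.py ids.txt < huge.fsa > my.fsa", file=sys.stderr)
```

Because set membership is a constant-time check, a single pass over the 3000-record file handles all 100 IDs, and reading stops early once every ID has been found. Hits are printed in the order they appear in the big file.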
Hi everyone.
I have a FASTA file of 10 sequences. Now I want to write Python code that reads each sequence and passes it to RNAfold.exe for secondary structure prediction. Then, for each structure, I want to print some features, for example the number of G:U wobble pairs, CpG occurrence and G+C content. Can anybody help me out? I need your help because I am soooo "good" at programming. Thanks a lot!
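A minimal sketch of one way to do this. It assumes the ViennaRNA `RNAfold` program is on your PATH (on Windows you can pass the full path to `RNAfold.exe` as `exe`) and that, when run with `--noPS` on a single sequence, it prints the sequence followed by a line of the form `structure (energy)`. G:U wobble pairs are counted by matching brackets in the dot-bracket string; CpG count and G+C content are computed from the sequence itself. The function names and the script name `fold_features.py` are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List, Tuple


def read_fasta(path: str) -> List[Tuple[str, str]]:
    """Return (id, sequence) pairs from a FASTA file."""
    records: List[Tuple[str, List[str]]] = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                records.append(((line[1:].split() or [""])[0], []))
            elif line and records:
                records[-1][1].append(line)
    return [(rid, "".join(parts)) for rid, parts in records]


def parse_rnafold_output(text: str) -> Tuple[str, float]:
    """Split RNAfold's 'structure (energy)' line into structure and MFE."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith(">")]
    # lines[0] echoes the sequence; lines[1] is e.g. "(((...))) ( -1.20)"
    structure, _, energy = lines[1].rpartition(" (")
    return structure.strip(), float(energy.rstrip(")"))


def run_rnafold(seq: str, exe: str = "RNAfold") -> Tuple[str, float]:
    """Fold one sequence with RNAfold (assumed on PATH); --noPS skips the .ps plot."""
    result = subprocess.run([exe, "--noPS"], input=seq + "\n",
                            capture_output=True, text=True, check=True)
    return parse_rnafold_output(result.stdout)


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Match '(' with ')' using a stack; returns 0-based (i, j) pairs."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def structure_features(seq: str, structure: str) -> Dict[str, float]:
    """G:U wobble pairs in the structure, CpG count and G+C fraction of the sequence."""
    rna = seq.upper().replace("T", "U")
    gu = sum(1 for i, j in base_pairs(structure) if {rna[i], rna[j]} == {"G", "U"})
    gc = (rna.count("G") + rna.count("C")) / len(rna) if rna else 0.0
    # "CG" cannot overlap itself, so str.count gives the full CpG count
    return {"gu_pairs": gu, "cpg": rna.count("CG"), "gc_content": gc}


def main(path: str) -> None:
    print("id\tstructure\tmfe\tGU_pairs\tCpG\tGC")
    for seq_id, seq in read_fasta(path):
        structure, mfe = run_rnafold(seq)
        f = structure_features(seq, structure)
        print(f"{seq_id}\t{structure}\t{mfe:.2f}\t{f['gu_pairs']}\t{f['cpg']}\t{f['gc_content']:.3f}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: fold_features.py seqs.fa", file=sys.stderr)
```

Keeping the parsing and feature functions separate from the `subprocess` call means they can be checked without RNAfold installed. With only 10 sequences, starting one RNAfold process per sequence is simple and fast enough.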
If you have really MANY sequences (thousands or millions), it's worth putting them into a database. I recommend SQLite (http://www.sqlite.org/): queries are very fast, the database is stored in a single file, and there are bindings for many languages (the Python ones, for instance, are very good).
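A minimal sketch of the SQLite approach using Python's built-in `sqlite3` module: load the FASTA file once into a table keyed by ID, then fetch any number of IDs with a single query. The table name `seqs` and the helper names are hypothetical, and it assumes the ID is the first word of each header.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the ID is the first word of the header."""
    seq_id = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id = (line[1:].split() or [""])[0]
            chunks = []
        elif seq_id is not None and line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def build_db(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    """Load FASTA records into a table whose primary key (indexed) is the ID."""
    conn.execute("CREATE TABLE IF NOT EXISTS seqs (id TEXT PRIMARY KEY, seq TEXT NOT NULL)")
    # INSERT OR IGNORE keeps the first record if an ID occurs more than once
    conn.executemany("INSERT OR IGNORE INTO seqs (id, seq) VALUES (?, ?)", read_fasta(lines))
    conn.commit()


def fetch(conn: sqlite3.Connection, ids: List[str]) -> Dict[str, str]:
    """Return {id: sequence} for the requested IDs that exist in the table."""
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))  # one "?" per ID, e.g. "?,?,?"
    rows = conn.execute(f"SELECT id, seq FROM seqs WHERE id IN ({placeholders})", ids)
    return dict(rows)
```

For example, `conn = sqlite3.connect("seqs.db")` (a hypothetical file name) keeps the table on disk, so the big FASTA file only has to be parsed once. Very long ID lists may exceed SQLite's limit on `?` parameters per statement (999 in versions before 3.32.0); in that case, query in batches.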