Dear All,
I have a Fasta file and a List file. I need to extract sequences according to the IDs in each line of the List file, and output to a file. An examplified output result is shown below (Files out_1 and out_2). I wrote a perl script to do this job (useage: perl extract_seq.pl fastafile listfile
), but got error message : Use of uninitialized value in string eq at ...
Could you please point out my error and give me some suggesttions? Thank you very much!!
Fasta file
>a1
tggtacgtgtcagtcaacgtgggggggggggggg
>a2
cgtacgtgacgtacactacgtttttttttttttt
>a3
cgctagacgtataaaaatatatattataaaaaaaa
>a4
cccgcgcggcgcgcggcccccccccccccccccccc
>a5
tggtttttttttttttttttttttttttttttttttttt
>a6
cgtacgggggggggggggggggggggggggggggggg
>a7
cgctagacgtataaaaaaaaaaaaaaaaaaaaaaaaa
>a8
cccgcgcattattattatatattatattttattatat
List file
a1 a3 a4
a5 a6 a8
Ideal output files:
out_1
>a1
tggtacgtgtcagtcaacgtgggggggggggggg
>a3
cgctagacgtataaaaatatatattataaaaaaaa
>a4
cccgcgcggcgcgcggcccccccccccccccccccc
out_2
>a5
tggtttttttttttttttttttttttttttttttttttt
>a6
cgtacgggggggggggggggggggggggggggggggg
>a8
cccgcgcattattattatatattatattttattatat
Problematic Perl script
#!/usr/bin/perl
use strict;
use warnings;
#Firstly store Fasta sequences into a hash;
open SEQ, $ARGV[0];
my $id;
my %seq;
while (<SEQ>) {
chomp;
if (/^>(.+)/){
$id = $1;
}else{$seq{$id} .= $_;
}
}
close SEQ;
#Next open LIST file and generate output files according to IDs in each line;
open LIST, $ARGV[1];
my $i;
while (<LIST>){
$i ++;
my @a = (split /\t/, $_);
open OUT, ">> out_$i";
foreach (sort keys %seq){
if($a[0] eq $_){
print OUT ">$_\n$seq{$_}\n";
}elsif($a[1] eq $_){
print OUT ">$_\n$seq{$_}\n";
}elsif($a[2] eq $_){
print OUT ">$_\n$seq{$_}\n";
}
}
close OUT;
}
close LIST;
Don't write your own code for basic tasks that have been scripted by so many other people already. Just google your question and you will easily get working solutions
Exercising is always good. Why not to appreciate the effort made.
OK, I agree. I didn't think of this being an exercise. But sometimes I see people spending 3 days working on a buggy script for a task that can easily be accomplished with well established widely used tools. I think it is just as important to know what is already available.
Try writing simple bash loop: for each line split IDs & grep from FASTA file & redirect into specific file.
This site kind of is becoming as site for programming help. Learn debugging your code line for line and try to find the error yourself. A first semester student could see the bug and fix it. This is no scientific question but a programmatic one. If you can't programm, then I suggest to learn it first and dont copy your code in here for someone to help debug your code. Sorry for being rude (and honest)
while I don't disagree completely, I think in this case, it's a bioinformatics question and the OP has shown his code and what he has tried rather than just asking how to do it without any effort.