Hello
I started using Perl a couple of months ago.
I need to parse a FASTA file containing 2 sequences. I can read them and write them to another text file without the headers. Now I need to compare the two sequences (their lengths and their amino acids), but I don't know how to do that. This is my code:
    use strict;
    use warnings;

    my $fasta_file = "shortAligForDist.txt";
    my $outfile    = "Output.txt";

    # Three-argument open with lexical filehandles
    open my $out, '>', $outfile    or die "Cannot open $outfile: $!";
    open my $fh,  '<', $fasta_file or die "Cannot open $fasta_file: $!";

    my %sequence_data;
    while (read_fasta_sequence($fh, \%sequence_data)) {
        #print ">$sequence_data{header}\n$sequence_data{seq}\n\n";
        print "\n$sequence_data{seq}\n\n";
        print {$out} "\n$sequence_data{seq}\n\n";
    }

    close $fh;
    close $out or die "Cannot close $outfile: $!";

    # Reads the next record from $fh into the hash referenced by $seq_info.
    # Returns $seq_info while records remain, or nothing at end of file.
    sub read_fasta_sequence {
        my ($fh, $seq_info) = @_;

        $seq_info->{seq} = undef;    # clear out previous sequence

        # put the header saved by the previous call into place
        $seq_info->{header} = $seq_info->{next_header} if $seq_info->{next_header};

        my $file_not_empty = 0;
        while (<$fh>) {
            $file_not_empty = 1;
            next if /^\s*$/;         # skip blank lines
            chomp;

            if (/^>/) {              # fasta header line
                my $h = $_;
                $h =~ s/^>//;
                if ($seq_info->{header}) {
                    # start of the next record: remember its header and stop
                    $seq_info->{next_header} = $h;
                    return $seq_info;
                }
                else {               # first time through only
                    $seq_info->{header} = $h;
                }
            }
            else {
                s/\s+//g;            # remove all white space (needs /g)
                $seq_info->{seq} .= $_;
            }
        }

        if ($file_not_empty) {
            return $seq_info;
        }
        else {
            # clean everything up
            $seq_info->{header} = $seq_info->{seq} = $seq_info->{next_header} = undef;
            return;
        }
    }

Your question is underspecified. Unless you fully describe what you want to accomplish, people can only offer vague hints. Please try harder to describe the intended input and output of the program. "Compare the amino acids of two sequences" does not really mean anything.
Also, this smells like a homework assignment. The point of homework is to figure it out for yourself, not to ask other people for the answer.
A "homework" flag should exist to discourage answers to this kind of question.
Hi, this is not a homework assignment. I am a PhD student, and I am trying to write a program that saves me from comparing the sequences by hand, so that my analysis becomes more efficient.
Use BioPerl, unless you have a valid reason to avoid it. If you stick to plain text processing, you'll stumble a lot.
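If BioPerl is not an option, here is a minimal plain-Perl sketch of one possible reading of "compare". It assumes the two sequences are already aligned strings of one-letter residue codes, and that comparing means reporting both lengths plus the positions where the residues differ. The helper name compare_sequences and the two toy sequences are made up for illustration; they are not part of the code in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two sequences position by position.
# Assumes both arguments are plain strings of one-letter residue codes.
# Only the overlapping part (the shorter length) is compared.
sub compare_sequences {
    my ($seq_a, $seq_b) = @_;

    my $len_a  = length $seq_a;
    my $len_b  = length $seq_b;
    my $shared = $len_a < $len_b ? $len_a : $len_b;

    my @mismatches;
    for my $i (0 .. $shared - 1) {
        my $res_a = substr $seq_a, $i, 1;
        my $res_b = substr $seq_b, $i, 1;
        # store 1-based positions, as is usual for sequence coordinates
        push @mismatches, [ $i + 1, $res_a, $res_b ] if $res_a ne $res_b;
    }

    return {
        length_a   => $len_a,
        length_b   => $len_b,
        compared   => $shared,
        identical  => $shared - scalar(@mismatches),
        mismatches => \@mismatches,
    };
}

# Toy input (made up); in practice, pass the two {seq} strings
# collected while reading the FASTA file.
my $result = compare_sequences('MKTAYIAK', 'MKTGYIAKQ');

printf "Lengths: %d vs %d\n", $result->{length_a}, $result->{length_b};
printf "Identical positions: %d of %d compared\n",
    $result->{identical}, $result->{compared};
printf "Position %d: %s vs %s\n", @$_ for @{ $result->{mismatches} };
```

For the two toy strings this reports lengths 8 and 9, 7 identical positions out of 8 compared, and a single difference at position 4 (A vs G). To use it with the reader in the question, the loop would have to store each $sequence_data{seq} (for example in an array) instead of only printing it, because the hash is overwritten on every call.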