So I have a FASTA file of sequences, and I want to replace the old FASTA headers with new ones. The first step is to match the header names: it's the name right after the '>' that I want to match. How do I do this? All sequences have headers roughly like this:
This is the part of the code where I find the headers:
while (my $line = <$IN>) {
    if ($line =~ /^>/) {
        my $x;  # Here I want to match with "Halobacterium_salinarum"
                # and all the other different species names
    }
}
I have spent hours trying to find the right regex characters. Is it \w ("any word character")? I also want to save the old species name in a hash, so should I capture it with (\w+) and end the pattern with \s, since that's where the name ends?
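A minimal plain-Perl sketch of the matching and renaming step, assuming each header starts with the species name right after `>`, optionally followed by a space and free text (the input below is made up). `%new_name` and `rename_headers` are hypothetical names for illustration; the old-to-new mapping would come from your own data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical mapping from old species names to new header names.
my %new_name = ('Halobacterium_salinarum' => 'Hsal');

# Copy FASTA from $in to $out, renaming headers found in %new_name.
# Returns a hash ref counting every old name that was seen.
sub rename_headers {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        chomp $line;
        # Capture the name right after '>' up to the first whitespace;
        # (.*) keeps the rest of the header (the description) unchanged.
        if ($line =~ /^>(\S+)(.*)$/) {
            my ($old, $rest) = ($1, $2);
            $seen{$old}++;    # store the old name in a hash
            my $new = exists $new_name{$old} ? $new_name{$old} : $old;
            print {$out} ">$new$rest\n";
        }
        else {
            print {$out} "$line\n";    # sequence lines pass through
        }
    }
    return \%seen;
}

# Made-up example input, read from an in-memory string.
my $fasta = ">Halobacterium_salinarum description text\nACGT\n>Other.sp_1\nGGCC\n";
open my $in, '<', \$fasta or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
my $seen = rename_headers($in, $out);
close $out;
print $result;
```

`(\w+)` would also match `Halobacterium_salinarum`, because `_` counts as a word character, but it would capture only `Other` from `>Other.sp_1`, which is why the sketch uses `\S+`. A trailing `\s` is not needed: `+` is greedy and stops at the first character that does not match, and after `chomp` a header with no description has no trailing whitespace, so `/^>(\w+)\s/` would not match it at all.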
Try the script from the following article.
So, people still use Perl for bioinformatics!
Using BioPerl will probably make your life easier:
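A minimal sketch with BioPerl's `Bio::SeqIO`, assuming BioPerl is installed. The FASTA parser splits each header into `display_id` (the text up to the first whitespace) and `desc` (the rest), so the species name can be swapped without writing a regex. `%new_name` is a hypothetical mapping, and the in-memory input is made up; for real files you would pass `-file => 'in.fasta'` and `-file => '>out.fasta'` instead of `-fh`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical old-name => new-name mapping; adjust to your data.
my %new_name = ('Halobacterium_salinarum' => 'Hsal');

# Made-up example input and an in-memory output buffer.
my $fasta = ">Halobacterium_salinarum description text\nACGT\n";
open my $in_fh, '<', \$fasta or die "Cannot open input: $!";
my $result = '';
open my $out_fh, '>', \$result or die "Cannot open output: $!";

my $in  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'fasta');
my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'fasta');

while (my $seq = $in->next_seq) {
    # display_id holds the name after '>'; desc keeps the remaining text.
    my $old = $seq->display_id;
    $seq->display_id($new_name{$old}) if exists $new_name{$old};
    $out->write_seq($seq);
}
close $out_fh;    # flush the in-memory output
print $result;
```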