Hi!
So I have a FASTA file containing sequences, and I want to replace the old FASTA headers with new ones. The first step is to match the header names, that is, the name that comes after the '>'. How do I do this? All sequences have headers somewhat like this:
>Halobacterium_salinarum
This is the part of the code where I find the headers:
while (my $line = <$IN>) { if ($line =~ /^>/) {
my $x = # Here I want to match with "Halobacterium_salinarum"
# and all the other different species names
I have tried for hours to figure out the right match characters. Is it "any word character", \w? I also want to save the old species name in a hash. Should I capture it like this: (\w+), and finish with \s because that's where the name ends?
Try the script from the following article.
https://www.perlmonks.org/?node_id=975419
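In case it helps, here is a minimal sketch of the matching step itself (this is not the script from the link, just an illustration). For a name like Halobacterium_salinarum, \w+ does work, since \w covers letters, digits and underscore, and you don't need a trailing \s because \w+ stops at the first non-word character anyway. If some of your names might contain characters such as '.', '-' or '|', then \S+ (everything up to the first whitespace) is probably the safer choice, and that is what the sketch uses. The hash %new_name_for, the replacement name in it and the in-memory FASTA text are made up for the example; in your script $IN would be your real file handle.

```perl
use strict;
use warnings;

# Hypothetical lookup table: old header name => new header name (made-up value).
my %new_name_for = (
    'Halobacterium_salinarum' => 'H_salinarum_NRC1',
);

# Stand-in for the real FASTA file so the sketch runs as-is.
my $fasta = ">Halobacterium_salinarum\nATGCATGC\n"
          . ">Unknown_species some description\nGGCC\n";
open(my $IN, '<', \$fasta) or die "Cannot open in-memory FASTA: $!";

my %seen;          # old name => number of times it appeared as a header
my $output = '';   # rewritten FASTA text

while (my $line = <$IN>) {
    # ^>     header lines start with '>'
    # (\S+)  capture everything up to the first whitespace as the name
    if ($line =~ /^>(\S+)/) {
        my $old = $1;
        $seen{$old}++;
        if (exists $new_name_for{$old}) {
            # Replace only the name; anything after it on the line is kept.
            $line =~ s/^>\S+/>$new_name_for{$old}/;
        }
    }
    $output .= $line;
}
close $IN;

print $output;
```

Headers with no entry in the lookup table are passed through unchanged, so you can check %seen afterwards to find the names you still need to map.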
So, people still use Perl for Bioinformatics!
Using BioPerl will probably make your life easier: