Hello BioStars community,
I am new to Perl and to programming in general, so I thought I would try my luck asking a question here.
I am trying to match a fairly conserved protein sequence to a proteome using a regex. I am able to output the matching lines, as well as their positions, but I cannot find a way to output the accession numbers along with lines that match my conserved protein.
Here's part of my code:
my $proteins;
open( my $fh, '<', "Athaliana_167_protein.fa" ) or die "can't open file: $!";
# print every line of the file that contains the motif
while (<$fh>) {
    print if /W[S]TRRKIAI/;
}
Would using lookahead/lookbehinds possibly work to print out the match line and accession number?
Thanks!
Your code will most likely not find the sequence you are looking for. Most FASTA files wrap each sequence across several lines, so you will miss the pattern whenever it spans a line break. You need to join the whole sequence of a record into one string before matching.
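A minimal sketch of that idea, assuming the accession is whatever follows ">" on the header line. It remembers the current header, joins the wrapped sequence lines, and only then runs the regex, so the header can be printed next to each hit without any lookahead or lookbehind. The sub name find_motif and the tab-separated output are illustrative choices, not anything from your script.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle, join wrapped sequence lines,
# and return [header, 1-based position, matched text] for every hit.
# (find_motif is a hypothetical helper name used for this sketch.)
sub find_motif {
    my ($fh, $motif) = @_;
    my (@hits, $header, $seq);

    # Search the record collected so far.
    my $flush = sub {
        return unless defined $header;
        while ($seq =~ /$motif/g) {
            my ($start, $end) = ($-[0], $+[0]);
            push @hits, [ $header, $start + 1, substr($seq, $start, $end - $start) ];
        }
    };

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip newline and any trailing \r
        if ($line =~ /^>(.*)/) {         # a header line starts a new record
            $flush->();
            ($header, $seq) = ($1, '');
        }
        elsif (defined $header) {
            $seq .= $line;               # append one wrapped sequence line
        }
    }
    $flush->();                          # search the last record too
    return @hits;
}

# Example run with the file name and pattern from the question.
my $file = 'Athaliana_167_protein.fa';
if (open my $in, '<', $file) {
    printf "%s\t%d\t%s\n", @$_ for find_motif($in, qr/W[S]TRRKIAI/);
    close $in;
}
else {
    warn "can't open $file: $!\n";
}
```

Each output line holds the full header, the 1-based position of the match in the joined sequence, and the matched text. If you only want the accession itself, you could split the header on whitespace and keep the first field, assuming your headers are laid out that way.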