I want to extract the header lines from two protein sequences in a FASTA file. For example: I created the file, then I give the file name, then I want to extract the header of the first sequence and then the header of the second sequence.
Here is what a header looks like:
>gi|628601924|ref|NP_001278775.1| DNA-binding protein Ikaros isoform 16 [Homo sapiens]
These are my protein FASTA records, with the headers that I want to extract:
>gi|628601924|ref|NP_001278775.1| DNA-binding protein Ikaros isoform 16 [Homo sapiens]
MDADEGQDMSQVSGKESPPVSDTPDEGDEPMPIPEDLSTTSGGQQSSKSDRVVVTYGADDFRDFHAIIPK
SFSLLEL
>gi|628601926|ref|NP_001278776.1| DNA-binding protein Ikaros isoform 16 [Homo sapiens]
MDADEGQDMSQVSGKESPPVSDTPDEGDEPMPIPEDLSTTSGGQQSSKSDRVVVTYGADDFRDFHAIIPK
SFSLLEL
I wrote this script:
print "Please enter a file name: ";
chomp($file = <STDIN>);   # remove the trailing newline from the file name
open(INFILE, '<', $file) or die "Cannot open $file: $!";
while (defined($line = <INFILE>)) {
    chomp $line;
    if ($line =~ /^>(\S+)\s*(.*)$/) {
        $id = $1;            # what should I write here to activate $1 and $2?
        $description = $2;
        print "$id $description\n";   # what else should I add here?
    } else {
        # I need to write something here, like "this is not a header, don't extract it"
        next;                # is this enough for going to the next sequence?
    }
}
close INFILE;
If you just need the header line then you could use
grep ">"
No need for Perl.
Why only Perl? Is this homework? BTW, this has been answered many times in this forum.
This sounds suspiciously homework-y.
Thanks for your comments. I edited my question.
Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.
The original question is still unanswered. Is this a homework question?
Hint: If you just need to print the header line, check whether the line begins with ">" (the regex /^>/); otherwise go to the next line.
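Following that hint, here is one possible sketch in Perl. It assumes every header line in the file starts with ">", as in standard FASTA. The subroutine name extract_headers is made up for illustration, and the two records from the question are held in an in-memory string so the example runs without a file on disk. After a successful match, $1 holds the first whitespace-free token (the ID) and $2 holds the rest of the line (the description); lines that do not match are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA text from a filehandle and return
# a list of [id, description] pairs, one per header line.
sub extract_headers {
    my ($fh) = @_;
    my @headers;
    while (my $line = <$fh>) {
        chomp $line;
        # Only header lines start with ">"; skip sequence lines.
        next unless $line =~ /^>(\S+)\s*(.*)$/;
        push @headers, [$1, $2];   # $1 = ID, $2 = description
    }
    return @headers;
}

# Stand-in for the input file: the two records from the question.
my $fasta = <<'END';
>gi|628601924|ref|NP_001278775.1| DNA-binding protein Ikaros isoform 16 [Homo sapiens]
MDADEGQDMSQVSGKESPPVSDTPDEGDEPMPIPEDLSTTSGGQQSSKSDRVVVTYGADDFRDFHAIIPK
SFSLLEL
>gi|628601926|ref|NP_001278776.1| DNA-binding protein Ikaros isoform 16 [Homo sapiens]
MDADEGQDMSQVSGKESPPVSDTPDEGDEPMPIPEDLSTTSGGQQSSKSDRVVVTYGADDFRDFHAIIPK
SFSLLEL
END

# Open the string as if it were a file.
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my @headers = extract_headers($fh);
close $fh;

# Print ID and description separated by a tab.
print "$_->[0]\t$_->[1]\n" for @headers;
```

To use it on a real file, open the file with the three-argument form of open (for example after reading the name from STDIN and chomping it) and pass that filehandle to extract_headers instead of the in-memory one.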