Hi all, I have a FASTA file and I want to extract just the headers of the sequences. Is there any Perl code, or something similar, to do that? Thanks a lot in advance.
Regards
For Perl code, you can visit http://www.bioperl.org/wiki/Main_Page. If you just want to extract the headers on a Linux/Unix system, a simple
grep "^>" myfile.fasta
should work.
Why so complicated? ;) Only the header lines in a FASTA file contain ">", so you can use grep:
grep -e ">" my.fasta
or awk to remove the ">":
$ awk 'sub(/^>/, "")'
>aksdjfljfd
aksdjfljfd
$ awk 'sub(/^>/, "")' your_file.fasta > desired_headers.txt
https://thomas-cokelaer.info/blog/2011/05/awk-the-substr-command-to-select-a-substring/
The expression in Perl would be basically the same as in the grep above (m/^>/). There are easier one-liner ways to do this, but this is a basic outline of the Perl code that should be pretty readable.
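#!/usr/bin/perl
use strict;
use warnings;

# Open the FASTA file for reading; stop with an error if that fails.
open(my $fasta, '<', 'your.fa') or die "Cannot open your.fa: $!";
while (my $line = <$fasta>) {
    chomp $line;
    # Header lines start with ">".
    if ($line =~ m/^>/) {
        print "$line\n";
    }
}
close($fasta);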
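For what it's worth, here is one possible one-liner version, as a rough sketch rather than the only way to do it. It assumes perl is on your PATH. The sample records and the fasta_headers helper name are made up for illustration. Unlike the script above, it also drops the leading ">", which is what was asked for in the comments.

```shell
# Build a tiny sample FASTA file (made-up records, for illustration only).
sample=$(mktemp)
printf '>seq1 first record\nACGT\n>seq2 second record\nGGCC\n' > "$sample"

# Hypothetical helper: print every header line without its leading ">".
# s/^>// succeeds only on lines that start with ">", so sequence lines
# are skipped and header lines are printed with the ">" stripped.
fasta_headers() {
    perl -ne 'print if s/^>//' "$1"
}

# Write the headers to a file, then show them.
headers=$(mktemp)
fasta_headers "$sample" > "$headers"
cat "$headers"
```

On a real file you would skip the sample setup and just run the perl line with your own file name in place of "$1", redirecting the output wherever you like.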
By "header", you mean everything after the ">"? Or just some part of everything after the ">"? Or including the ">"? It's important to be specific since a lot of people misunderstand "header".
I just want everything after the ">". I should say that I am not familiar with Perl, and I want Perl code that I can run. Please help me if possible. Thanks a lot. Regards
err, why don't you just post your code then?