Simple question, although when I search Google all I get is how to extract the actual sequence. Does anyone have a quick solution for reading in a FASTA file and then extracting all the IDs in the same order they appear in the file?
here is a snippet:
>gnl|TC-DB|P0A334 1.A.1.1.1 Voltage-gated potassium channel OS=Streptomyces lividans GN=kcsA PE=1 SV=1
and I only want the 'P0A334' part.
link to fasta file:
https://docs.google.com/file/d/0B0iDswLYaZ0zX1RJdGRrRUxiSEk/edit?usp=sharing
thanks!
If you don't mind keeping the ">", an easy grep would be a start: grep ">" input.fasta > headers.txt
But some more information would be great. What did you find for extracting the sequence? Do you want to write a script to do it? Then it might still help to take a closer look at how the sequence is extracted and adapt that to extract the header instead. Do you want the full header? And, as we are currently discussing that topic in another post: what have you already tried?
Hi, I updated the question with a snippet of the FASTA file. Out of that snippet I only want the 'P0A334' part, and then the same for the other sequences.
Note that the ">" was not visible in your original question. Lines beginning with that character are formatted as blockquotes at BioStar. You need to indent the line with 4 spaces (done for you) to display it properly.
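For the case described above (only the 'P0A334' part of each header, in file order), here is a minimal Python sketch. It assumes every header looks like the snippet, i.e. the accession is the last "|"-separated field of the first whitespace-delimited token after ">". The function name `fasta_ids` and the second example record are made up for illustration; headers in other formats would need a different rule.

```python
def fasta_ids(lines):
    """Yield one ID per FASTA header line, in the order the headers appear."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # bare ">" with nothing after it
        # fields[0] is e.g. "gnl|TC-DB|P0A334".
        # Assumption: the accession is the last "|"-separated part.
        yield fields[0].split("|")[-1]


# Small in-memory example; the second record and both sequences are placeholders.
example = [
    ">gnl|TC-DB|P0A334 1.A.1.1.1 Voltage-gated potassium channel",
    "ACDEFGHIK",
    ">gnl|TC-DB|X00000 hypothetical second record",
    "LMNPQRSTV",
]
print(list(fasta_ids(example)))  # ['P0A334', 'X00000']

# For a real file, pass the open handle instead of a list:
# with open("input.fasta") as handle:
#     ids = list(fasta_ids(handle))
```

Because the function reads line by line and yields as it goes, the IDs come out in the same order as in the file and the whole file never has to be held in memory.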