Hi,
I have a FASTA file with 300 protein sequences, and I intend to build a phylogenetic tree from it. I want to keep only the accession number and the organism name in each FASTA header and remove the rest of the information. Can anybody suggest how to do this? I have a Linux-based system with Perl and Python installed.
For example, I want to convert a header like this:
Strictly speaking, what you want is not proper FASTA: anything following the first space or tab in a header is not part of the sequence name. Renaming sequences this way may confuse other tools.
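The example header was not included in the question, so here is a minimal Python sketch under one assumption: the headers follow the NCBI protein style `>ACCESSION description [Organism name]`, with the organism in square brackets at the end. It keeps the first token (the accession) and the bracketed organism, and joins the words with underscores so the whole thing stays a single token, which avoids the whitespace problem mentioned above. The script name `shorten_headers.py` and the function names are hypothetical.

```python
import re
import sys

# Assumption: headers look like ">ACCESSION description [Organism name]".
# Captures the text inside the last pair of square brackets at the end of the line.
ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def shorten_header(header: str) -> str:
    """Turn '>ACC desc [Genus species]' into '>ACC_Genus_species'."""
    fields = header[1:].strip().split(maxsplit=1)
    accession = fields[0] if fields else ""
    match = ORGANISM_RE.search(header)
    if not match:
        # No bracketed organism found: keep only the accession.
        return ">" + accession
    # Replace spaces so the new name contains no whitespace.
    organism = match.group(1).strip().replace(" ", "_")
    return f">{accession}_{organism}"


def rewrite_fasta(lines):
    """Yield FASTA lines with shortened headers; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield shorten_header(line.rstrip("\n")) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 shorten_headers.py input.fasta output.fasta
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(rewrite_fasta(src))
```

Run it as `python3 shorten_headers.py proteins.fasta renamed.fasta`. If your headers use a different layout, only `ORGANISM_RE` and the accession split should need adjusting.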