Hi everyone,
I'm running into a problem with overly long FASTA headers: a program I'm using (TargetP) truncates them after the 20th character.
Example:
>ConsensusfromContig10000-snap_masked-ConsensusfromContig10000-abinit-gene-0.1-mRNA-1:cds:3144/1451-1467:0:+
MKKSGDIDEIWKSMQEDARPKPRLPPLPAAAPPAPAPPAPAPKAAAAQPAAASSSNAMVAVNGGASRAFDYSNANALQRDINSLGDEALGTRKRAAERLEAVIVGAEGEAAEATVRALTGDLFKPLLKRFADPGEK
What remains are thousands of entries all named "ConsensusfromContig1", so they can no longer be told apart.
Is there any software or script I can use to rename the headers so that they are at most 20 characters long and can still be identified? So far I have only found scripts that truncate long headers. For the example above, the desired name would be something like 10000|3144/1451-1467:0 .
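One thing to keep in mind: the example name 10000|3144/1451-1467:0 is itself 22 characters, so it would still be cut. A common workaround is to give every record a short sequential ID and keep a lookup table back to the original header. Below is a minimal Python sketch of that idea, assuming a plain FASTA file. The script name `rename_fasta.py`, the `seq` prefix, and the TSV layout are illustrative choices, not part of TargetP.

```python
import sys
from typing import Dict, List, Tuple

MAX_LEN = 20  # TargetP keeps only the first 20 characters of a name (per the post)


def rename_headers(lines: List[str], prefix: str = "seq") -> Tuple[List[str], Dict[str, str]]:
    """Replace every FASTA header with a short sequential ID.

    Returns the rewritten lines and a mapping {short_id: original_header}
    so that results can be translated back afterwards.
    """
    out: List[str] = []
    mapping: Dict[str, str] = {}
    n = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            n += 1
            new_id = f"{prefix}{n:07d}"  # e.g. seq0000001 (10 characters)
            if len(new_id) > MAX_LEN:
                raise ValueError(f"ID {new_id!r} exceeds {MAX_LEN} characters")
            mapping[new_id] = line[1:]  # original header without the leading '>'
            out.append(">" + new_id)
        else:
            out.append(line)  # sequence lines are copied unchanged
    return out, mapping


def main(fasta_in: str, fasta_out: str, table_out: str) -> None:
    with open(fasta_in) as fh:
        lines, mapping = rename_headers(fh.readlines())
    with open(fasta_out, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    # Tab-separated lookup table: short ID -> original header
    with open(table_out, "w") as fh:
        for new_id, old in mapping.items():
            fh.write(f"{new_id}\t{old}\n")


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: rename_fasta.py in.fasta out.fasta id_map.tsv")
    main(*sys.argv[1:])
```

Usage would be something like `python rename_fasta.py input.fasta renamed.fasta id_map.tsv`. After running TargetP on `renamed.fasta`, the short IDs in its output can be joined back to the full headers via `id_map.tsv`.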
I would be grateful for any help provided.
Thanks a lot! I never imagined it could be done so easily. I used your command in the following way:
It worked like a charm. Big thanks again!