2.8 years ago
mthm
The headers in my multi-FASTA files come in different formats and names; the only thing they have in common is the order of the records. I have a list of new names that should replace the headers in that order. I was thinking of using seqkit, but I can't find the correct syntax for it.
>EOG09150JA6_/storage/home/users/Dlittoralis_73_scf.fasta_jcf7180000720927_64871-66017 117 bp
MRRNNYPYQPLNQHPAPSGPAGHDALEAENERAAEELQQKIGALKSLTIDIGNEVRYQDK
LLRGIDDDMDRTGGFLGNTMTRVVRLAKQGGGSKQMCYMFLFVLFVFVLLWLTLKFK
>EOG09150JA6_/storage/home/users/Dlummei_81_scf.fasta_jcf7180000898911_4133-4655 117 bp
MRRNNYPYQPLNQHPAPSGQAGHDALEAENERAAEELQQKIGALKSLTIDIGNEVRYQDK
LLRGIDDDMDRTGGFLGNTMTRVVRLAKQGGGSKQMCYMFLFVLFVFVLLWLTLKFK
>dmoj37yC5.fa_scf7180000237413_9322-9672 117 bp
MRRNNYPYQPLNQHPAPSGPAGHDALEAENERAAEELQQKIGALKSLTIDIGNEVRYQDK
So far this is what I came up with:
seqkit replace -p "^(\S+)"
but the replacement symbol -r "{kv}" needs a tab-delimited key-value file, while in my case the headers are variable, so I can only provide the new names based on order.
You need to be more specific: what do you want to rename the headers to? Give a full example of one record, input and output.
Once tasks get more complicated, writing a very simple Python script is usually the way to go.
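As a rough illustration of such a script, here is a minimal sketch that uses only the Python standard library and replaces the i-th header with the i-th entry of a name list. The function name rename_by_order, the names sp1/sp2/sp3 and the shortened sequences are placeholders made up for this example, not taken from the question; it also assumes the list has exactly one name per record.

```python
import io


def rename_by_order(fasta_in, new_names, fasta_out):
    """Replace the i-th FASTA header with new_names[i].

    fasta_in and fasta_out are text file handles; sequence lines are
    copied unchanged. Returns the number of records renamed.
    """
    names = iter(new_names)
    count = 0
    for line in fasta_in:
        if line.startswith(">"):
            try:
                name = next(names)
            except StopIteration:
                raise ValueError("more records than new names") from None
            fasta_out.write(f">{name}\n")
            count += 1
        else:
            fasta_out.write(line)
    if count != len(new_names):
        raise ValueError("more new names than records")
    return count


# Small in-memory demo; the headers are shortened stand-ins for the
# variable-format headers in the question, the names are hypothetical.
demo_in = io.StringIO(
    ">EOG09150JA6_some_long_path_1 117 bp\nMRRNNYPYQ\nLLRGIDDDM\n"
    ">EOG09150JA6_some_long_path_2 117 bp\nMRRNNYPYQ\n"
    ">dmoj_some_other_format 117 bp\nMRRNNYPYQ\n"
)
demo_out = io.StringIO()
rename_by_order(demo_in, ["sp1", "sp2", "sp3"], demo_out)
print(demo_out.getvalue(), end="")
```

For real files, the same function can be called with handles from open("in.fasta") and open("out.fasta", "w"), and the name list can be read from a text file with one name per line. Raising an error on a count mismatch is a deliberate choice here, since a silent off-by-one would mislabel every following record.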
In Biopython, the solution would look like this: