5.7 years ago
genomes_and_MGEs
Hey guys,
I have a multi-fasta file containing several extracted regions, such as
>NZ_KI973281.1_1234..56789
atattgagctaaaaaaatcagttttccca...
>NZ_LAAL01000032.1_5456..32476
tgcagaagtaagggggtaacaccatgcct...
...
I would like to add the strain name to each FASTA header, for example:
>Enterobacter_sp._MGH_6_NZ_KI973281.1_1234..56789
atattgagctaaaaaaatcagttttccca...
>Enterobacter_hormaechei_subsp._xiangfangensis_strain_34984_NZ_LAAL01000032.1_5456..32476
tgcagaagtaagggggtaacaccatgcct...
...
Could you please help me out? Thanks!
See the following post
Renaming Entries In A Fasta File
and the many related posts listed in its right-hand panel,
such as: Rename fasta headers,
How to move the last 4 characters of all FASTA headers to the beginning?,
Renaming fasta file headers, etc.
Many of them include awk or sed scripts
that may give you some hints.
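Most of those scripts follow the same pattern: read a two-column list of old and new names, then rewrite every header that appears in it. A minimal Python sketch of that pattern, assuming a hypothetical tab-separated mapping file with one `old header<TAB>new header` pair per line (headers without the `>`); `load_mapping` and `rename_headers` are illustrative names, not taken from the linked posts:

```python
import io
from typing import Dict, Iterable, Iterator, TextIO


def load_mapping(handle: TextIO) -> Dict[str, str]:
    """Read 'old<TAB>new' lines into a dict; blank lines are skipped."""
    mapping = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        old, new = line.split("\t", 1)
        mapping[old] = new
    return mapping


def rename_headers(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines, replacing headers found in the mapping; others pass through."""
    for line in lines:
        if line.startswith(">"):
            old = line[1:].rstrip("\n")
            yield ">" + mapping.get(old, old) + "\n"
        else:
            # Sequence lines are copied unchanged
            yield line


if __name__ == "__main__":
    fasta = io.StringIO(">NZ_KI973281.1_1234..56789\natattgagc\n")
    names = io.StringIO(
        "NZ_KI973281.1_1234..56789\tEnterobacter_sp._MGH_6_NZ_KI973281.1_1234..56789\n"
    )
    for out in rename_headers(fasta, load_mapping(names)):
        print(out, end="")
```

Headers missing from the mapping are left as they are, so a partial name list does not corrupt the file.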
Where are the strain names coming from? A separate file/NCBI search?
From a simple NCBI search! I don't have a separate file with the strain name for each accession, and the suggested links don't help me with this issue. Can you help me out? Thanks!
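Whichever way the names are looked up on NCBI, a first step is a list of unique accessions taken from the headers, which can then be pasted into a batch query (for example NCBI Batch Entrez). A sketch, assuming every header has the form `ACCESSION.VERSION_START..END` as in the example above; the regex and function names are hypothetical, not from this thread:

```python
import re
from typing import Iterable, List, Tuple

# Assumed header layout: ACCESSION.VERSION_START..END, e.g. NZ_KI973281.1_1234..56789
HEADER_RE = re.compile(r"(?P<acc>.+?\.\d+)_(?P<start>\d+)\.\.(?P<end>\d+)")


def parse_header(header: str) -> Tuple[str, int, int]:
    """Split a header (with or without the leading '>') into accession.version, start, end."""
    match = HEADER_RE.fullmatch(header.strip().lstrip(">"))
    if match is None:
        raise ValueError(f"unexpected header format: {header!r}")
    return match["acc"], int(match["start"]), int(match["end"])


def unique_accessions(lines: Iterable[str]) -> List[str]:
    """Collect accession.version strings from header lines, first-seen order, no duplicates."""
    accs = (parse_header(line)[0] for line in lines if line.startswith(">"))
    return list(dict.fromkeys(accs))


if __name__ == "__main__":
    example = [
        ">NZ_KI973281.1_1234..56789\n",
        "atattgagctaaaaaaatcagttttccca\n",
        ">NZ_LAAL01000032.1_5456..32476\n",
        "tgcagaagtaagggggtaacaccatgcct\n",
    ]
    # One accession per line, ready for a batch lookup on NCBI
    print("\n".join(unique_accessions(example)))
```

Several regions extracted from the same contig share one accession, so de-duplicating keeps the NCBI query short.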
The following will get you part way there.
Step 1: Look up the names of the organisms in your BLAST result (the following works with the small example snippet above).
names.txt now contains the names of the organisms.
Step 2: Use one of the solutions in Renaming fasta headers according to a matching name list to do the replacements. There is a small issue, though: names.txt does not contain the version number for each accession, so the solutions may need to be adapted to suit your needs.
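One way around the missing version numbers is to strip the version from each header's accession before looking it up. A minimal sketch, assuming `names.txt` holds one tab-separated `ACCESSION<TAB>Organism name` pair per line with unversioned accessions (that layout is a guess; adjust `load_names` to whatever the lookup actually produced) and that the coordinates always follow the last underscore in a header. The function names are hypothetical:

```python
import io
from typing import Dict, Iterable, Iterator, TextIO


def load_names(handle: TextIO) -> Dict[str, str]:
    """Read 'ACCESSION<TAB>Organism name' lines (accession without version); skip malformed lines."""
    names = {}
    for line in handle:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2:
            names[parts[0]] = parts[1]
    return names


def base_accession(header: str) -> str:
    """'NZ_KI973281.1_1234..56789' -> 'NZ_KI973281' (drop coordinates, then version)."""
    acc_version = header.rsplit("_", 1)[0]
    return acc_version.split(".", 1)[0]


def prefix_with_names(lines: Iterable[str], names: Dict[str, str]) -> Iterator[str]:
    """Prepend the underscored organism name to each header whose accession is known."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            name = names.get(base_accession(header))
            if name is not None:
                header = name.replace(" ", "_") + "_" + header
            yield ">" + header + "\n"
        else:
            yield line


if __name__ == "__main__":
    fasta = io.StringIO(">NZ_LAAL01000032.1_5456..32476\ntgcagaagtaag\n")
    names = io.StringIO("NZ_LAAL01000032\tEnterobacter hormaechei subsp. xiangfangensis strain 34984\n")
    for out in prefix_with_names(fasta, names):
        print(out, end="")
```

With that name, the header becomes `>Enterobacter_hormaechei_subsp._xiangfangensis_strain_34984_NZ_LAAL01000032.1_5456..32476`, matching the format requested above. Headers whose accession is not in `names.txt` are left unchanged.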