4.3 years ago
genomes_and_MGEs
Hey everyone, when you download a given assembly from NCBI RefSeq, the filename will be, for example, GCF_006351845.1_ASM635184v1_genomic.fna, and the corresponding FASTA header is:
>NZ_CP040904.1 Enterococcus faecium strain N56454 chromosome, complete genome
After some formatting, all my fasta headers are like this, for example:
>NZ_CP040904.1_Ef
I would like to rename the file to Ef_GCF_006351845.1_ASM635184v1_genomic.fna. That is, copy the text after the last underscore in the FASTA header and move it to the beginning of the filename.
Could you guys help me out?
Thanks!
Here's some logic to approach the problem:
For each of these files, take the first line, split it on _ and keep the last part (the text after the last underscore), and store that part in a variable. Then rename the file so that this variable precedes the original file name. This can be done in a loop that contains two commands. bash should be enough for this; you won't need a separate programming language.
I'm no expert in this, but I wrote this:
I understand it should be something similar to this, but I'm making some mistakes. Could you help me out?
Change cp to mv to rename instead of copy.
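For reference, the logic described in the answer (read the first line, keep the text after the last underscore, rename with mv) could be sketched in bash roughly like this. It is only a sketch under a few assumptions: the function name rename_with_header_tag is made up for illustration, the files are assumed to sit in the current directory and match GCF_*_genomic.fna, and the first header of each file is assumed to already end in something like _Ef. Only the first header is read, so a file with several records gets its prefix from the first one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prefix a FASTA file's name with the text after the
# last underscore of its first header line (">NZ_CP040904.1_Ef" gives "Ef").
rename_with_header_tag() {
    local file=$1
    local header tag dir base target
    # Read only the first line of the file (the first FASTA header).
    IFS= read -r header < "$file" || [ -n "$header" ] || return 1
    # Require a header that has some text after an underscore.
    case $header in
        '>'*_?*) ;;
        *) echo "skipping $file: unexpected header" >&2; return 1 ;;
    esac
    # Strip everything up to and including the last underscore.
    tag=${header##*_}
    dir=$(dirname -- "$file")
    base=$(basename -- "$file")
    target=$dir/${tag}_${base}
    # Do not overwrite a file that already has the target name.
    if [ -e "$target" ]; then
        echo "skipping $file: $target already exists" >&2
        return 1
    fi
    mv -- "$file" "$target"
}

# Assumed file pattern; adjust it to match your own files.
for f in GCF_*_genomic.fna; do
    [ -e "$f" ] || continue   # nothing matched the pattern
    rename_with_header_tag "$f"
done
```

While testing, it may be safer to replace mv with echo mv first, so the loop only prints the renames it would perform.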