I worked out another solution using a combination of AWK, sed and Perl. It works for files that contain a single sequence, that is, one header per file, where the goal is to replace the header with a modified version of the file name.
Assuming all your fasta files in the current directory end in ".fa", run the following:
ls -lrt |
grep "\.fa$" |
awk '{OFS="";print "var=",$NF"\nsed -i'.ori' -e \"s/>.*/>$var/g\" "$NF}' |
perl -lne 'if ($_=~/^(var=.*)\.fa/) {print $1} else {print}' > run_command.sh
Explanation:
The first line, `ls -lrt`, lists the files in your directory.
The second line keeps only those lines of the listing that end in .fa.
The third and fourth lines, which are a bit trickier, are an AWK command and a Perl command. The AWK command prints the commands that will later be executed, two lines per file. The first line assigns the file name to a variable called `var`; the second is a `sed` call that replaces the line starting with `>` in your original fasta file with the name of the file. For example, if the file name is file1.fa, the fasta header will be >file1. The Perl command cleans things up by stripping the .fa extension from the `var=` line.
The difference between this solution and the others is that it writes the commands that will perform the name change to a file. This makes it possible to: i) make sure the names are changing to what you really want; ii) keep a record of what happened. After you have checked that the name changes in the file `run_command.sh`
are what you want, you can just:
bash run_command.sh
and it should do the trick.
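One way to confirm the result afterwards is to list the header line of every file. The sketch below assumes a single hypothetical file, `file1.fa`, and uses a hand-written stand-in for the generated script so that it can be run on its own.

```shell
# Toy setup: a scratch directory holding one hypothetical FASTA file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>some long original header\nACGTACGT\n' > file1.fa

# Hand-written stand-in for the generated script (one file only).
cat > run_command.sh <<'EOF'
var=file1
sed -i.ori -e "s/>.*/>$var/g" file1.fa
EOF

bash run_command.sh

# Print each header line prefixed by its file name to check the renaming.
grep -H '^>' *.fa
```

Each output line pairs a file name with its new header, so a mismatch between the two is easy to spot. The `*.fa` glob skips the `.ori` backups, which still hold the old headers.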
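NB: mind the `-i'.ori'` option. It tells sed to edit each file in place and to keep the original as a backup with the extension .ori (file1.fa is saved as file1.fa.ori). The suffix matters on macOS, whose BSD sed requires a suffix argument after -i, whereas GNU sed also accepts a bare -i, which overwrites the file without keeping a backup. Ah, and if you're copy-pasting the command and it shows a \ character before each |, delete those characters first.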
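A minimal sketch of what the backup suffix does, and how to undo a change with it. The file `demo.fa` and its contents are invented for the example.

```shell
# Toy setup: a scratch directory holding one hypothetical FASTA file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>old header\nACGT\n' > demo.fa

# Edit in place; the untouched original is saved as demo.fa.ori.
sed -i.ori -e 's/>.*/>demo/' demo.fa

head -n 1 demo.fa      # header after the edit
head -n 1 demo.fa.ori  # header as it was before the edit

# To undo the change, move the backup over the edited file.
mv demo.fa.ori demo.fa
```

Keeping the `.ori` files around until the new headers have been checked means any file can be restored with a single `mv`.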
This solution is far from elegant, but it works.
Ohh thank you so much that worked!!!!
For future reference, the code can be further shortened by: