Problem adding file names to fasta headers in multiple directories using awk
2.1 years ago
nitinra

Hello all,

I have about 150 subdirectories in a directory, and each subdirectory contains multiple FASTA files. I want to rename all the FASTA headers by adding the name of the file they come from.

I tried using this command:

for i in ./*mg/*.faa; do gawk -i inplace '/>/{gsub(">","&"FILENAME"_");gsub(/\.faa/,x)}1' $i; done

However, this command prefixes the headers with the subdirectory name (*mg) as well as the file name. How do I modify it to include only the file name?
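The subdirectory shows up because awk's FILENAME variable holds the path exactly as it was passed on the command line, here ./<something>mg/<file>.faa. Below is a minimal sketch of one fix, assuming the goal is headers of the form >FILE_oldheader as in the original command: strip everything up to the last / from FILENAME once per file, then prepend the result. It writes through a temporary file so it runs with any POSIX awk (with GNU awk 4.1 or later the same awk program also works under gawk -i inplace). The helper name prefix_headers is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: prefix each FASTA header with its file's base name, minus ".faa".
# prefix_headers is a hypothetical helper name, not part of any tool.
prefix_headers() {
  local f tmp
  for f in "$@"; do
    [ -f "$f" ] || continue       # skip an unmatched glob pattern
    tmp="$f.tmp.$$"               # temporary output next to the input
    awk '
      FNR == 1 {                  # first line of the file: build the name
        name = FILENAME
        sub(/.*\//, "", name)     # drop the directory part, e.g. ./xmg/
        sub(/\.faa$/, "", name)   # drop the extension
      }
      /^>/ { $0 = ">" name "_" substr($0, 2) }   # >seq1 becomes >genes_seq1
      { print }                   # print every line, edited or not
    ' "$f" > "$tmp" && mv "$tmp" "$f"
  done
}

# Process every .faa file in the *mg subdirectories of the current directory.
prefix_headers ./*mg/*.faa
```

Building the header by string concatenation, rather than with gsub, means an & in a file name is not treated as "the matched text", and .faa is removed only from the file name instead of anywhere in the header line (which is what gsub(/\.faa/,x) on the whole line would do). Running it twice adds the prefix twice, so try it on a copy of the data first.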

fasta awk bash
2.1 years ago
find ./*mg -type f -name "*.faa" | while IFS= read -r F; do sed "/^>/s|\$| $(basename "$F")|" "$F" > "${F}.new"; done
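To see what the sed expression in this answer does, here is a small check on a throwaway file (sample.faa and its contents are made up for illustration). Inside double quotes, \$ reaches sed as $, so s|$| name| matches the end of each header line and appends a space plus the file name; sequence lines are left alone, and the result goes to a separate .new file so the original is kept.

```shell
#!/usr/bin/env bash
# Demo of the sed substitution from the answer, run on a made-up file.
tmpdir=$(mktemp -d)
F="$tmpdir/sample.faa"                      # hypothetical input file
printf '>seq1 hypothetical protein\nMKVL\n>seq2\nMAAA\n' > "$F"

# Inside double quotes "\$" reaches sed as "$" (end of line), so every
# header line gets " sample.faa" appended; sequence lines are untouched.
sed "/^>/s|\$| $(basename "$F")|" "$F" > "$F.new"

cat "$F.new"
```

The headers come out as >seq1 hypothetical protein sample.faa and >seq2 sample.faa. Because the name follows a space, many FASTA tools will treat it as part of the description rather than the sequence ID, whereas the prefix approach in the question makes it part of the ID.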
2.1 years ago

GNU parallel and seqkit answer for posterity.

parallel 'seqkit replace -p "(.+)" -r "\$1 {/}" {} > {= s/\.faa/.renamed.faa/ =}' ::: $(find ./*mg -name "*.faa")
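For readers new to GNU parallel's replacement strings: {} is the input path, {/} is its basename, and {= s/\.faa/.renamed.faa/ =} runs that Perl substitution on the input path to build the output name. In the seqkit part, "(.+)" captures the whole original header and $1 puts it back, followed by a space and the file name. A rough plain-shell equivalent of the names built for one made-up path, using POSIX parameter expansion:

```shell
#!/usr/bin/env bash
# Illustration only: the names GNU parallel builds for one input path.
f="./sample1mg/genes.faa"      # hypothetical input path, i.e. {}

base="${f##*/}"                # {/}: strip everything up to the last "/"
out="${f%.faa}.renamed.faa"    # {= s/\.faa/.renamed.faa/ =}, assuming .faa
                               # only occurs as the extension

printf 'input:       %s\n' "$f"
printf 'replacement: $1 %s\n' "$base"   # the -r value seqkit receives
printf 'output file: %s\n' "$out"
```

For this path, {/} expands to genes.faa and the output file is ./sample1mg/genes.renamed.faa, so each renamed file stays in its own subdirectory next to the original.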

If you get "Argument list too long", pipe the file list into parallel instead:

find ./*mg -name "*.faa" | parallel 'seqkit replace -p "(.+)" -r "\$1 {/}" {} > {= s/\.faa/.renamed.faa/ =}'
