Hey all, I am trying to create a concatenated .faa file for analysis from the UBA genome data set. I downloaded the tar and unpacked it, which created a folder 'bacteria' with sub-folders labeled UBA1330, etc., and those sub-folders contain .faa files. I need to concatenate all the protein files into one file and add the genome ID (e.g. UBA1330) to each >faa_id so that, if a protein is a hit, I can trace it back to the correct genome. I am new to shell and wrote the following script:
for GENOME in 'ls bacteria/';
do
sed "s|.*_|>${GENOME}_|" bacteria/${GENOME}/${GENOME}.faa | cat >> bacteria_proteins.faa;
done
I receive the following error:
sed: can't read bacteria/ls: No such file or directory
sed: can't read bacteria/ls: No such file or directory
sed: can't read bacteria.faa: No such file or directory
For some reason it isn't running the 'ls bacteria/' command correctly and seems to be using bacteria as the ${GENOME}, yet when I run:
$ls bacteria/
I get the correct output:
UBAXXXX UBAXXXXXX UBAXXXXXX etc.
I'm new to the terminal and would love some input on what I am doing wrong. Thanks!
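For anyone hitting the same error: single quotes make 'ls bacteria/' a literal string, so the loop runs once with GENOME set to the text "ls bacteria/" instead of the directory listing. The smallest change is probably command substitution, i.e. for GENOME in $(ls bacteria/), but looping over a glob avoids parsing ls at all. Below is a rough sketch of that approach, assuming the layout from the post (bacteria/<ID>/<ID>.faa). The function name concat_faa is made up for illustration, and the sed expression is swapped for one that prepends the genome ID right after the leading '>' rather than replacing everything up to the last underscore, so treat the exact header format as an assumption to adjust.

```shell
#!/bin/sh
# Sketch only: concat_faa is a hypothetical helper, not part of any tool.
# Assumes each genome lives at <root>/<ID>/<ID>.faa, as described in the post.

concat_faa() {
    local root="$1" out="$2" dir genome faa
    : > "$out"                          # start from an empty output file
    for dir in "$root"/*/; do           # glob the sub-folders instead of parsing ls
        [ -d "$dir" ] || continue       # skip the literal pattern if nothing matched
        genome=$(basename "$dir")       # e.g. UBA1330
        faa="${dir}${genome}.faa"
        [ -f "$faa" ] || continue       # skip sub-folders without the expected file
        # Only header lines start with '>', so sequence lines pass through unchanged.
        sed "s|^>|>${genome}_|" "$faa" >> "$out"
    done
}

# Example call (paths taken from the post):
# concat_faa bacteria bacteria_proteins.faa
```

Because the output file is truncated at the start, rerunning the function does not append duplicate records the way >> inside a bare loop would.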
Awesome that worked, thanks!