I'd recommend for i in *.fasta instead of for i in $(ls *.fasta) - the latter adds a subshell where a glob would suffice. Plus, the output of ls can be unpredictable if ls has been customized (through an alias or color settings, for example), and IIRC unusual filenames can break the $(ls) approach too.
I have a heavily customized shell, and my ls is an example: its color settings (LSCOLORS) wrap each filename in escape codes, which then become part of the name the loop sees. See sample output:
➜ for f in $(ls *.gz)
> file $f
hs37d5_GRCm38p6.fasta.gz: cannot open `\033[0m\033[38;5;9mhs37d5_GRCm38p6.fasta.gz\033[0m' (No such file or directory)
hs37d5_GRCm38p6.reheader.fasta.gzip.gz: cannot open `\033[38;5;9mhs37d5_GRCm38p6.reheader.fasta.gzip.gz\033[0m' (No such file or directory)
hs37d5_GRCm38p6.reheader.fasta.gz: cannot open `\033[38;5;9mhs37d5_GRCm38p6.reheader.fasta.gz\033[0m' (No such file or directory)
: cannot open `\033[m' (No such file or directory)
➜ for f in $(/bin/ls *.gz)
> file $f
hs37d5_GRCm38p6.fasta.gz: gzip compressed data, extra field
hs37d5_GRCm38p6.reheader.fasta.gz: gzip compressed data, extra field
hs37d5_GRCm38p6.reheader.fasta.gzip.gz: gzip compressed data, from Unix, last modified: Mon May 6 16:15:43 2019
➜ for f in *.gz
> file $f
hs37d5_GRCm38p6.fasta.gz: gzip compressed data, extra field
hs37d5_GRCm38p6.reheader.fasta.gz: gzip compressed data, extra field
hs37d5_GRCm38p6.reheader.fasta.gzip.gz: gzip compressed data, from Unix, last modified: Mon May 6 16:15:43 2019
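The effect above can be reproduced without touching a real shell configuration. The following is only a sketch: the ls function is a hypothetical stand-in for a customized ls (it is not the setup from the transcript), and the file names are invented. It counts how many loop items name an existing file under each approach.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch a.gz b.gz

# Hypothetical stand-in for a customized ls: wraps each name in color codes.
ls() { printf '\033[31m%s\033[0m\n' "$@"; }

ls_found=0
for f in $(ls *.gz); do
    # $f carries the escape codes, so it does not name a real file.
    [ -e "$f" ] && ls_found=$((ls_found + 1))
done

glob_found=0
for f in *.gz; do
    # The glob expands to the real names; ls is never involved.
    [ -e "$f" ] && glob_found=$((glob_found + 1))
done

echo "found via \$(ls): $ls_found, via glob: $glob_found"
```

With this simulated ls, the $(ls) loop should find none of the files while the glob loop finds both.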
With respect to filenames causing a problem: if a filename contains whitespace, the output of $(ls) gets split on that whitespace and the pieces are passed as separate items, whereas * expands each filename to a single item, spaces included. See below:
➜ touch a "b c"
➜ for f in $(/bin/ls *)
> file $f
a: empty
b: cannot open `b' (No such file or directory)
c: cannot open `c' (No such file or directory)
➜ for f in *
> file $f
a: empty
b c: empty
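The same comparison can be scripted. This is a minimal sketch assuming a POSIX-style shell; command ls is used here only to bypass any alias or function named ls, and the counter names are made up for illustration.

```shell
#!/bin/sh
# Scratch directory with one plain name and one name containing a space.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch a "b c"

# Unquoted $(...) output is split on whitespace: a, b, c.
ls_count=0
for f in $(command ls); do
    ls_count=$((ls_count + 1))
done

# The glob yields one item per file: a, "b c".
glob_count=0
for f in *; do
    glob_count=$((glob_count + 1))
done

echo "items via \$(ls): $ls_count, items via glob: $glob_count"
```

Inside the glob loop, the variable still needs quoting ("$f") when it is passed to a command in bash or sh, otherwise the name is split again at that point.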
for i in *fasta; do n="${i%.fasta}"; sed -i.bak "s/>[^_]\+/>$n/" "$i"; done
This loops over all files in the current directory that end with "fasta". For each file:
n="${i%.fasta}" removes the .fasta file extension (can be generalized to any extension by using n="${i%.*}")
sed "s/>[^_]\+/>$n/" matches ">" followed by one or more characters that are not underscores (the first such match on each line), and replaces it with ">" plus the filename minus extension found in the previous step. Note that \+ is a GNU sed extension. Depending on your requirements, you may want to tighten up this regex, for example by anchoring it to the start of the line.
The -i.bak part just tells sed to replace the string in the original file, but make a backup called <originalname>.bak.
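To see the steps above on a concrete input, here is a small sketch run in a scratch directory. The file name sampleA.fasta, its headers, and the name reads.v2.fa are invented for illustration. It also swaps [^_]\+ for [^_][^_]*, which matches the same thing but does not rely on the GNU-only \+; the -i.bak form is assumed to be supported by the installed sed (it is in GNU and BSD sed).

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical sample input: two records whose IDs start with "seq1".
printf '>seq1_contig1\nACGT\n>seq1_contig2\nTTGA\n' > sampleA.fasta

for i in *.fasta; do
    n="${i%.fasta}"                          # sampleA.fasta -> sampleA
    # Replace everything from ">" up to the first underscore with ">$n".
    sed -i.bak "s/>[^_][^_]*/>$n/" "$i"      # original kept as $i.bak
done

first_header=$(head -n 1 sampleA.fasta)
backup_header=$(head -n 1 sampleA.fasta.bak)
echo "edited: $first_header, backup: $backup_header"

# The generic form strips only the last extension.
x="reads.v2.fa"
stem="${x%.*}"
echo "$stem"
```

Because the filename is pasted into the sed expression, a name containing "/" or "&" would need escaping first; the sketch assumes plain names.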
Thanks, RamRS!
Can you give some examples of this?
I see. Good points that I didn't think about. Thanks, RamRS.