Hey guys,
I have a directory with a lot of fasta files, some single and some multi-fasta. Is there a way to copy only the multi-fasta files (i.e., those files that have more than one >) to another directory? Thanks!
Here's a low-tech solution that seems to do the job (I haven't tested it extensively):
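#!/bin/bash
# Usage:
# $ bash script.sh destination_folder
for file in ./*.fasta ; do
    # Count header lines (lines starting with ">")
    entries=$(grep -c "^>" "$file")
    if [ "$entries" -gt 1 ]; then
        # cp rather than mv, since the question asks for a copy
        cp "$file" "$1"
    fi
done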
If this is an especially critical step in a pipeline or something, you may want to consider a more robust approach (e.g. use an actual parser to check for multiple entries, use find to pick up the files, etc.).
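As a rough sketch of the find idea mentioned above (not a tested pipeline component): the function name copy_multi_fasta and its two arguments are made up for illustration, and -maxdepth is a GNU/BSD find extension rather than strict POSIX. It still counts headers with grep rather than a real FASTA parser.

```shell
#!/bin/bash
# copy_multi_fasta SRC_DIR DEST_DIR  (hypothetical helper, not from the thread)
# Copies every *.fasta file in SRC_DIR with more than one header line to DEST_DIR.
copy_multi_fasta() {
    local src=$1 dest=$2 file
    mkdir -p -- "$dest"
    # -maxdepth 1 stops find from descending into subdirectories;
    # -print0 with read -d '' copes with spaces or newlines in file names.
    find "$src" -maxdepth 1 -type f -name '*.fasta' -print0 |
    while IFS= read -r -d '' file; do
        # '^>' only counts ">" at the start of a line, i.e. a FASTA header
        if [ "$(grep -c '^>' -- "$file")" -gt 1 ]; then
            cp -v -- "$file" "$dest"/
        fi
    done
}
```

Called as copy_multi_fasta /path/to/dir destination_folder, it leaves the originals in place.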
If you have very big fasta files, this will be somewhat slow, as it reads the whole file to count the number of > characters. It could terminate after finding just 2, but that's a more complicated task.
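For what it's worth, one way to get the early exit might be grep's -m option, which stops reading after a given number of matching lines (available in GNU and BSD grep). A minimal sketch, where is_multi_fasta is a hypothetical helper name:

```shell
#!/bin/bash
# is_multi_fasta FILE  (hypothetical helper, not from the thread)
# Succeeds (exit status 0) when FILE has at least two header lines.
is_multi_fasta() {
    local count
    # -m 2 makes grep stop after the second matching line, so a large
    # multi-fasta is not read to the end; -c prints the count (0, 1 or 2).
    count=$(grep -c -m 2 '^>' -- "$1" || true)
    [ "${count:-0}" -ge 2 ]
}
```

In the loop, the test then becomes: if is_multi_fasta "$file"; then ... fi. Single-entry files are still read in full, since grep never reaches a second header.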
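#!/bin/bash
# Usage:
# $ bash script.sh destination_folder /path/to/dir/*.fasta # replace fasta with whatever extension is relevant
# "${@:2}" skips $1 (the destination) and loops over the remaining arguments
for file in "${@:2}" ; do
    entries=$(grep -c "^>" "$file")
    if [ "$entries" -gt 1 ]; then
        # cp rather than mv, since the question asks for a copy
        cp -v "$file" "$1"
    fi
done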
To add to my comments about robustness, it would probably be worth using getopt here to make the usage clearer, since passing wildcards to bash scripts is a bit clunky.
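Something along these lines might work as a starting point. Note that it uses bash's builtin getopts rather than the external getopt program, and that the function name copy_multi_cli and the -s/-d/-e options are invented for this sketch:

```shell
#!/bin/bash
# Hypothetical interface: script.sh -d dest_dir [-s src_dir] [-e extension]
usage() { echo "Usage: $0 -d dest_dir [-s src_dir] [-e extension]" >&2; }

copy_multi_cli() {
    # Defaults: current directory, .fasta extension; OPTIND is reset so the
    # function can be called more than once in the same shell.
    local src=. ext=fasta dest= opt file OPTIND=1
    while getopts 's:d:e:h' opt; do
        case $opt in
            s) src=$OPTARG ;;
            d) dest=$OPTARG ;;
            e) ext=$OPTARG ;;
            h) usage; return 0 ;;
            *) usage; return 1 ;;
        esac
    done
    # The destination is the only required option
    [ -n "$dest" ] || { usage; return 1; }
    mkdir -p -- "$dest"
    for file in "$src"/*."$ext"; do
        [ -f "$file" ] || continue   # skip the literal pattern if nothing matched
        if [ "$(grep -c '^>' -- "$file")" -gt 1 ]; then
            cp -v -- "$file" "$dest"/
        fi
    done
}
```

Ending the script with copy_multi_cli "$@" would let you run it as bash script.sh -s /path/to/dir -d destination_folder -e fa, so the wildcard is expanded inside the script instead of on the command line.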