Only copy multifasta files from one directory to another
4.2 years ago

Hey guys,

I have a directory with a lot of FASTA files, some single-FASTA and some multi-FASTA. Is there a way to copy only the multi-FASTA files (i.e., those files that contain more than one > header line) to another directory? Thanks!

sequence assembly • 876 views
4.2 years ago
Joe 21k

Here's a low-tech solution that seems to do the job (I haven't tested it extensively):

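#!/bin/bash
# Usage:
#   $ bash script.sh destination_folder
dest="${1:?Usage: bash script.sh destination_folder}"
mkdir -p "$dest"
for file in ./*.fasta ; do
    [ -e "$file" ] || continue           # the glob matched nothing
    entries=$(grep -c "^>" "$file")      # count header lines only
    if [ "$entries" -gt 1 ]; then
        cp "$file" "$dest"
    fi
done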

If this is an especially critical step in a pipeline, you may want to consider a more robust approach (e.g., use an actual FASTA parser to count the entries, use find to collect the files, etc.).
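Along the lines of the find suggestion, here is a rough sketch (not part of the original answer) that searches a source directory recursively and counts only lines beginning with >. The copy_multifasta function name and the *.fasta / *.fa patterns are my own assumptions, so adjust them to your extensions.

```shell
#!/bin/bash
# Sketch: copy every FASTA file with more than one header line from a source
# tree into a destination directory, using find to locate the files.
# Usage:
#   $ bash copy_multifasta.sh source_dir destination_dir
# The *.fasta / *.fa name patterns are assumptions; adjust them to your data.

copy_multifasta() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # -print0 together with read -d '' keeps names with spaces or newlines intact
    find "$src" -type f \( -name '*.fasta' -o -name '*.fa' \) -print0 |
    while IFS= read -r -d '' file; do
        # count only lines that start with ">" (FASTA header lines)
        if [ "$(grep -c '^>' "$file")" -gt 1 ]; then
            cp -- "$file" "$dest"/
        fi
    done
}

# Only run when called with exactly two arguments (source and destination)
if [ "$#" -eq 2 ]; then
    copy_multifasta "$1" "$2"
fi
```

Keep the destination outside the source tree, otherwise find may pick up files that were just copied. Also note that files with the same name in different subdirectories will overwrite each other in the destination.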

If you have very large FASTA files, this will be somewhat slow, because grep reads each whole file to count the > header lines. It could stop after finding the second header, but that's a more complicated task.
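For what it's worth, GNU grep and BSD grep both have a -m option that stops reading a file after a given number of matching lines, which gives the early exit fairly cheaply. A sketch assuming such a grep (-m is not part of POSIX); the is_multifasta and copy_multifasta names are hypothetical:

```shell
#!/bin/bash
# Usage:
#   $ bash script.sh destination_folder /path/to/dir/*.fasta
# Assumes a grep that supports -m (GNU and BSD grep do; POSIX grep does not).

is_multifasta() {
    # -m 2: stop reading after the second header line; -c: print that count
    [ "$(grep -c -m 2 '^>' "$1")" -ge 2 ]
}

copy_multifasta() {
    local dest="$1"
    shift                      # remaining arguments are the input files
    mkdir -p "$dest"
    local file
    for file in "$@"; do
        if is_multifasta "$file"; then
            cp -v -- "$file" "$dest"/
        fi
    done
}

# Only run when given a destination plus at least one file
if [ "$#" -ge 2 ]; then
    copy_multifasta "$@"
fi
```

Single-sequence files still have to be read to the end to rule out a second header, so the saving only applies to the multi-FASTA files.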


EDIT: a slightly more flexible way to take the files from the command line:

#!/bin/bash
# Usage:
#      $ bash script.sh destination_folder /path/to/dir/*.fasta   # replace fasta with whatever extension is relevant

dest="$1"
shift                                    # remaining arguments are the input files
for file in "$@" ; do
    entries=$(grep -c "^>" "$file")      # count header lines only
    if [ "$entries" -gt 1 ]; then
        cp -v "$file" "$dest"
    fi
done

To add to my comments about robustness, it would probably be worth using getopt here to make the usage clearer, since passing wildcards to bash scripts is a bit clunky.
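A possible sketch of that idea using the bash builtin getopts (rather than the external getopt program): the destination becomes a -d option, so everything left after option parsing is treated as the file list. The -d and -h flags and the main function are illustrative choices, not an established interface.

```shell
#!/bin/bash
# Sketch: destination passed with -d, every remaining argument is an input file.
# Usage:
#   $ bash script.sh -d destination_folder /path/to/dir/*.fasta
# Uses the bash builtin getopts, not the external getopt utility.

usage() {
    echo "Usage: $0 -d destination_folder file.fasta [file.fasta ...]" >&2
}

main() {
    local dest="" opt
    local OPTIND=1                      # reset getopts state for this call
    while getopts ":d:h" opt; do
        case "$opt" in
            d) dest="$OPTARG" ;;
            h) usage; return 0 ;;
            :) echo "Option -$OPTARG needs an argument" >&2; usage; return 1 ;;
            \?) echo "Unknown option: -$OPTARG" >&2; usage; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))               # drop parsed options; "$@" is now the file list

    if [ -z "$dest" ] || [ "$#" -eq 0 ]; then
        usage
        return 1
    fi

    mkdir -p "$dest"
    local file
    for file in "$@"; do
        # copy only files with more than one header line
        if [ "$(grep -c '^>' "$file")" -gt 1 ]; then
            cp -v -- "$file" "$dest"/
        fi
    done
}

# Only run when called with arguments
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

For example: $ bash script.sh -d multi_only /path/to/dir/*.fasta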
