Concatenating FASTA files based on prefix?
3.6 years ago
sunnykevin97 ▴ 990

Hi,

In a directory, I have 9792 files from 1088 groups (1088 x 9 = 9792), and each group has a unique ID. I want to concatenate only the files whose names start with the same group ID.

The group IDs are OG000190, OG0012877, OG0012858 .... (1088 in total), all starting with OG00.

I am able to do this for each group individually using cat. How do I automate the process with a loop? I tried but could not get it to work; some help would be appreciated.

cat *OG0012884. > OG0012884_out.fa

OG0012884._1_1.txt.fa
OG0012884._1_2.txt.fa
OG0012884._1_3.txt.fa
OG0012884._1_4.txt.fa
OG0012884._1_5.txt.fa
OG0012884._1_6.txt.fa
OG0012884._1_7.txt.fa
OG0012884._1_8.txt.fa
OG0012884._1_9.txt.fa
aligment RNA DNA fasta • 1.3k views

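There's no way that cat *OG0012884. > OG0012884_out.fa produced anything but an empty file: the pattern *OG0012884. only matches names that end in "OG0012884.", and none of the files listed do.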
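
To see how the pattern behaves, a small sketch along these lines can be run in a scratch directory. The temporary directory and the two empty files are placeholders created only for this demonstration, and the second pattern, OG0012884.*, is a suggested correction rather than something taken from the question.

```shell
# Scratch directory with two empty placeholder files that follow the naming scheme
demo=$(mktemp -d)
cd "$demo" || exit 1
touch OG0012884._1_1.txt.fa OG0012884._1_2.txt.fa

# Pattern from the question: the name would have to END in "OG0012884."
n_question=0
for f in *OG0012884.; do
    # An unmatched glob is left as a literal string, so test that the file exists
    if [ -e "$f" ]; then n_question=$((n_question + 1)); fi
done

# Suggested pattern: the name STARTS with "OG0012884."
n_fixed=0
for f in OG0012884.*; do
    if [ -e "$f" ]; then n_fixed=$((n_fixed + 1)); fi
done

echo "$n_question $n_fixed"
```

With the two placeholder files, the first count is 0 and the second is 2.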

If OG0012884 represents a group ID and all the files in that directory follow the same naming scheme, then e.g.

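# Collect the unique group IDs (the text before the first "."), then merge each group's files
for U in $(ls *.fa | cut -f1 -d "." | sort -u); do
    # The "." after the ID stops one ID from also matching a longer ID that starts the same way
    cat "$U".*.txt.fa > "$U".all.fa
done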

I have all the files in one directory. I already tried this in bash, but it concatenates all the *.fa files into one. I'm not very familiar with regular expressions in bash.


Play with the components of the solution @5heikki gave you to troubleshoot and learn. Are you familiar with loops and pipes? The first part of the solution finds all the unique prefixes from the file names by listing all the files, keeping the first field (the part before the first period), and sorting the results uniquely. This should give you your 1088 group IDs. You can check that part easily on the command line. The loop then uses each prefix to grab the matching files and do what you want.
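
As a rough illustration of checking that first part on its own, the sketch below runs the listing pipeline against a throwaway directory. The directory, the two example groups, and the variable names ids and n_ids are made up for the demonstration; on the real data the final count should come out as 1088.

```shell
# Throwaway directory with empty placeholder files for two groups
demo=$(mktemp -d)
for i in 1 2 3; do
    touch "$demo/OG0012884._1_$i.txt.fa" "$demo/OG0012877._1_$i.txt.fa"
done

# First part of the solution by itself: list, keep the text before the first ".", de-duplicate
ids=$(cd "$demo" && ls *.txt.fa | cut -f1 -d "." | sort -u)
echo "$ids"

# Number of unique group IDs found (tr strips the padding some wc versions add)
n_ids=$(echo "$ids" | wc -l | tr -d ' ')
echo "$n_ids"
```

If the count is not what you expect, print the IDs and look for names that do not follow the scheme before running the loop.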


If each group has a ._1_1.txt.fa file and the pattern is exactly the same as in the OP, try this:

ls *1_1.txt.fa | while read -r line; do echo "${line%%.*}"; done

This prints each group ID; you can then concatenate each group's files using wildcards.
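
One way the listing and the wildcard step might be combined is sketched below. It assumes every group has a ._1_1.txt.fa file, writes the results to a hypothetical merged/ subdirectory so the outputs stay apart from the inputs, and uses made-up placeholder sequences in a temporary directory.

```shell
# Temporary directory with tiny placeholder FASTA files for two groups
demo=$(mktemp -d)
cd "$demo" || exit 1
for id in OG0012884 OG0012877; do
    for i in 1 2 3; do
        printf '>%s_seq%s\nACGT\n' "$id" "$i" > "${id}._1_${i}.txt.fa"
    done
done

# Use the first file of each group to obtain its ID, then merge that group's files
mkdir -p merged   # hypothetical output directory
for first in *._1_1.txt.fa; do
    id=${first%%.*}                          # strip everything from the first "."
    cat "$id".*.txt.fa > "merged/$id.all.fa"
done

# One merged file per group; count the sequence headers in one of them
ls merged
grep -c '^>' merged/OG0012884.all.fa
```

Because each output is written with > rather than >>, rerunning the loop overwrites the merged files instead of appending to them.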
