3.7 years ago
sunnykevin97
Hi,
In a directory I have 9792 files from 1088 groups (1088 x 9 = 9792); each group has a unique ID. I want to concatenate only those files that share a group ID as their prefix.
The group IDs are OG000190, OG0012877, OG0012858 ... (1088 in total), all starting with OG00.
I can do it for each group individually using cat. How do I automate the process with a loop? I tried but could not get it to work; some help would be appreciated.
cat *OG0012884. > OG0012884_out.fa
OG0012884._1_1.txt.fa
OG0012884._1_2.txt.fa
OG0012884._1_3.txt.fa
OG0012884._1_4.txt.fa
OG0012884._1_5.txt.fa
OG0012884._1_6.txt.fa
OG0012884._1_7.txt.fa
OG0012884._1_8.txt.fa
OG0012884._1_9.txt.fa
There's no way that
cat *OG0012884. > OG0012884_out.fa
produced anything but an empty file.
If OG0012884 represents a group ID and all your files in that dir follow the same naming scheme then e.g.
I had all the files in one directory. I already tried it in bash, but it concatenates all the *.fa files into one. I'm not very familiar with regular expressions in bash.
Play with the components of the solution @5heikki gave you to troubleshoot and learn. Are you familiar with loops and pipes? The first part of the solution finds all the unique prefixes in the file names by listing all the files, cutting out the 1st field (the part before the period), and sorting the results uniquely. This should give you your 1088 group IDs, and you can check that part easily on the command line. The loop then uses each prefix to grab the matching files and do what you want with them.
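The steps described above can be sketched roughly as follows. This is not necessarily the exact command that was posted, only a minimal illustration under the assumption that every input file is named like OG0012884._1_1.txt.fa. The scratch directory, the demo files, and the merged/ output directory are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup (hypothetical data): two groups, three files each, in a scratch dir.
workdir=$(mktemp -d)
cd "$workdir"
for id in OG0012884 OG0012877; do
  for i in 1 2 3; do
    printf '>%s_seq%s\nACGT\n' "$id" "$i" > "${id}._1_${i}.txt.fa"
  done
done

# Assumed output directory, kept apart from the inputs so that
# re-running the loop never picks up its own results.
mkdir -p merged

# List the inputs, keep the field before the first period, de-duplicate.
# Each resulting line is one group ID.
for id in $(printf '%s\n' OG*.txt.fa | cut -d. -f1 | sort -u); do
  # "${id}." keeps the period, so a shorter ID cannot match a longer one.
  cat "${id}."_*.txt.fa > "merged/${id}_out.fa"
done
```

To check the first part on its own, run only the `printf ... | cut -d. -f1 | sort -u | wc -l` pipeline in the real directory; with the naming scheme from the question it should report 1088.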
If each group has a file ending in
._1_1.txt.fa
and the pattern is exactly the same as in the OP, try this:
This would print each group, and you can then concatenate each group's files by using wildcards.
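The command from this reply is not shown here, so the following is only one possible sketch of the idea, assuming every group has exactly one file ending in ._1_1.txt.fa. Stripping that ending gives the group ID, and a wildcard then collects the rest of the group. The scratch directory and demo files are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup (hypothetical data) in a scratch directory.
demo=$(mktemp -d)
cd "$demo"
for id in OG0012884 OG0012858; do
  for i in 1 2 3; do
    printf '>%s_%s\nACGT\n' "$id" "$i" > "${id}._1_${i}.txt.fa"
  done
done

# One ._1_1.txt.fa file per group, so removing that ending yields the group ID.
for first in *._1_1.txt.fa; do
  id=${first%._1_1.txt.fa}
  echo "$id"   # print the group being processed
  # The wildcard picks up every numbered file of this group.
  cat "${id}"._1_*.txt.fa > "${id}_out.fa"
done
```

The output names end in _out.fa, which neither glob matches, so the loop can be re-run without the results feeding back into it.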