How to concatenate .fa files from different directories
5.3 years ago
Kumar ▴ 170

I am looking to concatenate .fa files that are in different directories. How can I copy these files and concatenate them into a single file?

unix help • 2.1k views
ADD COMMENT
5.3 years ago
Eric Lim ★ 2.2k
(base) [~/Data/scratch/tmp]$ tree 
.
├── dir_1
│   └── 1.fa
└── dir_2
    └── 2.fa
(base) [~/Data/scratch/tmp]$ cat `find . -name '*.fa' -print` > all.fa
(base) [~/Data/scratch/tmp]$ cat all.fa
>2
gtcg
>1
acgt
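
A slightly more defensive variant of the same idea, as a sketch rather than the definitive way to do it: it excludes the output file from the search (otherwise a second run would pick up all.fa itself), sorts the paths so the order is reproducible, and copes with spaces in file names. The scratch directory and sample files are made up for the demo, and `-print0`, `sort -z` and `xargs -0` are assumed to be available (they are in GNU and BSD tools, but are not strictly POSIX).

```shell
# Hypothetical demo layout in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p dir_1 dir_2
printf '>1\nacgt\n' > dir_1/1.fa
printf '>2\ngtcg\n' > dir_2/2.fa

# Find every .fa file except the output itself, sort the NUL-separated
# paths, and hand them to cat in that order.
find . -name '*.fa' ! -name 'all.fa' -print0 | sort -z | xargs -0 cat > all.fa

cat all.fa
```

Running the last command twice gives the same all.fa, because the merged file is never fed back into itself.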
ADD COMMENT
5.3 years ago
GenoMax 147k

If you have a directory structure like the following (or any relative/full directory paths):

top_dir -----|---- dir_1/*.fq
             |-----dir_2/*.fq
             |-----dir_3/*.fq

you could do something like:

cat top_dir/dir_1/*.fq top_dir/dir_2/*.fq top_dir/dir_3/*.fq > dest_dir/total.fq
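
If the sub-directories share a naming pattern, the shell can expand the directory part as well, so the paths do not have to be typed one by one. A minimal sketch, assuming the hypothetical layout below (the directory and file names are only for the demo):

```shell
# Hypothetical demo layout mirroring the tree above.
demo=$(mktemp -d)
mkdir -p "$demo/top_dir/dir_1" "$demo/top_dir/dir_2" "$demo/top_dir/dir_3" "$demo/dest_dir"
for n in 1 2 3; do
    printf '@read%s\nACGT\n+\nIIII\n' "$n" > "$demo/top_dir/dir_$n/sample.fq"
done

# dir_* matches dir_1, dir_2 and dir_3; the shell expands the glob in
# sorted order before cat runs.
cat "$demo"/top_dir/dir_*/*.fq > "$demo/dest_dir/total.fq"
```

The output goes to a separate dest_dir, so the glob can never match the merged file.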
ADD COMMENT

That seems like a long path to type, as my files are in the following directories:

/home/tree2/L.Genomes/L.denovo/.spades.assembly/.prokka/*fa

ADD REPLY

I just used this and it works:

cat *.spades.assembly/*.prokka/*.fasta > /home/master/annotation/new-marge.fasta

Thanks for your help!!

ADD REPLY

Hello,

I wanted to do a similar task and tried the following code, and it worked. I wanted to concatenate 3 files that have the same name but sit in 3 different sub-directories. I have to repeat this for 600+ files, so can you help me put this into a loop? I'm running these in a cluster environment.

cat pool1_dataset/COMP99001.FNA pool2_dataset/COMP99001.FNA pool3_dataset/COMP99001.FNA > con_COMP99001.fasta

Thanks!

ADD REPLY

You could do something like:

$ ls -R
files   test1   test2   test3

./test1:
file1.fq    file2.fq

./test2:
file1.fq    file2.fq

./test3:
file1.fq    file2.fq

Then do:

$ cd test1
$ ls -1 *.fq > ../files
$ for i in `cat ./files`; do echo cat test1/${i} test2/${i} test3/${i} > ${i}_total.fa; done
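
Two things to watch in the loop as posted: it reads ./files while still inside test1 (the list was written to ../files), and because of the echo, the redirect stores the text of the cat command in each output file instead of the sequence data. Here is a sketch of a version that runs from the parent directory and skips the intermediate list. The demo directories, file contents and the output naming are assumptions for illustration only.

```shell
# Hypothetical demo layout: three directories holding same-named files.
demo=$(mktemp -d)
cd "$demo"
mkdir test1 test2 test3
for d in test1 test2 test3; do
    for f in file1 file2; do
        printf '>%s_%s\nacgt\n' "$d" "$f" > "$d/$f.fq"
    done
done

# Run from the parent directory so the test1/ test2/ test3/ paths resolve.
for path in test1/*.fq; do
    name=$(basename "$path")    # e.g. file1.fq
    # No echo here: cat itself has to run for the data to reach the output.
    cat "test1/$name" "test2/$name" "test3/$name" > "${name%.fq}_total.fa"
done
```

To preview what would run without writing anything, put echo in front of cat and drop the redirect.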
ADD REPLY

Hi Genomax, I tried your code exactly as it is. But it gave me "no such file or directory" error. I can't copy paste my code so I just typed the code exactly how I had it.

$ ls -R

.: files test1 test2 test3

./test1: file1.FNA file2.FNA file3.FNA

./test2: file1.FNA file2.FNA file3.FNA

./test3: file1.FNA file2.FNA file3.FNA

$cd test1

$ls -1 *.FNA > ../files

$ for i in `cat ./files`; do echo cat test1/$.........................................; done

cat: ./files: No such file or directory

Then when I did cat files, it had my 3 files listed.

$cat files

file1.FNA

file2.FNA

file3.FNA

Is this a problem with my .FNA file format? All my files are in this format.

ADD REPLY

Of course it did. That is just an example. You need to substitute your own directory/file names in the right places.

ADD REPLY

I replaced my directory names and file names :) I just used file1 etc. for easiness now. I want to send a snapshot of what I ran, but this doesn't let me attach an image.

But I'm not sure what you meant by "files" and what I should replace it with. To do a test run, I made 3 directories, test1 to test3, each containing file1.FNA, file2.FNA and file3.FNA. I also made a directory called files. Is it necessary?

Then when I did ls -R, it listed, as I showed in my above reply. (files test1 test2 test3 )

After all, when I did cat files, it listed my 3 FNA files as below, so you know I replaced my file names.

COMP100028.FNA

COMP100047.FNA

COMP100074.FNA

ADD REPLY

I see. I think the problem is that you had only one dot before the name files, which points to the current directory.

$ for i in `cat ./files`; do echo cat test1/$.........................................; done

You would need two dots (../files), since you put the files list one level up. So move up one level with cd ../ and then run the same command there, where ./files resolves correctly.

$ for i in `cat ./files`; do echo cat test1/$.........................................; done

To answer your question. I just grabbed the names of the sample files into a new file called files. If that was confusing then replace the name with something more descriptive like my_data_files.
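
To make the directory bookkeeping concrete, here is a sketch that builds the list from inside test1 but runs the loop from the parent directory, where both ./my_data_files and the test1/ test2/ test3/ paths resolve. The demo layout, the list name my_data_files and the con_ output prefix are placeholders, not anything required.

```shell
# Hypothetical demo layout: three directories holding same-named .FNA files.
demo=$(mktemp -d)
cd "$demo"
mkdir test1 test2 test3
for d in test1 test2 test3; do
    for f in file1 file2 file3; do
        printf '>%s_%s\nacgt\n' "$d" "$f" > "$d/$f.FNA"
    done
done

# Build the name list inside test1, writing it one level up. The subshell
# means the cd does not change the directory of the rest of the script.
(cd test1 && ls -1 *.FNA > ../my_data_files)

# Still in the parent directory: read one name per line and merge the
# three copies of each file.
while IFS= read -r name; do
    cat "test1/$name" "test2/$name" "test3/$name" > "con_${name%.FNA}.fasta"
done < ./my_data_files
```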

ADD REPLY

Thanks!! I will try this out.

ADD REPLY
