Combining FASTA files
9.9 years ago

I have around 149 FASTA files of mouse gene CDS sequences in fasta.txt format. I need to combine them into a single file containing all the sequences and run it against a gene dataset that I downloaded from Ensembl BioMart. Is there a command I can run in cmd to combine all of them, or some other way of doing this quickly? Any suggestions are appreciated.

windows fasta notepad • 75k views
9.9 years ago
rtliu ★ 2.2k

The cat command is available as a PowerShell alias on Windows 7 and above. Press the Windows logo key + R, then type powershell and press Enter. Supposing your FASTA files are located in d:\data, just type:

cd d:\data
cat *.fasta.txt > d:\combined.fasta

For any serious bioinformatics analysis, learning the Linux command line is a must.

9.9 years ago

a) Create a new directory

b) Move all the fasta files into the new directory

c) Change directory to the new directory

d) Try this command: cat * >> one_big_file.txt
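
One caveat with step d, sketched here under the assumption of a Unix-like shell with awk available (for example Git Bash or WSL on Windows): if an input file does not end with a newline, cat glues the next file's ">" header onto the previous sequence line. Using awk 1 instead prints every line with a trailing newline, so the records stay separate. The file names (a.txt, b.txt, combined.fasta) are made up for illustration.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs: a.txt deliberately lacks a final newline.
printf '>geneA\nATGAAA' > a.txt
printf '>geneB\nATGCCC\n' > b.txt

# Plain cat: the geneB header lands on the same line as geneA's sequence.
cat ./*.txt > cat_result.fasta

# awk 1 prints each input line followed by a newline, including the last
# line of a file that was missing one.
awk 1 ./*.txt > combined.fasta

echo "headers with cat: $(grep -c '^>' cat_result.fasta)"
echo "headers with awk: $(grep -c '^>' combined.fasta)"
```

Comparing the two header counts is a quick way to see whether any of your own files are affected.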


I suggest a slight change:

cat *.txt > big.fasta

If the file you create matches the pattern of the files you are concatenating, you can get into an infinite loop where the file you create is being concatenated to itself. I've done this before :)
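
A minimal sketch of that advice, assuming a Unix-like shell. The file names (gene1.txt, gene2.txt, big.fasta) are made up for illustration. The output name ends in .fasta, so the *.txt pattern cannot match it, and counting header lines afterwards checks that no record was lost.

```shell
set -eu

# Hypothetical demo data: two tiny FASTA files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf '>gene1\nATGGCC\n' > gene1.txt
printf '>gene2\nATGTTT\n' > gene2.txt

# The output name does not end in .txt, so the glob never picks it up,
# even if this command is run a second time.
cat ./*.txt > big.fasta

# Sanity check: the number of header lines should match before and after.
headers_in=$(cat ./*.txt | grep -c '^>')
headers_out=$(grep -c '^>' big.fasta)
echo "headers in: $headers_in, headers out: $headers_out"
```

Writing the output to a different directory would work just as well as changing the extension.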


This was very useful!


When this does not work, you can use the command below:

find . -maxdepth 1 -type f -name 'file_*.pdb' -print0 | sort -zV | xargs -0 cat > all.pdb

The find command finds all relevant files and prints their pathnames to sort, which does a "version sort" to put them in the right order (if the numbers in the filenames had been zero-filled to a fixed width, we would not have needed -V). xargs takes this list of sorted pathnames and runs cat on them in batches that are as large as possible. This should work even if the filenames contain strange characters such as newlines and spaces. We use -print0 with find to give sort nul-terminated names, and sort handles these using -z. xargs likewise reads nul-terminated names with its -0 flag.

Note that I'm writing the result to a file whose name does not match the pattern file_*.pdb.
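
To see what the version sort changes, here is a small sketch, assuming GNU find, sort and xargs (the -z and -V flags of sort are GNU extensions). The three demo files and their contents are made up for illustration.

```shell
set -eu

# Hypothetical demo files: plain lexical order puts file_10 before file_2,
# which is the case that sort -V is meant to handle.
workdir=$(mktemp -d)
cd "$workdir"
printf 'two\n' > file_2.pdb
printf 'ten\n' > file_10.pdb
printf 'one\n' > file_1.pdb

# Plain glob expansion is lexical, so file_10.pdb comes before file_2.pdb.
cat ./file_*.pdb > glob_order.out

# Version sort on nul-terminated names: file_1, file_2, file_10.
find . -maxdepth 1 -type f -name 'file_*.pdb' -print0 | sort -zV | xargs -0 cat > all.pdb

echo "glob order:    $(tr '\n' ' ' < glob_order.out)"
echo "version order: $(tr '\n' ' ' < all.pdb)"
```

For FASTA input, the order of records rarely matters to downstream tools, so this is mainly useful when the numbering of the files carries meaning.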


Isn't cat a Linux command? I am working on Windows. Do you know the equivalent command for Windows?


"Any command-line or batch cmd to concatenate multiple files?" http://superuser.com/questions/111825/

9.9 years ago
biolab ★ 1.4k

You can achieve this under DOS. Make a new folder and move all your files into it. Suppose all the files end with .txt. Then run:

copy *.txt merged.txt

I would also note that learning some Linux basics (shell commands, grep, awk) will greatly benefit your work.

9.9 years ago
ete ▴ 110

You can use one of my programs, BlasterQt, for this task.

Have a look at the format converter tab. Let me know if you need further help.

Cheers,
Stefanie
