Merge multiple sequence files into one file
2.8 years ago
zhichusun ▴ 10

How can I merge several FASTA files into one FASTA file and name each contig after the file it came from?

For example, gene1.fasta, gene2.fasta and gene3.fasta should be merged into gene.fasta, with the headers in gene.fasta named >gene1, >gene2 and >gene3.

Tags: fasta, merge
2.8 years ago

The key step is renaming the sequence(s) in each file with the gene name taken from the file name; seqkit is used for the renaming.

cat gene1.fasta 
>seq1
aaa

for f in *.fasta; do
    gene=${f%.fasta}
    seqkit replace -p ".+" -r "$gene" "$f"
done > result.fa

cat result.fa 
>gene1
aaa
>gene2
ccc
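If seqkit is not installed, the same loop can be sketched with awk alone. Like the seqkit command, it replaces every header in a file, so a file holding several sequences would end up with duplicate names. The demo files created at the top are hypothetical and only mirror the example above.

```shell
# Work in a scratch directory with hypothetical demo input.
cd "$(mktemp -d)" || exit 1
printf '>seq1\naaa\n' > gene1.fasta
printf '>seq2\nccc\n' > gene2.fasta

# For each file, replace every header line with ">" plus the file name
# minus ".fasta". The output name result.fa is not matched by *.fasta.
for f in *.fasta; do
    gene=${f%.fasta}
    awk -v name="$gene" '/^>/ { $0 = ">" name } 1' "$f"
done > result.fa

cat result.fa
```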

For Windows users:

brename -q -l -p "fasta$" | rush -k "seqkit replace -p .+ -r {%:} {}" > result.fa

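Here brename finds and lists the files whose names end in "fasta", rush runs the seqkit command once per file (-k keeps the output in input order), and {%:} extracts the gene name, i.e. the file's base name without its extension.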
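For readers more used to plain shell, the gene name that {%:} provides can be approximated with parameter expansion. This is only a rough equivalent for a name with a single extension such as gene1.fasta, not a description of rush's exact rules; the path below is hypothetical.

```shell
# Hypothetical path, as brename might list it.
path="data/gene1.fasta"

base=${path##*/}   # drop the directory part  -> gene1.fasta
gene=${base%.*}    # drop the last extension  -> gene1
printf '%s\n' "$gene"
```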

2.8 years ago

What is inside each FASTA file? Can you check whether this works for you?

With awk:

$ awk '/^>/{sub(".*",">"FILENAME)}1' *.fa | sed '/^>/ s/\.fa$//'

With GNU parallel (-k keeps the output in the same order as the input files):

$ parallel -k seqkit replace -p '.+' -r {.} {} ::: *.fa > gene.fasta

Replace .fa with the appropriate file extension (e.g. .fasta).
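The awk + sed pipeline can also be folded into a single awk call by trimming FILENAME inside awk. A minimal sketch, assuming the gene name is the file name minus any directory and its last extension, so for other extensions only the glob needs changing. The demo input is hypothetical; gene2.fa deliberately holds two sequences to show that both headers become >gene2.

```shell
# Work in a scratch directory with hypothetical demo input.
cd "$(mktemp -d)" || exit 1
printf '>contig_a\nAAA\n' > gene1.fa
printf '>contig_b\nCCC\n>contig_c\nGGG\n' > gene2.fa

awk '
FNR == 1 {                     # first line of each new input file
    name = FILENAME
    sub(/^.*\//, "", name)     # drop any leading directory
    sub(/\.[^.]*$/, "", name)  # drop the last extension
}
/^>/ { $0 = ">" name }         # rename every header in this file
1                              # print the (possibly modified) line
' *.fa > gene.fasta

cat gene.fasta
```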

ADD COMMENT
0
Entering edit mode

This works, thanks a lot

2.8 years ago
M__ ▴ 200

On Linux/Unix (e.g. macOS):

cat gene1.fasta gene2.fasta gene3.fasta > gene.fasta

If you have lots of files, a bash for loop is more convenient than typing out each file name individually.
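A sketch of such a loop, using hypothetical demo files. Note that cat only concatenates: the original headers (>seq1, >seq2, ...) are kept, so this does not rename contigs after their files. The glob is written as gene[0-9]*.fasta on purpose: with `done > gene.fasta` the shell creates gene.fasta before the loop expands its glob, so a pattern like gene*.fasta would also pick up the output file.

```shell
# Work in a scratch directory with hypothetical demo input.
cd "$(mktemp -d)" || exit 1
for i in 1 2 3; do
    printf '>seq%s\nACGT\n' "$i" > "gene$i.fasta"
done

# gene[0-9]*.fasta matches gene1.fasta, gene2.fasta, ... but not gene.fasta.
for f in gene[0-9]*.fasta; do
    cat "$f"
done > gene.fasta

cat gene.fasta
```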

2.8 years ago
Mensur Dlakic ★ 28k

This also works:

cat gene?.fasta > gene.fasta
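The ? wildcard matches exactly one character, so gene?.fasta covers gene1.fasta to gene9.fasta but skips a gene10.fasta; it also cannot match the output name gene.fasta. The sketch below, with hypothetical demo files, shows the difference and a bracket pattern that also picks up two-digit names. As with plain cat, headers are not renamed, and the shell sorts glob matches as text, so gene10.fasta comes before gene2.fasta in the merged file.

```shell
# Work in a scratch directory with hypothetical demo input.
cd "$(mktemp -d)" || exit 1
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf '>seq%s\nACGT\n' "$i" > "gene$i.fasta"
done

# "?" matches exactly one character, so gene10.fasta is left out.
cat gene?.fasta > gene.fasta
grep -c '^>' gene.fasta          # prints 9

# "[0-9]*" after "gene" also matches gene10.fasta, but not gene.fasta
# or the output file gene_all.fasta.
cat gene[0-9]*.fasta > gene_all.fasta
grep -c '^>' gene_all.fasta      # prints 10
```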