Split a multifasta file into individual sequence files
8.4 years ago
tcf.hcdg ▴ 70

Hello, I would like to split a multifasta file into individual files, one per sequence. I used the following code, and it worked fine with files of up to 500 sequences. When I tried the same code on a multifasta file with 1500 sequences, it failed with the error message below. The code I tried:

 awk -F '>' '/^>/ {F=sprintf("%s.fasta", $2); print > F;next;} {print >> F;}' < dt123_nbxcs.fa

The error I received:

awk: cannot open "gi|353013051|gb|JH237239.1|:7759-7979.fasta" for output (Too many open files)

Is there a way to do this other than with awk?

multifasta • 18k views

What if you add close(F) after print >> F; in the final block, i.e.

awk -F '>' '/^>/ {F=sprintf("%s.fasta", $2); print > F;next;} {print >> F; close(F)}' < dt123_nbxcs.fa

Thanks, it worked. As I understand it, this closes each file after writing the sequence to it instead of keeping it in memory. Is that right, or does it mean something else?
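
For what it's worth, the limit being hit is the number of open file descriptors rather than memory: awk keeps every file it has written to open until close() is called or the program exits, and `ulimit -n` shows the per-process limit. Below is a minimal sketch of a variant that closes the previous file only when the next header line arrives, so at most one output file is open at a time and no file is reopened for every sequence line. The function name `split_fasta` is made up for illustration, and the sketch assumes headers are unique and usable as file names (no "/" characters).

```shell
# split_fasta FILE: write each record of a multi-FASTA file to its own
# <header>.fasta file in the current directory.
# Assumption: headers are unique and contain no "/" characters.
split_fasta() {
    awk '
        /^>/ {
            if (out != "") close(out)      # release the file of the previous record
            out = substr($0, 2) ".fasta"   # file name = header line without ">"
            print > out                    # ">" truncates only when the file is opened
            next
        }
        out != "" { print > out }          # sequence lines go to the still-open file
    ' "$1"
}
```

It would be called as `split_fasta dt123_nbxcs.fa`. Because the file name is taken from the whole header line rather than from the second ">"-separated field, a header that itself contains ">" is not cut short.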

8.4 years ago

This question has already been asked and answered here.


seqkit split --by-id multi_fasta_file.fasta
