Question

Split a multi-FASTA file into individual sequence files

0

8.4 years ago

tcf.hcdg ▴ 70

Hello, I would like to split a multi-FASTA file into one file per sequence. I used the following code, and it worked fine on files of up to 500 sequences. When I tried the same code on a multi-FASTA file with 1500 sequences, it failed with the error message below. The code I tried:

 awk -F '>' '/^>/ {F=sprintf("%s.fasta", $2); print > F;next;} {print >> F;}' < dt123_nbxcs.fa

The error I received:

awk: cannot open "gi|353013051|gb|JH237239.1|:7759-7979.fasta" for output (Too many open files)

How can I do this with something other than awk?

multifasta • 18k views

updated 23 months ago by Kyubong ▴ 20 • written 8.4 years ago by tcf.hcdg ▴ 70

1

What if you add close(F) after print >> F; in the final block? That is:

awk -F '>' '/^>/ {F=sprintf("%s.fasta", $2); print > F;next;} {print >> F; close(F)}' < dt123_nbxcs.fa

8.4 years ago by 5heikki 11k

0

Thanks, it worked. What I understand from this is that each file is closed after the sequence is written to it, instead of being kept open in memory. Is that right, or does it have some other meaning?

8.4 years ago by tcf.hcdg ▴ 70

0

Pretty much

8.4 years ago by 5heikki 11k
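
Since close() is what releases the file handle, a variant is to close each output file once, when the next header appears, rather than after every sequence line. The following is only a minimal sketch under a few assumptions: record IDs contain no "/", and naming each file after the first whitespace-delimited word of the header is acceptable (the command in the question uses the whole header line instead). The file demo.fa and its records are made up for illustration.

```shell
# Small demo input; demo.fa and its records are invented for illustration.
printf '>seq1 first\nACGT\nGGCC\n>seq2\nTTAA\n' > demo.fa

# Close the previous output file whenever a new header starts, so only
# one output file is open at any time, however many records there are.
awk '/^>/ {
        if (out) close(out)      # finish the previous record, if any
        id  = substr($1, 2)      # first word of the header, minus ">"
        out = id ".fasta"
        print > out              # ">" truncates on open, then appends while the file stays open
        next
     }
     out { print > out }         # sequence lines go to the current file
    ' demo.fa
```

This writes seq1.fasta and seq2.fasta in the current directory. Because each file is opened and closed exactly once, it avoids both the "Too many open files" limit and the cost of reopening a file for every line.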

score 1 · Answer 1 · 2016-06-13

1

8.4 years ago

Jean-Karim Heriche 27k

This question has already been asked and answered here.

8.4 years ago by Jean-Karim Heriche 27k

2

seqkit split --by-id multi_fasta_file.fasta

23 months ago by Kyubong ▴ 20
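
For the "other than awk" part of the question, another route that needs no extra install is csplit. This is a rough sketch that assumes GNU coreutils (the '{*}' repeat count is a GNU extension). Unlike the approaches above, it names files by record number rather than by sequence ID. The names demo2.fa, split_out and the rec_ prefix are hypothetical.

```shell
# Demo input; demo2.fa and its records are invented for illustration.
printf '>a\nACGT\n>b\nTTAA\nCCGG\n>c\nGG\n' > demo2.fa

mkdir -p split_out
# -s: do not print piece sizes
# -z: drop the empty piece that precedes the first ">" line
# -f / -b: output name prefix and numeric suffix format
# '/^>/' '{*}': start a new piece at every header line, as many times as needed
csplit -s -z -f split_out/rec_ -b '%04d.fasta' demo2.fa '/^>/' '{*}'

ls split_out
```

csplit writes one piece at a time, so the open-file limit is not an issue here either.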