Question

How to divide a FASTA file?

0

7.4 years ago

l.souza ▴ 80

Hello!

I have a FASTA file with about 1600 sequences, but I'm going to use a tool that requires each FASTA file to contain at most 200 sequences.

Is there a way to automate the division of this file using Linux or Windows?

Thanks in advance.

fasta sequences • 4.0k views

updated 7.4 years ago by GenoMax 151k • written 7.4 years ago by l.souza ▴ 80

score 1 · Answer 1 · 2018-01-07

If your FASTA file records are linear (one line for the header, one line for the sequence), you can use split -l:

$ split -l 400 records.fa splitRecords_

200 two-line records will take up 400 lines. Split filenames will start with the prefix splitRecords_.
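To confirm that the chunks respect the 200-sequence limit, you could count the header lines in each output file. This is a minimal sketch, assuming the chunks sit in the current directory under the splitRecords_ prefix used above; the function name count_records is only illustrative. With 1600 two-line records, split's default suffixes should give eight files, splitRecords_aa through splitRecords_ah.

```shell
# count_records FILE: print the number of FASTA records in FILE,
# counted as lines that start with '>'. Prints 0 for a file with no headers.
count_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Example use (not run here): report the record count of every chunk.
# for f in splitRecords_*; do
#     printf '%s\t%s\n' "$f" "$(count_records "$f")"
# done
```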

If you have multiline FASTA, where a sequence wraps over several lines, you can first convert it to linear FASTA. Search Biostars for how to do this conversion step. Once the file is in linear form, you can use split as above.
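One possible way to do the conversion is with awk. The sketch below is not taken from a specific Biostars post; the function name linearize_fasta and the output file name are made up for illustration. It joins all sequence lines of a record into one line, strips Windows-style carriage returns, and writes an empty sequence line for a header with no sequence so that every record still occupies exactly two lines.

```shell
# linearize_fasta FILE: print each record as one header line plus one sequence line.
linearize_fasta() {
    awk '
        { sub(/\r$/, "") }                  # drop a trailing carriage return, if any
        /^>/ {
            if (have) print seq             # finish the previous record
            print                           # header line, unchanged
            seq = ""; have = 1
            next
        }
        { seq = seq $0 }                    # append wrapped sequence lines
        END { if (have) print seq }         # finish the last record
    ' "$1"
}

# Example use (not run here); records.linear.fa is a placeholder name.
# linearize_fasta records.fa > records.linear.fa
# split -l 400 records.linear.fa splitRecords_
```

Any lines that appear before the first header are ignored by this sketch.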

score 1 · Answer 2 · 2018-01-07

You can use the faSplit utility from Jim Kent at UCSC. Download it and make it executable (chmod a+x faSplit). The Linux version is linked; a macOS version is also available.

faSplit - Split an fa file into several files.
usage:
   faSplit how input.fa count outRoot
where how is either 'about' 'byname' 'base' 'gap' 'sequence' or 'size'.  
Files split by sequence will be broken at the nearest fa record boundary. 
Files split by base will be broken at any base.  
Files broken by size will be broken every count bases.

Examples:
   faSplit sequence estAll.fa 100 est
This will break up estAll.fa into 100 files
(numbered est001.fa est002.fa, ... est100.fa
Files will only be broken at fa record boundaries

There are many other modes, which you can check by running faSplit without arguments.
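For the file in the question, the count argument would be the number of output files rather than the number of sequences per file. A small sketch for working that number out, where chunks_needed, input.fa and the chunk_ prefix are placeholder names: 1600 sequences at 200 per file gives 8 files. The help text above only promises that files are broken at record boundaries, not that each file holds the same number of records, so it is worth counting the headers in each output file afterwards.

```shell
# chunks_needed TOTAL PER_FILE: print the smallest number of files that keeps
# every file at or below PER_FILE records (integer ceiling division).
chunks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example use (not run here); input.fa and chunk_ are placeholders.
# n=$(chunks_needed "$(grep -c '^>' input.fa)" 200)
# faSplit sequence input.fa "$n" chunk_
```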