Question

Splitting a multi-FASTA file in Python

0

6.2 years ago

twesigomwedavid • 0

Hello,

How can I split a multi-FASTA file into individual sequence files in Python?

python multi-fasta file splitting • 8.5k views

ADD COMMENT • link updated 3.2 years ago by lachiemck • 0 • written 6.2 years ago by twesigomwedavid • 0

3

Always include an attempt in your post to show that you tried something first.

Some hints:

You can use Biopython, but it can be slow if you have a huge file.
Or read the file line by line in a for loop: each time a line starts with ">", create a new file, then write that header line and every sequence line that follows into it, until the next ">". (You can even use the header of each sequence as the output file name.)

I think you can even do that in a single Unix command.
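
The line-by-line hint can be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a tested tool: the function name `split_fasta`, the counter prefix on each file name, and the rule for cleaning up header text are all assumptions made for illustration.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A header starts a new record: close the previous file first.
                    if out is not None:
                        out.close()
                    # Use the first word of the header as the file name, replacing
                    # characters that are unsafe in file names (assumed rule).
                    words = line[1:].split()
                    name = re.sub(r"[^A-Za-z0-9._-]", "_", words[0]) if words else "record"
                    # Prefix a counter so duplicate IDs do not overwrite each other.
                    path = os.path.join(out_dir, "%d_%s.fasta" % (len(written) + 1, name))
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    # The header and every following sequence line go to the current file.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_fasta("test_fasta.fasta", "splitoutput")` would write one file per record into `splitoutput`. Because only one line is held in memory at a time, this should cope with large inputs, and sequences wrapped over several lines stay together in the same output file.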

ADD REPLY • link 6.2 years ago by Bastien Hervé 5.9k

0

Okay, thank you Bastien

ADD REPLY • link 5.0 years ago by twesigomwedavid • 0

2

If this is an assignment, you should always show the code you have written so far (if you need specific help).

Otherwise, similar questions and solutions can be found on this forum. Try searching it with an external search engine such as Google.

ADD REPLY • link 6.2 years ago by GenoMax 147k

score 1 · Answer 1 · 2018-10-02

1

6.2 years ago

Joe 21k

As the others have said, see other results on this forum, for example: Split the multiple sequences file into a separate files

score 1 · Answer 2 · 2018-10-03

Try this code

#!/usr/bin/env python
import argparse
import os

from Bio import SeqIO


def split(fastafile="test_fasta.fasta", outfastadir="splitoutput"):
    """Extract multiple sequence fasta file and write each sequence in separate file"""
    # Create the output directory portably instead of shelling out to mkdir.
    os.makedirs(outfastadir, exist_ok=True)
    file_count = 0
    with open(fastafile) as fh:
        for seq_rec in SeqIO.parse(fh, "fasta"):
            file_count += 1
            # Output files are numbered in input order: 1.fasta, 2.fasta, ...
            out_path = os.path.join(outfastadir, "%d.fasta" % file_count)
            with open(out_path, "w") as fho:
                SeqIO.write(seq_rec, fho, "fasta")
    if file_count == 0:
        raise Exception("No valid sequence in fasta file")
    return "Done"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Extract multiple sequence fasta file and write each sequence in separate file")
    # argparse in Python 3 has no version= constructor argument; use a version action.
    parser.add_argument("--version", action="version", version="%(prog)s 1.0")
    parser.add_argument("-f", "--fastafile",
                        default="test_fasta.fasta",
                        help="Fasta file for parsing")
    parser.add_argument("-d", "--outfastadir",
                        default="splitoutput",
                        help="Fasta file output directory")

    args = parser.parse_args()
    split(fastafile=args.fastafile, outfastadir=args.outfastadir)