Splitting a multi-FASTA file in Python
6.2 years ago

Hello,

How can I split a multi-fasta file into individual sequence files in python?

python multi-fasta file splitting • 8.5k views

Always show an attempt in your post to show you tried something first.

Some hints:

  • You can use Biopython, but it can be slow if you have a huge file.

  • Or read the file line by line in a for loop: each time a line starts with ">", create a new file, then write that header line and every sequence line that follows (up to the next ">") into it, since a sequence is often wrapped over several lines. You can even use the header of each sequence as the output file name.

I think you can even do that in one Unix command
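
The line-by-line hint above could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a tested tool: the function name `split_fasta`, the numbered file-name scheme and the "unnamed" fallback are my own assumptions. It copies every line up to the next ">" so that sequences wrapped over several lines stay intact.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file in out_dir.

    Returns the list of output paths, in input order.
    (Hypothetical helper; names and file-name scheme are illustrative.)
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: close the previous output file
                    if out is not None:
                        out.close()
                    # Use the first word of the header as the file name,
                    # replacing characters that are unsafe in file names
                    words = line[1:].split()
                    name = re.sub(r"[^A-Za-z0-9._-]", "_", words[0]) if words else "unnamed"
                    # Prefix with a counter so duplicate IDs do not overwrite each other
                    path = os.path.join(out_dir, "%d_%s.fasta" % (len(written) + 1, name))
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    # The header and every following sequence line go to the current file;
                    # anything before the first ">" is skipped
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_fasta("test_fasta.fasta", "splitoutput")` would then create one numbered file per record inside `splitoutput`. Because it never holds more than one line in memory, this should cope with large files.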


Okay, thank you Bastien


If this is an assignment, you should always show the code you have written so far (if you need specific help).

Otherwise, there are similar questions and solutions that can be found on this forum. Try doing an external Google search.

6.2 years ago
Joe 21k

As the others have said, see other results on this forum, for example: Split the multiple sequences file into a separate files

6.2 years ago
Siya Diya ▴ 10

Try this code

#!/usr/bin/env python
import argparse
import os

from Bio import SeqIO


def split(fastafile="test_fasta.fasta", outfastadir="splitoutput"):
    """Write each sequence of a multi-sequence FASTA file to a separate file."""
    # Create the output directory portably instead of shelling out to mkdir
    os.makedirs(outfastadir, exist_ok=True)
    file_count = 0
    with open(fastafile) as fh:
        for seq_rec in SeqIO.parse(fh, "fasta"):
            file_count += 1
            # Output files are numbered in input order: 1.fasta, 2.fasta, ...
            out_path = os.path.join(outfastadir, "%d.fasta" % file_count)
            with open(out_path, "w") as fho:
                SeqIO.write(seq_rec, fho, "fasta")
    if file_count == 0:
        raise ValueError("No valid sequence in fasta file")
    return "Done"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Split a multi-sequence FASTA file, writing each sequence to a separate file")
    # Python 3's ArgumentParser does not accept version=...; use a version action instead
    parser.add_argument("--version", action="version", version="%(prog)s 1.0")
    parser.add_argument("-f", "--fastafile", default="test_fasta.fasta",
                        help="FASTA file to parse")
    parser.add_argument("-d", "--outfastadir", default="splitoutput",
                        help="Output directory for the individual FASTA files")
    args = parser.parse_args()
    split(fastafile=args.fastafile, outfastadir=args.outfastadir)

Not working unfortunately


You need to be more specific about how it is not working (for example, the exact error message) if you want further help.


I had a similar problem, and got this from another user on here (a.zielezinski)

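from Bio import SeqIO

d = {}  # maps species name -> open output file handle
with open("sequence.fa", "r") as fh:
    for seq_record in SeqIO.parse(fh, "fasta"):
        # The species name is taken as the text after the last '-' in the record ID
        species_name = seq_record.id.split('-')[-1]
        if species_name not in d:
            d[species_name] = open(f"{species_name}.fa", "w")
        d[species_name].write(seq_record.format("fasta"))

# Close every output file so buffered records are flushed to disk
for out_fh in d.values():
    out_fh.close()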

Here's a link to the thread where I was given it: Sorting and writing multifasta entries to new fasta files
