6.1 years ago
twesigomwedavid
Hello,
How can I split a multi-fasta file into individual sequence files in python?
As the others have said, see other results on this forum, for example: Split the multiple sequences file into a separate files
Try this code
#!/usr/bin/env python
import os
from Bio import SeqIO

def split(fastafile="test_fasta.fasta",
          outfastadir="splitoutput"):
    """Split a multi-sequence FASTA file and write each sequence to a separate file"""
    # Create the output directory if it does not already exist
    os.makedirs(outfastadir, exist_ok=True)
    with open(fastafile) as FH:
        records = SeqIO.parse(FH, "fasta")
        file_count = 0
        for seq_rec in records:
            file_count += 1
            # Output files are numbered 1.fasta, 2.fasta, ... in input order
            out_path = os.path.join(outfastadir, "%d.fasta" % file_count)
            with open(out_path, "w") as FHO:
                SeqIO.write(seq_rec, FHO, "fasta")
    if file_count == 0:
        raise Exception("No valid sequence in fasta file")
    return "Done"

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(
        description="Split a multi-sequence FASTA file and write each sequence to a separate file")
    # ArgumentParser(version=...) is not accepted in Python 3; use a version action instead
    parser.add_argument('--version', action='version', version='%(prog)s 1.0')
    parser.add_argument('-f', '--fastafile',
                        action="store",
                        default="test_fasta.fasta",
                        help="Fasta File for parsing")
    parser.add_argument('-d', '--outfastadir',
                        action="store",
                        default="splitoutput",
                        help="Fasta File output directory")
    args = parser.parse_args()
    split(fastafile=args.fastafile,
          outfastadir=args.outfastadir)
I had a similar problem, and got this from another user on here (a.zielezinski)
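from Bio import SeqIO

d = {}  # maps species name -> open output file handle
with open("sequence.fa", "r") as fh:
    for seq_record in SeqIO.parse(fh, "fasta"):
        # The species name is taken as the text after the last '-' in the record ID
        species_name = seq_record.id.split('-')[-1]
        if species_name not in d:
            d[species_name] = open(f"{species_name}.fa", "w")
        d[species_name].write(seq_record.format("fasta"))
# Close every output file so buffered records are flushed to disk
for out_fh in d.values():
    out_fh.close()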
Here's a link to the thread where I was given it: Sorting and writing multifasta entries to new fasta files
Always include an attempt in your post to show that you tried something first.
Some hints:
You can use Biopython, but it can be slow if you have a huge file.
Or read the file line by line in a for loop: for each ">" at the beginning of a line, create a new file and write that header line and the sequence line(s) that follow it into the file. (You can even use the header of each sequence as the output file name.)
I think you can even do that in one Unix command.
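The line-by-line hint can be sketched in plain Python without Biopython. This is only a rough sketch: the function name split_fasta, the "counter_ID.fasta" naming scheme, and the character filter applied to header text are my own assumptions, not something from this thread. It handles sequences wrapped over several lines by writing every line to the current file until the next ">" header appears.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file in out_dir.

    Hypothetical helper: returns the list of output paths in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    # A header starts a new record: close the previous file.
                    if out is not None:
                        out.close()
                    # First word of the header, made safe for use as a file name.
                    words = line[1:].split()
                    name = re.sub(r"[^A-Za-z0-9._-]", "_", words[0]) if words else "unnamed"
                    # Prefix a counter so duplicate IDs do not overwrite each other.
                    path = os.path.join(out_dir, "%d_%s.fasta" % (len(written) + 1, name))
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    # The header and every following sequence line go to the current file.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_fasta("test_fasta.fasta", "splitoutput") would write one file per record into splitoutput. Only one output file is open at a time, so memory use stays flat even for a huge input. Any lines before the first header are skipped.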
Okay, thank you Bastien
If this is an assignment, you should always show the code you have written so far (if you need specific help).
Otherwise, similar questions and solutions can be found on this forum. Try doing an external Google search.