Hi all, I'm trying to work out a quick script to extract a set of sequence fasta files from a multifasta and write them all to a new, single fasta file. To elaborate, I've got a proteome, and I want to extract a group of 15 or so proteins associated with a certain process, and write them to a new multifasta. To do so I want the script to read a document with a list of sequence names in it and sort the original multifasta using that list. I'm aiming to do this using Biopython.
This is my code so far:
from Bio import SeqIO
import sys
sample_file = open(str(sys.argv[2]), "r")
seq_list = []
outfile = open(str(sys.argv[3]), "w")
#This reads the guide document and turns each line into a list item in seq_list.
for line in sample_file:
stripped_line = line.strip()
line_list = stripped_line.split()
seq_list.append(line_list)
sample_file.close()
print(seq_list)
#This print function is to confirm that seq_list is indeed storing the names.
for record in SeqIO.parse(str(sys.argv[1]), "fasta"):
for n in seq_list:
if n == record.id:
SeqIO.write(record, outfile, "fasta")
outfile.close()
The main problem so far is that I can load the document of names into seq_list and print the list, but parsing SeqIO with it doesn't seem to do anything. However, hardcoding the names into the code seemed to work fine. Any help would be greatly appreciated.
Thanks, Lachlan
the last line of SeqIO.write(record.....'fasta') in a for loop will input the the last record into new fasta file and it will only contain the one sequence or none considering the conditions. the best option would be using 'a' while opening up the file:
outfile = open(str(sys.argv[3]), "a")
or, if you want to use regular 'w' mode while opening the file: then,