Hi all,
I've been trying to run a sequence alignment using clustal omega from within a python script. However, I keep getting the following error:
FATAL: Sequence file contains no data
However, if I run the raw command on the command line, everything runs fine. Here is the relevant code in my script:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
import Bio.SeqIO as SeqIO
from Bio.Align.Applications import ClustalOmegaCommandline
ref_seq = "ERVVIGSKPFNEQYILANMIAILLEENGYKA"
target_seq = "ERVVIGSKPFNEQYILANMINGYKA"
in_file = "unaligned.fasta"
out_file = "aligned.fasta"
# Write my sequences to a fasta file
handle = open(in_file, 'w')
records = (SeqRecord(Seq(seq, IUPAC.protein), id=str(index), name="Test", description="Test") for index,seq in enumerate([ref_seq, target_seq]) )
SeqIO.write(records, handle, "fasta")
clustalomega_cline = ClustalOmegaCommandline(infile=in_file, outfile=out_file, verbose=True, auto=True, force=True)
clustalomega_cline()
#sp.call(str(clustalomega_cline), shell=True) # This will also fail with the same error
Edit:
It seems that this is due to either the way biopython writes the FASTA file, or how clustalo processes it.
If I comment out the lines where I write to a file and run the script again, is still fails, even if I run the raw command using subprocess. BUT, if I also make sure that there are no spaces in the FASTA sequence headers, then the commands at the bottom run fine.
I don't know why this isn't a problem when just run from the command line with the headers unmodified. This doesn't make any sense.
I had a "workaround" that inadvertently did this. But now it's obvious. Thanks.