I am trying to write a dictionary object to a FASTA file, but I have problems writing it.
I could not manage it either without a library or with one (Biopython).
I tried converting my dictionary to a list using "dict.items()" and then writing it with SeqIO, and the error is:
"AttributeError: 'tuple' object has no attribute 'id'"
I would appreciate any kind of help.
Thanks in advance!
The error is clear. SeqIO.write expects an iterable of SeqRecord objects and reads attributes such as id from each one. "dict.items()" gives you (key, value) tuples instead, and a tuple has no id attribute. Typically you'd pass SeqRecord objects, each of which wraps a Seq object together with attributes like id and description. You can definitely turn your dictionary into a list of SeqRecord objects, and everything else should work.
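A minimal sketch of that conversion, assuming the keys are the sequence IDs and the values are plain sequence strings (it uses the same toy dictionary as the tab-delimited example, and a hypothetical output file name seq.fa). Setting description='' keeps the FASTA header to just the ID instead of SeqRecord's default "<unknown description>".

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Plain dictionary mapping sequence IDs to sequence strings (toy data)
a = {'myseq1': 'acgt', 'myseq2': 'gctc'}

# Wrap each key/value pair in a SeqRecord; SeqIO.write reads .id, .description and .seq
records = [SeqRecord(Seq(seq), id=seq_id, description='') for seq_id, seq in a.items()]

# SeqIO.write takes an iterable of SeqRecord objects and returns how many it wrote
count = SeqIO.write(records, 'seq.fa', 'fasta')
print(count)  # 2
```

The resulting seq.fa has one ">myseq1" / "acgt" pair and one ">myseq2" / "gctc" pair.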
There are quite a few other ways to convert a dictionary to the many formats SeqIO supports. The easiest (and the one requiring the least programming experience) is to simply write your dictionary to a tab-delimited file and use SeqIO.convert.
See below for an example.
```python
from Bio import SeqIO

a = {'myseq1': 'acgt', 'myseq2': 'gctc'}
# Try writing your own code to turn this dictionary into a tab-delimited file (seq.tab), i.e.
# myseq1    acgt
# myseq2    gctc
SeqIO.convert('seq.tab', 'tab', 'seq.fa', 'fasta')
```
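One way to fill in that exercise, as a sketch: write each key/value pair as an "id<TAB>sequence" line, which is the two-column layout Biopython's "tab" format reads, then convert. SeqIO.convert returns the number of records it converted.

```python
from Bio import SeqIO

a = {'myseq1': 'acgt', 'myseq2': 'gctc'}

# One "id<TAB>sequence" line per entry: the two-column layout of Biopython's "tab" format
with open('seq.tab', 'w') as handle:
    for seq_id, seq in a.items():
        handle.write(f"{seq_id}\t{seq}\n")

# Convert the tab-delimited file to FASTA; the return value is the number of records
count = SeqIO.convert('seq.tab', 'tab', 'seq.fa', 'fasta')
print(count)  # 2
```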
```python
from Bio import SeqIO

record_dict = SeqIO.to_dict(SeqIO.parse('seqs.fa', 'fasta'))
with open('output_fasta.fa', 'w') as handle:
    SeqIO.write(record_dict.values(), handle, 'fasta')
```
This yields output_fasta.fa, which (spoiler alert) is the same as the input file (duh!) :)
This assumes the OP obtained the dictionary directly from SeqIO, in which case the dictionary's values() are already SeqRecord objects. The AttributeError suggests the dictionary most likely comes from outside Biopython's ecosystem. Still, your example is very educational: looking at the output of print(record_dict) shows how to construct SeqRecord objects from any key-value data structure.
Thank you for your answers; however, the problem seems to be a bit more complicated, and I've been trying to get FASTA output from my file for 2 days. I'm new to Python, but solving this single problem has taken much longer than writing the rest of the code.
I'm sorry if I was not specific or clear enough. To start, here is a piece of my code:
```python
for seq_record in SeqIO.parse("/home/june/Desktop/snp/ssr/transcriptome.fasta", "fasta"):
    if seq_record.id in str(dicti.keys()).strip(">"):
        dicti[">" + seq_record.id] = dicti[">" + seq_record.id] + str(seq_record.seq)
```
In the end, my output looks like this:
Transcript345: Motif: GC Total length: 20 CAAGGTCAGGCCTTCTTTATGCATGATAAGCACTGTGAGGACCCAGGGCAGCTTCAGTGATCATCAGGTGAGTTTAAGGTGGGGGGGGGGGGGCT
However, I need it in FASTA format.
And my dictionary has this format:
'>Transcript345': 'Motif: GC Total length: 20 CAAGGTCAGGCCTTCTTTATGCATG... '
#transcript id as the key and the rest as the value
Please ask me questions if I'm still not clear enough.
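Given the dictionary layout described above ('>ID' keys, and values holding the motif annotation followed by the sequence), FASTA can be written without Biopython at all. This is a sketch under the assumption that the sequence is always the last whitespace-separated token of each value; the function name write_fasta, the file name output.fasta and the 60-character line width are hypothetical choices, not part of the original code.

```python
# Hypothetical example in the layout described above: '>ID' keys, and values holding
# the annotation text followed by the sequence (assumed to be the last token)
dicti = {
    '>Transcript345': 'Motif: GC Total length: 20 '
                      'CAAGGTCAGGCCTTCTTTATGCATGATAAGCACTGTGAGGACCCAGGGCAGCTTCAGTGATCATCAGGTGAGTTTAAGGTGGGGGGGGGGGGGCT',
}


def write_fasta(records, path, width=60):
    """Write {'>id': 'description sequence'} entries as FASTA, wrapping the sequence."""
    with open(path, 'w') as out:
        for key, value in records.items():
            # Assumption: the sequence is the last whitespace-separated token of the value
            parts = value.rsplit(None, 1)
            seq = parts[-1]
            description = parts[0] if len(parts) == 2 else ''
            # Header: '>' + ID (without doubling the '>') + optional description
            out.write(('>' + key.lstrip('>') + ' ' + description).rstrip() + '\n')
            # Sequence lines of at most `width` characters
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + '\n')


write_fasta(dicti, 'output.fasta')
```

The header line comes out as ">Transcript345 Motif: GC Total length: 20", followed by the sequence in lines of at most 60 characters; drop the description from the header if only the ID is wanted.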
You should add a reply to your original question instead of replying as an answer.
And no, I'm not sure I follow what exactly you're trying to do, and I'm quite certain str(dicti.keys()).strip(">") isn't doing what you think it's doing.
Your code hints that you're trying to output the subset of transcriptome.fasta whose record IDs match your dictionary keys, but it would probably help more if you posted an example of your dictionary, a line or two of your input, and your desired output.
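To illustrate the point about str(dicti.keys()): the text being searched is the repr of the keys view, e.g. "dict_keys(['>Transcript345'])", so the `in` test is a substring match and .strip(">") removes nothing (the string neither starts nor ends with ">"). A shorter ID such as Transcript34 therefore matches, and the next line then raises KeyError. This sketch uses hypothetical stand-ins for the dictionary and for transcriptome.fasta (read from an in-memory StringIO) and replaces the test with an exact dictionary lookup.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical stand-ins for the OP's dictionary and transcriptome.fasta
dicti = {'>Transcript345': 'Motif: GC Total length: 20 '}
fasta_text = ">Transcript34\nAAAA\n>Transcript345\nCAAGG\n"

# The original test searches inside the string "dict_keys(['>Transcript345'])",
# so a prefix ID such as "Transcript34" also "matches"
keys_text = str(dicti.keys()).strip(">")
print('Transcript34' in keys_text)  # True

# Exact lookup: build the '>'-prefixed key and check dictionary membership directly
for seq_record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    key = ">" + seq_record.id
    if key in dicti:
        dicti[key] += str(seq_record.seq)

print(dicti)  # {'>Transcript345': 'Motif: GC Total length: 20 CAAGG'}
```

A dictionary membership test is also a hash lookup rather than a scan over one long string, so it stays fast for a large transcriptome.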