My goal is to create a 2D dictionary, search it for some sequence IDs, and then write the organism name with the amino acid sequence to a file.
I have working code that builds one dictionary, looks up the ID numbers I want, and writes the matching records to a file. However, I cannot yet iterate through all my files, and the output only carries the ID number. I would like the organism name to be the key instead.
I have seen many examples of how to parse a single file into a dictionary of id:seq. Another example I have seen turns the header into a string and then splits it, but I could not get that to work. Some of my headers have commas, and some do not. The examples I have seen were splitting on ("|").
>EFE00375.1 S-adenosylmethionine-dependent methyltransferase, YraL family [Lactobacillus crispatus 214-1]
My command: python master_lacto_dict.py L_214.txt P_1_Results.txt P_1_Clustal.txt
import sys
from Bio import SeqIO

aa_db_file = sys.argv[1]         # Amino acid database file (~17 files)
accession_id_file = sys.argv[2]  # Accession IDs file (~18 accession ID numbers)
file_for_clustal = sys.argv[3]   # Output FASTA file

# Collect the wanted accession IDs, skipping blank lines
wanted = set()
with open(accession_id_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_database = SeqIO.parse(aa_db_file, 'fasta')
# fasta_database = SeqIO.index("file_name", "fasta")  # Also seen this in many examples

# Write only the records whose ID is in the wanted set
with open(file_for_clustal, "w") as f:
    for seq in fasta_database:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")
#Desired output
#crispatus 214-1:seq
Hi, could you please specify your concrete question here (it seems there are a couple of them)?
Besides, as your question is more of a programming question, I would recommend searching for your problem on Google and especially on Stack Overflow (a kind of Biostars for programmers). In my experience, almost every question has already been asked there, so if you can pin down your concrete problem you should find a solution.
Best,
Cindy
Well, if the desired information is always hidden inside these brackets and you can make sure that these are the only occurrences of those brackets, why don't you search for the occurrences of "[" and "]" to retrieve start and end index, and then get the corresponding substring of rec.description?
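The bracket-index idea above could look roughly like the sketch below. It assumes the organism name is always the last bracketed group in the description, and the helper name organism_from_description is made up for illustration. With Biopython, the string passed in would be rec.description.

```python
def organism_from_description(description):
    """Return the text inside the last [...] pair, or None if there is none."""
    start = description.rfind("[")   # index of the last opening bracket
    end = description.rfind("]")     # index of the last closing bracket
    if start == -1 or end == -1 or end < start:
        return None
    return description[start + 1:end]


# Header taken from the question (without the leading ">")
header = ("EFE00375.1 S-adenosylmethionine-dependent methyltransferase, "
          "YraL family [Lactobacillus crispatus 214-1]")
print(organism_from_description(header))  # Lactobacillus crispatus 214-1
```

Using rfind rather than find means that commas or earlier brackets in the protein name do not matter, as long as the organism comes last.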
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
Hello,
I now have a for loop that seems to work
Output:
How can I parse the recs? I want what is inside those brackets.
The re module is very useful in this case.
Now I am curious how to iterate through my results files, which are basically a handful of text files, each with one column of data.
How could I iterate through these files and check my master dictionary?
I was thinking about the glob function, but maybe there is an easier method.
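For what it is worth, glob is probably the simplest route here. Below is a rough sketch, not a definitive solution, that combines the re idea with a nested "master" dictionary of the form {organism: {seq_id: sequence}} and then checks every results file against it. The function names, the "unknown" fallback key, and the dictionary layout are assumptions made for illustration. The records are plain (seq_id, description, sequence) tuples so the sketch runs without Biopython.

```python
import glob
import os
import re
from collections import defaultdict

# Matches the last [...] group at the very end of a FASTA description line.
ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def parse_organism(description):
    """Return the bracketed organism name, or None if the header has none."""
    match = ORGANISM_RE.search(description)
    return match.group(1) if match else None


def build_master(records):
    """Build {organism: {seq_id: sequence}} from (seq_id, description, sequence) tuples."""
    master = defaultdict(dict)
    for seq_id, description, sequence in records:
        # Headers without brackets are grouped under a placeholder key (assumption).
        organism = parse_organism(description) or "unknown"
        master[organism][seq_id] = sequence
    return master


def read_ids(path):
    """Read a one-column text file of accession IDs, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def lookup_results(master, pattern):
    """Map each results file name to its list of (organism, seq_id, sequence) hits."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        wanted = read_ids(path)
        hits[os.path.basename(path)] = [
            (organism, seq_id, sequence)
            for organism, seqs in master.items()
            for seq_id, sequence in seqs.items()
            if seq_id in wanted
        ]
    return hits
```

With Biopython, the tuples could be produced as (rec.id, rec.description, str(rec.seq)) for each rec from SeqIO.parse over every database file. The results could then be checked with something like lookup_results(master, "P_*_Results.txt"), where the pattern is only a guess based on the file name P_1_Results.txt in the question. Each hit can be written out with the organism name in place of the ID.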