Hi guys,
I have searched quite extensively for this, to no avail.
I have no problem extracting genes from a GenBank file when given a list of gene_ids to extract:
from Bio import SeqIO

for seq_record in SeqIO.parse(gbk_file, "genbank"):
    for feat in seq_record.features:
        if feat.type == "CDS":
            for gene in genes:
                if gene in feat.qualifiers['gene_id']:
                    seq = feat.qualifiers['translation']
                    print(gene)
                    print(seq)
Here "genes" is a list holding each desired gene_id.
However, I would like to provide a start gene and an end gene as my range, so that I can extract a gene cluster in FASTA format. Is this possible?
I will be looping through the start and end gene_ids in a for loop as follows:
for x, y, z in zip(start_list, end_list, cluster_list):
Once the gene_id range is captured (along with its amino acid sequences), I will write each cluster to its own file using:
with open("%s" % (z), "w") as outfile:
    outfile.write(">%s\n%s" % (gene, sequence))
Any help in capturing a range of gene_ids would be greatly appreciated.
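For what it's worth, here is a rough sketch of one way the range capture might work: walk the CDS features in file order, start collecting at the start gene, and stop after the end gene. The helper names extract_cluster and write_fasta are made up for illustration, and the sketch assumes both genes sit in the same record with the start gene appearing first in the file.

```python
def extract_cluster(features, start_id, end_id, key="gene_id"):
    """Collect (gene_id, protein) pairs for CDS features from start_id through end_id.

    Assumes features are in file order and start_id comes before end_id.
    If end_id is never seen, everything after start_id is returned.
    """
    cluster = []
    inside = False
    for feat in features:
        if feat.type != "CDS":
            continue
        # Qualifier values are lists of strings; the key may be absent.
        ids = feat.qualifiers.get(key, [])
        if start_id in ids:
            inside = True
        if inside and "translation" in feat.qualifiers:
            name = ids[0] if ids else "unknown"
            cluster.append((name, feat.qualifiers["translation"][0]))
        if inside and end_id in ids:
            break
    return cluster


def write_fasta(path, cluster):
    """Write (gene_id, protein) pairs to path, one FASTA record per gene."""
    with open(path, "w") as outfile:
        for gene_id, protein in cluster:
            outfile.write(">%s\n%s\n" % (gene_id, protein))
```

Inside the zip loop this could be called as extract_cluster(seq_record.features, x, y) for each seq_record from SeqIO.parse(gbk_file, "genbank"), followed by write_fasta(z, cluster) whenever the returned list is not empty.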
Would it be possible to use a start and end coordinate range instead of gene IDs to extract the genes within a particular slice?
Yep, that's my plan: use the gene_id to then capture the corresponding location :)
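As a rough sketch of that plan (not a tested solution): look up the two gene_ids, take the span their locations cover, and keep every CDS that falls entirely inside it. The function name cds_in_span is hypothetical. It assumes both genes are in the same record and relies only on each feature's location.start and location.end, which Biopython feature locations provide.

```python
def cds_in_span(features, start_id, end_id, key="gene_id"):
    """Return the CDS features lying fully between two genes, by coordinates."""
    cds = [f for f in features if f.type == "CDS"]

    def find(gene_id):
        # First CDS whose qualifier list contains gene_id.
        for f in cds:
            if gene_id in f.qualifiers.get(key, []):
                return f
        raise ValueError("gene_id not found: %s" % gene_id)

    first, last = find(start_id), find(end_id)
    # min/max makes the span independent of strand and argument order.
    lo = min(int(first.location.start), int(last.location.start))
    hi = max(int(first.location.end), int(last.location.end))
    return [
        f for f in cds
        if int(f.location.start) >= lo and int(f.location.end) <= hi
    ]
```

Compared with walking the features in file order, this does not depend on which of the two genes appears first, at the cost of scanning the CDS list a few times.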
Check out my answer to jrj.healey