Is it possible to get the organism name the same way one gets the sequence and gene ID? For example, from headers like these:
>sp|Q09305|AAR2_CAEEL Protein AAR2 homolog OS=Caenorhabditis elegans GN=F10B5.2 PE=3 SV=1
MGGALPPEIVDYMYRNGAFLLFLGFPQASEFGIDYKSWKTGEKFMGLKMIPPGVHFVYCS
IKSAPRIGFFHNFKAGEILVKKWNTESETFEDEEVPTDQISEKKRQLKNMDSSLAPYPYE
NYRSWYGLTDFITADTVERIHPILGRITSQAELVSLETEFMENAEKEHKDSHFRNRVDRE
>sp|Q18007|ACM1_CAEEL Probable muscarinic acetylcholine receptor gar-1 OS=Caenorhabditis elegans GN=gar-1 PE=2 SV=3
MPNYTVPPDPADTSWDSPYSIPVQIVVWIIIIVLSLETIIGNAMVVMAYRIERNISKQVS
NRYIVSLAISDLIIGIEGFPFFTVYVLNGDRWPLGWVACQTWLFLDYTLCLVSILTVLLI
TADRYLSVCHTAKYLKWQSPTKTQLLIVMSWLLPAIIFGIMIYGWQAMTGQSTSMSGAEC
SAPFLSNPYVNMGMYVAYYWTTLVAMLILYKGIHQAAKNLEKKAKAKERRHIALILSQRL
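The header lines above follow UniProt's FASTA convention, roughly `>db|Accession|EntryName ProteinName OS=... GN=... PE=... SV=...` (newer UniProt files also add an OX= taxonomy ID after OS=; GN= can be missing). Below is a minimal sketch that splits one such header into a dict using only the standard library. It assumes every header has that shape. The helper name parse_uniprot_header is hypothetical. It should also accept Biopython's record.description, which is the header line without the leading '>'.

```python
import re

# One "KEY=value" pair: KEY is two capital letters (OS, OX, GN, PE, SV);
# the value runs lazily up to the next " KEY=" or the end of the header.
FIELD_RE = re.compile(r"([A-Z]{2})=(.*?)(?= [A-Z]{2}=|$)")


def parse_uniprot_header(header):
    """Split a UniProt-style FASTA header into a dict (hypothetical helper)."""
    header = header.lstrip(">").strip()
    ident, _, rest = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    # The protein name is everything before the first " KEY=" field.
    first_key = re.search(r" [A-Z]{2}=", rest)
    cut = first_key.start() if first_key else len(rest)
    record = {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": rest[:cut],
    }
    # findall returns (key, value) tuples, which dict.update accepts directly.
    record.update(FIELD_RE.findall(rest[cut:]))
    return record


header = ">sp|Q09305|AAR2_CAEEL Protein AAR2 homolog OS=Caenorhabditis elegans GN=F10B5.2 PE=3 SV=1"
fields = parse_uniprot_header(header)
print(fields["accession"], fields["OS"])  # Q09305 Caenorhabditis elegans
```

The lazy `.*?` plus the lookahead is what lets multi-word values such as "Caenorhabditis elegans" survive intact while still stopping before " GN=".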
With the following code I can extract the sequence and protein ID, but not the organism name. How can this be done? :)
from Bio import SeqIO
import re

input_file = "Streptomyces_Uniprot.fasta"
# Raw string so the regex sees \| (a literal pipe); captures the text between the first two pipes
pattern = r"\|(.*?)\|"
sequence_list = []
id_list = []
with open(input_file) as handle:
    for fasta in SeqIO.parse(handle, "fasta"):
        fasta_id, sequence, description = fasta.id, str(fasta.seq), fasta.description
        # "sp|Q09305|AAR2_CAEEL" -> "Q09305"
        fasta_id = re.search(pattern, fasta_id).group(1)
        id_list.append(fasta_id)
        sequence_list.append(sequence)
        print(fasta_id)
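Because the ID token of a UniProt header always has the shape db|Accession|EntryName, the regex on fasta.id can also be replaced by a plain str.split. Below is a minimal sketch, assuming every record ID contains exactly two '|' separators. The helper name split_uniprot_id is hypothetical.

```python
def split_uniprot_id(record_id):
    """Return (db, accession, entry_name) from an ID such as 'sp|Q09305|AAR2_CAEEL'."""
    # Tuple unpacking raises ValueError if the ID does not have exactly three parts.
    db, accession, entry_name = record_id.split("|")
    return db, accession, entry_name


# Inside the SeqIO loop this could replace the re.search(...) line:
#     fasta_id = split_uniprot_id(fasta.id)[1]
print(split_uniprot_id("sp|Q09305|AAR2_CAEEL"))
```

A malformed ID then fails with a ValueError that names the problem. With the regex version, re.search returns None and the failure surfaces as a less obvious AttributeError on .group.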
How would I pull out the OS="ORGANISM_NAME" value from the description, for example?
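One way to get the organism is to run a regular expression over the description. The pattern anchors on OS= and stops at the next two-letter KEY= field (GN=, PE=, or OX= in newer UniProt files) or at the end of the line. Below is a minimal sketch using only the standard library. The helper organism_from_description and the small in-memory FASTA text are illustrative. Inside the SeqIO loop you would pass fasta.description instead.

```python
import re

# Capture everything after "OS=" up to the next " XX=" field or the end of the line.
OS_RE = re.compile(r"OS=(.*?)(?= [A-Z]{2}=|$)")


def organism_from_description(description):
    """Return the OS= value of a UniProt FASTA description, or None if absent."""
    match = OS_RE.search(description)
    return match.group(1) if match else None


# Hypothetical two-record file contents (sequences shortened for the example).
fasta_text = """>sp|Q09305|AAR2_CAEEL Protein AAR2 homolog OS=Caenorhabditis elegans GN=F10B5.2 PE=3 SV=1
MGGALPPEIVDYMYRNGAFLLFLGFPQ
>sp|Q18007|ACM1_CAEEL Probable muscarinic acetylcholine receptor gar-1 OS=Caenorhabditis elegans GN=gar-1 PE=2 SV=3
MPNYTVPPDPADTSWDSPYS
"""

id_list, organism_list = [], []
for line in fasta_text.splitlines():
    if line.startswith(">"):
        description = line[1:]  # same text Biopython exposes as record.description
        id_list.append(description.split("|")[1])
        organism_list.append(organism_from_description(description))

print(list(zip(id_list, organism_list)))
```

In the original loop, the equivalent is `organism_list.append(organism_from_description(fasta.description))`. If a table is wanted, the parallel lists can then be passed to pandas.DataFrame as a dict of columns.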
Hello biohacker_tobe!
It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/63922667/
This is typically not recommended as it runs the risk of annoying people in both communities.