Hi, I have a FASTA file with more than 38,000 protein sequences inferred from a genome of Diplonema. Every sequence has an ID and an annotation, but the IDs are not referenced in any database. I need to work out from the annotations which proteins are mitochondrial.
Here is an example, with an ID and an annotation:
XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta
I know this one is mitochondrial because I also used BLAST to check its similarity to the mitochondrial proteins of another organism. But I only know that because I looked up on Google what a "Succinyl-CoA ligase" is, for the small subset (30 proteins) I found with BLAST.
Is there a way to check each annotation in the FASTA file programmatically to see whether it corresponds to a mitochondrial protein? Which resource(s) can I use to at least see whether proteins are mitochondrial?
Thanks in advance.
Hi,
The only approach I can think of, though I'm not sure it is feasible or the best option, is to build a mitochondrial database and then align all the Diplonema genes/proteins against it. The ones that align well, i.e. with a high percent identity and a low e-value, would be annotated as mitochondrial. I believe there is a human mitochondrial database. Another possibility, though I don't think it will work as well, is to annotate a gene/protein as mitochondrial based on its name, by comparing each name against a list of known mitochondrial genes/proteins. Keep in mind that this would miss mitochondrial genes that have no annotation.
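To make the name-based idea concrete, here is a minimal Python sketch. It assumes headers of the form `>ID annotation`, as in the example from the question, and the keyword list is purely hypothetical: a real one would have to come from a curated list of mitochondrial protein names. Names alone can also be ambiguous, so hits should be treated as candidates, not as a final answer.

```python
# Hypothetical keyword list, for illustration only; replace it with
# names taken from a curated set of mitochondrial proteins.
MITO_KEYWORDS = ["succinyl-coa ligase", "cytochrome c oxidase"]


def parse_fasta_headers(lines):
    """Yield (sequence ID, annotation) for each FASTA header line."""
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # header with no ID
        seq_id = parts[0]
        annotation = parts[1] if len(parts) > 1 else ""
        yield seq_id, annotation


def flag_by_name(lines, keywords=MITO_KEYWORDS):
    """Return (ID, annotation) pairs whose annotation contains a keyword."""
    hits = []
    for seq_id, annotation in parse_fasta_headers(lines):
        lowered = annotation.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((seq_id, annotation))
    return hits
```

With a real file, something like `flag_by_name(open("proteins.fasta"))` would do, where `proteins.fasta` is a placeholder file name.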
António
Hi Antonio, Thank you for your advice! I actually did what you suggest, but with an organism that is closer in the evolutionary tree, called Andalucia. I took the proteins that I was sure were mitochondrial, made a database with BLAST, and searched my proteins against it. By doing this, I found proteins like the one I mentioned in my post. Thank you again, because it reassures me about what I'm doing!
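For anyone following the same route, the filtering step could look roughly like the sketch below. It assumes the BLAST results were saved in the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and both thresholds are arbitrary placeholders, not recommended values.

```python
import csv


def best_mito_hits(tabular_lines, max_evalue=1e-10, min_pident=30.0):
    """Keep, for each query, the best-scoring hit that passes both cutoffs.

    The default cutoffs are illustrative assumptions and should be tuned.
    Returns {qseqid: (sseqid, pident, evalue, bitscore)}.
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        pident = float(row[2])
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue > max_evalue or pident < min_pident:
            continue  # hit too weak to count as a mitochondrial candidate
        if qseqid not in best or bitscore > best[qseqid][3]:
            best[qseqid] = (sseqid, pident, evalue, bitscore)
    return best
```

The returned dictionary has one entry per query protein, so its keys are the candidate mitochondrial IDs to look up in the FASTA file.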