Question

Check programmatically if a protein is mitochondrial with annotations

4.3 years ago

lumal29 ▴ 80

Hi, I have a FASTA file with more than 38000 protein sequences inferred from a Diplonema genome. Every sequence has an ID and an annotation, but the IDs are not referenced in any database. I need to use the annotations to work out which proteins are mitochondrial.

Here is an example, with an ID and an annotation:

XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta

I know this one is mitochondrial, because I also used BLAST to check for similarity to the mitochondrial proteins of another organism. But I only know this because I looked up on Google what a "Succinyl-CoA ligase" is, for the small subset (30 proteins) I found with BLAST.

But is there a way to check programmatically whether each annotation in the FASTA file corresponds to a mitochondrial protein? Which resource(s) can I use to at least see if proteins are mitochondrial?

Thanks in advance.

proteomics GO terms Annotations • 1.5k views

updated 4.3 years ago by Mensur Dlakic ★ 28k • written 4.3 years ago by lumal29 ▴ 80

Hi,

The only approach I can think of, though I'm not sure it is feasible or the best option, is to build a mitochondrial protein database and align all the Diplonema proteins against it. Proteins that align well, i.e. with high percent identity and a low e-value, would then be annotated as mitochondrial. I believe there is a human mitochondrial database. Another possibility, which I don't think will work as well, is to annotate a protein as mitochondrial based on its name, by comparing each name against a list of mitochondrial proteins (though some mitochondrial proteins may lack an annotation).
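To illustrate the name-matching idea, here is a minimal Python sketch. It assumes each FASTA header looks like `>ID annotation`, as in the question's example. The keyword list and the second record (`XXXXX67890`) are hypothetical placeholders, not a curated resource; a real list would have to come from an inventory of known mitochondrial proteins.

```python
# Hypothetical keyword list for illustration only; substring matching
# like this will produce both false positives and false negatives.
MITO_KEYWORDS = [
    "succinyl-coa ligase",
    "cytochrome c oxidase",
    "mitochondrial",
]


def parse_headers(fasta_text):
    """Yield (seq_id, annotation) for every '>' header line."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split on the first space: ID before it, annotation after it.
            seq_id, _, annotation = line[1:].strip().partition(" ")
            yield seq_id, annotation


def flag_by_annotation(fasta_text, keywords=MITO_KEYWORDS):
    """Return {seq_id: first_matching_keyword} for matching headers."""
    hits = {}
    for seq_id, annotation in parse_headers(fasta_text):
        lowered = annotation.lower()
        for keyword in keywords:
            if keyword in lowered:
                hits[seq_id] = keyword
                break
    return hits


# The first header is the example from the question; the second record
# and both sequences are made up.
example = """>XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta
MKL
>XXXXX67890 Hypothetical protein
MAA
"""

flagged = flag_by_annotation(example)
print(flagged)
```

Proteins annotated only as "hypothetical protein" can never be caught this way, which is one reason to combine it with a sequence-based check.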

António

4.3 years ago by antonioggsousa 3.2k

Hi Antonio, thank you for your advice! I actually did what you suggest, but with an evolutionarily closer organism called Andalucia. I took the proteins that I was sure were mitochondrial, made a BLAST database from them and searched my proteins against it. By doing this, I found proteins like the one I mentioned in my post. Thank you again, because it reassures me about what I'm doing!
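For the BLAST step, a sketch of how the hits could be filtered, assuming the search was run with BLAST+ tabular output (`-outfmt 6`), whose default columns are listed in the code. The e-value and identity cutoffs are arbitrary assumptions to be tuned, and the subject IDs and numbers in the example rows are invented.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv, max_evalue=1e-10, min_pident=30.0):
    """Return {query_id: row} with the highest-bitscore hit per query,
    keeping only hits that pass the (assumed) cutoffs."""
    best = {}
    reader = csv.DictReader(io.StringIO(blast_tsv),
                            fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue
        if float(row["pident"]) < min_pident:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Invented rows: two hits for one query, one weak hit for another.
rows = [
    ["XXXXX12345", "ref_mito_1", "62.5", "400", "150", "0",
     "1", "400", "5", "404", "1e-150", "520"],
    ["XXXXX12345", "ref_mito_2", "35.0", "200", "130", "0",
     "1", "200", "1", "200", "1e-20", "110"],
    ["XXXXX99999", "ref_mito_3", "25.0", "80", "60", "0",
     "1", "80", "1", "80", "0.5", "30"],
]
blast_tsv = "\n".join("\t".join(r) for r in rows)

hits = best_hits(blast_tsv)
for query_id, row in hits.items():
    print(query_id, row["sseqid"], row["evalue"], row["pident"])
```

A hit only shows similarity to a known mitochondrial protein, so paralogs that localize elsewhere can still pass the filter.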

4.3 years ago by lumal29 ▴ 80

score 2 · Answer 1 · 2020-08-03

4.3 years ago

Mensur Dlakic ★ 28k

It is not a perfect solution, but you can try predicting protein localization from sequence. For example:

Most of them should work well with mitochondrial proteins.

4.3 years ago by Mensur Dlakic ★ 28k

Thank you Mensur for your answer. I have already used three tools to predict localization: TargetP, MitoFates and PredSL. It's really hard to make a decision from the results because the tools don't agree. If I require a positive prediction from all three tools, I get more than 600 proteins out of 38000; if I accept a positive prediction from at least one tool, I get more than 4000 sequences. How can I decide which set to choose? What I did was take mitochondrial proteins from a closely related organism, Andalucia, and check the accuracy of the tools on them. I had 33 proteins, and 32 of them were predicted as mitochondrial by at least one tool. So, as you said, it's not perfect, but it can perhaps give me a good idea. I saw new tools in the links you gave me, and I may try some of them. Thank you again for your help!
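The trade-off between "all three tools" and "at least one tool" can be made explicit with a simple vote count. This is a minimal sketch that assumes each tool's output has already been reduced to a set of protein IDs called mitochondrial. The IDs and sets below are toy data, not real results from TargetP, MitoFates or PredSL.

```python
def consensus(predictions, min_votes):
    """predictions: {tool_name: set of IDs predicted mitochondrial}.
    Return the IDs called mitochondrial by at least min_votes tools."""
    votes = {}
    for ids in predictions.values():
        for seq_id in ids:
            votes[seq_id] = votes.get(seq_id, 0) + 1
    return {seq_id for seq_id, n in votes.items() if n >= min_votes}


def sensitivity(predicted, known_positives):
    """Fraction of known mitochondrial proteins that were recovered."""
    return len(predicted & known_positives) / len(known_positives)


# Toy data (hypothetical IDs), standing in for parsed tool output.
predictions = {
    "TargetP": {"p1", "p2", "p3"},
    "MitoFates": {"p1", "p2", "p4"},
    "PredSL": {"p1", "p5"},
}
# Stand-in for a reference set such as the 33 Andalucia proteins.
known_mito = {"p1", "p2", "p3"}

for min_votes in (1, 2, 3):
    called = consensus(predictions, min_votes)
    print(min_votes, len(called), round(sensitivity(called, known_mito), 2))
```

A positive-only reference set measures sensitivity but says nothing about false positives, so a set of proteins known not to be mitochondrial would be needed to judge how many of the 4000 "at least one tool" calls are likely to be wrong.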

4.3 years ago by lumal29 ▴ 80