Extract compounds/proteins from genomic data
17 months ago
Sorin • 0

Hello,

First of all, sorry for this noobish question. Secondly, I posted the same question on StackExchange, but I don't think it will get an answer there (low traffic, and the question was closed as "not bioinformatics").

I'm wondering whether it is possible, by bioinformatic means only, to "extract" genes, and from those the proteins and natural compounds, from plant genome data that is available on NCBI (please see https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_029618835.1/ as an example). For example, could we get to flavoxanthin or any other substance just by processing the data from the link above?

What other useful information, from a chemical compound perspective, can be found by digging into the genome itself?

Any hints will be greatly appreciated.

Thank you!

plants annotations • 1.5k views

Not sure if this answers your question, but have you tried looking at the taxonomy browser for this species (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=41496)? There seem to be some annotated genes and proteins that you could find there. But I agree with the other comment that the assembly seems premature at this stage, with no assembled chromosomes.


Yes, it helps, thank you. But I was wondering whether a kind of "de novo" annotation can be made (if that's the correct term). Please see my reply to the other comment. Thank you!


By no stretch of the imagination am I an expert in annotating "compounds", but regarding what you say in the other comment about "guessing" what a plant can contain or not: I am guessing (as GenoMax says) that this will mostly revolve around predicting genes and their protein products. So if your question is whether, given a stretch of the genome, you can guess how many and which genes are present and what their protein products are, then the answer, to the best of my knowledge, is: kind of. I would refer you to this review, which lists these methods. With them, you could guess that a stretch of DNA codes for protein X, but this guess will always come with some uncertainty.
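To make the "kind of" concrete, here is a minimal sketch of the most naive form of gene guessing: scanning all six reading frames for ATG-to-stop open reading frames and translating them with the standard genetic code. The function names and the `min_aa` cutoff are my own choices for illustration. Plant genes have introns, so this toy approach will miss or mangle most real genes; dedicated gene predictors exist for that. It only shows the basic DNA-to-protein step.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_aa=30):
    """Return (strand, frame, protein) for every ATG...stop ORF.

    Only ORFs with at least min_aa residues are reported. Codons that
    contain ambiguous bases (e.g. N) are translated as 'X'.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = []
            in_orf = False
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[i:i + 3], "X")
                if not in_orf:
                    if aa == "M":  # start codon opens a candidate ORF
                        in_orf = True
                        protein = ["M"]
                elif aa == "*":  # stop codon closes it
                    if len(protein) >= min_aa:
                        hits.append((strand, frame, "".join(protein)))
                    in_orf = False
                else:
                    protein.append(aa)
    return hits
```

Each protein it returns is only a candidate. Deciding what it might be (the "X protein" part) would take a further similarity search against known proteins, and that is where the uncertainty comes in.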


The link you sent is a gold mine for me, thank you for it manaswwm :D

17 months ago
GenoMax 147k

It is not clear what exactly you wish to achieve. The sequence assembly you have linked to is for Calendula officinalis and is far from complete, since it looks like it has 47,000+ contigs. As a result, there do not seem to be any gene models/annotations present. You could use what data there is to see if you can identify genes, then work your way up to the protein sequences and to identifying what those proteins are. This will be a lot of work and probably beyond what one person can do in a reasonable amount of time.

But if you are interested in pressing on, then you could certainly get incremental information as you start working on the data. Going from DNA sequence to a "chemical compound" would not be straightforward, though.
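If you want to check how fragmented an assembly is before investing time in it, a few lines of Python are enough. This is a minimal sketch that counts the sequences in a FASTA file and computes the N50 (the contig length at which the longest contigs together cover at least half of the assembly). The function names are mine, and the file name in the comment is a placeholder.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):  # header line starts a new record
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def assembly_stats(lengths):
    """Return contig count, total size and N50 for a list of lengths."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for n in lengths:
        running += n
        if running * 2 >= total:  # reached half of the assembly size
            n50 = n
            break
    return {"contigs": len(lengths), "total_bp": total, "n50": n50}


# Usage with a real file (hypothetical path):
#   with open("assembly.fna") as handle:
#       print(assembly_stats(read_fasta_lengths(handle)))
```

A very large contig count together with a small N50 usually means that many genes will be split across contigs, which makes any downstream gene prediction less reliable.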


GenoMax A little more background: I started studying bioinformatics and I enjoy it a lot (I'm a software developer with around 15 years of experience, but mostly in backend, scripting, architecture, and security), and I want to build, as a project, a database of natural compounds. When I saw that there are a lot of plant genomes on NCBI, I thought that, with software, we could somehow "guess" (probabilities, not certainties) what a plant can or cannot contain.

13 months ago
Mark ★ 1.6k

What you are describing is called genome mining. In the bacterial space there's antiSMASH: https://academic.oup.com/nar/article/49/W1/W29/6274535?login=true

It turns out there is now a plant version of antiSMASH too, plantiSMASH: http://plantismash.secondarymetabolites.org/
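Since you come from a software background, here is a toy sketch of the general idea behind this kind of genome mining: look for biosynthetic enzyme genes that sit close together on the same contig. This is not the algorithm these tools actually run. The `Gene` record, the family labels, and the thresholds (`max_gap`, `min_genes`, `min_families`) are all assumptions for illustration, and in practice the family labels would have to come from a separate homology search on predicted proteins.

```python
from collections import namedtuple

# Hypothetical record: one predicted gene with an assigned enzyme family.
Gene = namedtuple("Gene", "contig start end family")


def candidate_clusters(genes, max_gap=20_000, min_genes=3, min_families=2):
    """Group neighbouring genes and keep the groups that look like clusters.

    Genes on the same contig are chained together while the gap between
    consecutive genes is at most max_gap bases. A chain is kept if it has
    at least min_genes genes from at least min_families distinct families.
    """
    chains = []
    current = []
    for gene in sorted(genes, key=lambda g: (g.contig, g.start)):
        if current and (
            gene.contig != current[-1].contig
            or gene.start - current[-1].end > max_gap
        ):
            chains.append(current)
            current = []
        current.append(gene)
    if current:
        chains.append(current)
    return [
        chain
        for chain in chains
        if len(chain) >= min_genes
        and len({g.family for g in chain}) >= min_families
    ]
```

On a very fragmented assembly, many real clusters will be split across contigs, so a simple scan like this would miss them.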
