Proteins that cannot form biofilm?
7.0 years ago
nafizh • 0

I am trying to build a machine learning training set of bacterial protein sequences, split into those involved in biofilm formation and those that are not. I collected the positive sequences from the Gene Ontology (GO) website, but I am not sure which sequences to include as negatives.

Can anyone point me to resources for protein sequences that are known not to be involved in forming biofilms?

gene protein bacteria microbiology • 1.3k views

What do you mean by proteins that form biofilms? Are you trying to find out what the major protein components of a biofilm are? Because bacteria, not proteins, form biofilms.


Essentially, yes: I am trying to detect proteins that are indispensable for biofilm formation. So I need a negative set of protein sequences that definitely don't have that function.


I think you need to be very careful about how you define 'indispensable'. dnaA, for example, is obviously not a biofilm-producing gene, but if the organism lacked it, you wouldn't get a biofilm, because the organism would be non-viable (it's a required housekeeping gene).

If it's sufficient that they don't have a primary functional role in biofilm formation, then you could use standard, so-called housekeeping genes as negatives. These are easy to find in the literature, as they're commonly used as reference controls in RT-PCR experiments, e.g. dnaA, gyrB, rpoA.
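
A minimal sketch of that idea in Python, assuming the positives and the housekeeping-gene negatives are each available as FASTA text. The sequences, record names, and function names below are all made up for illustration; the only real point is that any negative whose sequence also appears among the positives gets dropped before labelling.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record id.
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def build_training_set(positive_fasta, negative_fasta):
    """Return (name, sequence, label) rows; label 1 = positive, 0 = negative.

    Negatives whose sequence is identical to a positive are skipped, so the
    same sequence never carries both labels.
    """
    positives = parse_fasta(positive_fasta)
    negatives = parse_fasta(negative_fasta)
    positive_seqs = set(positives.values())

    rows = [(name, seq, 1) for name, seq in positives.items()]
    for name, seq in negatives.items():
        if seq in positive_seqs:
            continue
        rows.append((name, seq, 0))
    return rows


# Toy, invented sequences: NOT real dnaA/gyrB or biofilm proteins.
POSITIVE_FASTA = """>biofilm_protein_1
MKTAYIAKQR
>biofilm_protein_2
MSLNDEFGHK
"""

NEGATIVE_FASTA = """>housekeeping_example_1
MSLSLWQQCL
>housekeeping_example_2
MSNSYDSSSI
>duplicate_of_positive
MKTAYIAKQR
"""

training_set = build_training_set(POSITIVE_FASTA, NEGATIVE_FASTA)
for name, seq, label in training_set:
    print(name, seq, label)
```

Exact-match deduplication is the bare minimum; in practice you would probably also want to remove close homologues of the positives from the negative set, which this sketch does not attempt.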

7.0 years ago

I cannot say which GO terms are best to use, and I do not know how consistently they are applied to proteins that are actually involved in biofilm formation.

However, from the UniProt point of view, I'd like to alert you to the fact that negative queries should be used with extreme caution: the absence of an annotation does not mean the absence of a function (a true negative). Lack of annotation may simply be a false negative, caused by incompleteness either in the experimentally derived knowledge of a particular protein's function or in how that knowledge is represented as annotations, i.e. an entry may not be up to date and therefore does not have the positive annotation (yet).

See http://www.uniprot.org/help/negative_datasets
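
One way to act on this, sketched in Python under stated assumptions: rather than treating every protein without the biofilm term as a negative, keep three groups. The protein IDs and the annotation table below are hypothetical, the function names are mine, and the GO ID used for biofilm formation (GO:0042710, as far as I know) should be verified before use.

```python
# Assumed GO ID for "biofilm formation"; check it against the GO browser.
BIOFILM_GO = "GO:0042710"


def label_protein(go_terms, biofilm_term=BIOFILM_GO):
    """Assign one of three labels based on a protein's GO annotations.

    - "positive": carries the biofilm term.
    - "candidate_negative": annotated with other terms but not the biofilm
      term. Still only a candidate, since the annotation may be incomplete.
    - "unlabeled": no annotations at all, so nothing can be concluded.
    """
    if biofilm_term in go_terms:
        return "positive"
    if go_terms:
        return "candidate_negative"
    return "unlabeled"


def partition(annotations):
    """Group protein ids by label; `annotations` maps id -> set of GO ids."""
    groups = {"positive": [], "candidate_negative": [], "unlabeled": []}
    for protein_id, go_terms in annotations.items():
        groups[label_protein(go_terms)].append(protein_id)
    return groups


# Hypothetical annotation table; the protein ids are invented.
annotations = {
    "PROT_A": {"GO:0042710", "GO:0003677"},
    "PROT_B": {"GO:0006260"},
    "PROT_C": set(),
}

groups = partition(annotations)
print(groups)
```

Only the "candidate_negative" group would go into the negative set, and even that group is not guaranteed to be free of false negatives; the "unlabeled" proteins are left out of training entirely.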
