How to get negative training set if I don't know what constitutes negative?
8.2 years ago
nafizh • 0

I have a set of bacterial DNA sequences that share a specific function; these form my positive training set. The negative training set should be anything that does not fall into that category, but I have no reliable way of telling whether a given sequence belongs to the positive category or not. So how can I obtain sequences for a negative training set? Can BLAST be used to find sequences that are completely unrelated to mine? Are there any other methods I could use?
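To make the BLAST idea concrete, here is a minimal sketch of one way it could work. It assumes the candidate sequences were searched as queries against a database built from the positive set, and that the results were saved in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the e-value. The function name, the example IDs and the threshold are all made up for illustration. Note that a lack of sequence similarity does not prove a lack of function, so candidates kept this way are only presumed negatives.

```python
def unrelated_candidates(candidate_ids, blast_tabular, max_evalue=1e-3):
    """Return candidate IDs that have no hit against the positive set
    with an e-value at or below max_evalue.

    blast_tabular is the text of a BLAST tabular (-outfmt 6) report in
    which the candidates are the queries and the positives the subjects.
    """
    with_hits = set()
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]          # column 1: query sequence ID
        evalue = float(fields[10])    # column 11: e-value
        if evalue <= max_evalue:
            with_hits.add(query_id)
    # Candidates absent from the report had no reported hit at all.
    return [cid for cid in candidate_ids if cid not in with_hits]


# Hypothetical report: cand1 has a strong hit, cand3 only a weak one,
# cand2 has no hit at all.
example_report = (
    "cand1\tpos7\t91.2\t120\t10\t0\t1\t120\t5\t124\t1e-40\t210\n"
    "cand3\tpos2\t35.0\t40\t26\t0\t3\t42\t10\t49\t4.2\t18.5\n"
)
kept = unrelated_candidates(["cand1", "cand2", "cand3"], example_report)
print(kept)  # ['cand2', 'cand3']
```

A stricter filter would raise `max_evalue` so that even weak hits disqualify a candidate.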

machine-learning dna-sequence blast • 1.9k views

It is not clear what you mean by "some specific function of theirs". Would that be a gene coding for something specific, or a motif?

You could use synthetically generated sequences that are bound not to be positive.


For instance, by shuffling the DNA sequences from the positive training set.
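A minimal sketch of the shuffling idea, assuming the positive set is already loaded as a list of plain sequence strings. The function name and the example sequences are hypothetical. This is a simple per-base shuffle: it keeps each sequence's length and base composition but destroys the order, including any dinucleotide or codon structure, so a classifier trained on it may partly learn "real versus shuffled" rather than the function itself.

```python
import random
from collections import Counter


def shuffled_negatives(positives, copies=1, seed=0):
    """Build synthetic negatives by shuffling the bases of each positive.

    Each output sequence has the same length and base composition as the
    positive it was derived from. A fixed seed keeps the result
    reproducible.
    """
    rng = random.Random(seed)
    negatives = []
    for seq in positives:
        for _ in range(copies):
            bases = list(seq)
            rng.shuffle(bases)  # in-place random permutation of the bases
            negatives.append("".join(bases))
    return negatives


# Made-up sequences, purely for illustration.
positives = ["ATGAAACGCATTAGCACCACC", "ATGGCGTTTCTGGATAAA"]
negatives = shuffled_negatives(positives, copies=2)

for neg in negatives:
    print(neg, dict(Counter(neg)))
```

For short sequences a shuffle can reproduce the original by chance, so it may be worth discarding any shuffled sequence that is identical to a positive.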


I have a set of experimentally verified sequences from bacteria that produce bacteriocins; that is my positive set. But I have no concrete evidence for what constitutes a negative, i.e. sequences that do not produce bacteriocins.


You could take rRNA sequences (16S) or sequences of enzymes from the glycolysis pathway. They are not likely to have anything to do with bacteriocin production.


But then what happens if my test set has sequences from areas other than the ones you mentioned?


I thought that was what you were looking for (sequences that are totally different from your positive set)? Or am I missing something?


Sorry, maybe I was not clear. My question was: what if, during testing, the test set contains negative sequences from different areas than the negative sequences in the training set? Can the classifier still distinguish positives from negatives? Please let me know if you want me to clarify anything.
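One way to check this concern empirically is to hold out an entire negative source during training and test on it. The sketch below only builds such splits; it assumes each record is a `(sequence, label, source)` tuple with label 1 for positives and 0 for negatives. The record layout, the function name and the source names are assumptions for illustration, and the alternating split of the positives is deliberately simplistic.

```python
def leave_one_negative_source_out(records):
    """Yield (held_out_source, train, test) splits.

    records is a list of (sequence, label, source) tuples, with label 1
    for positives and 0 for negatives. In each split, negatives from one
    source appear only in the test set, so the classifier is evaluated
    on a kind of negative it never saw during training.
    """
    positives = [r for r in records if r[1] == 1]
    negatives = [r for r in records if r[1] == 0]
    # Simplistic fixed split of positives: even indices train, odd test.
    train_pos, test_pos = positives[0::2], positives[1::2]

    for held_out in sorted({src for _, _, src in negatives}):
        train_neg = [r for r in negatives if r[2] != held_out]
        test_neg = [r for r in negatives if r[2] == held_out]
        yield held_out, train_pos + train_neg, test_pos + test_neg


# Made-up records, purely for illustration.
records = [
    ("ATGAAACGC", 1, "bacteriocin"),
    ("ATGGCGTTT", 1, "bacteriocin"),
    ("AGAGTTTGA", 0, "16S"),
    ("GGTTACCTT", 0, "16S"),
    ("ATGACCAAA", 0, "glycolysis"),
]

splits = list(leave_one_negative_source_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

If accuracy drops sharply on the held-out source, the classifier has probably learned what the training negatives look like rather than what the positives look like.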


Negative sequences should be just that (not related to bacteriocin production). It should not matter what area they come from if the function is all you are interested in, correct?


Yeah, I am only interested in finding out which of the new sequences belong to the positive set.
