Hi everyone.
I just had the thought that one could build a really simple classifier, train it on a table of alleles for a set of bacterial isolates with known disease/carriage status, and then use it to predict disease status for a test set of isolates.
By looking at the model's most important features, you could see which genes most strongly discriminate between carriage and disease, giving a list of candidate virulence-associated genes.
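A minimal sketch of the idea, assuming the allele table is encoded as a binary presence/absence matrix (one row per isolate, one column per allele). It uses plain-Python L2-regularised logistic regression so it runs without extra libraries, and it ranks alleles by absolute coefficient as a stand-in for "feature importance"; a random forest or a penalised model from a library could be swapped in instead. The allele names and data are made up purely for illustration.

```python
import math

# Hypothetical toy data, made up for illustration only.
# Rows are isolates; columns are alleles (1 = present, 0 = absent).
# Labels: 1 = disease, 0 = carriage.
ALLELES = ["geneA_1", "geneB_2", "geneC_1", "geneD_3"]
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit L2-regularised logistic regression with batch gradient descent."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(m):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n  # the intercept is not penalised
    return w, b


def predict(w, b, xi):
    """Return 1 (disease) or 0 (carriage) for one isolate's allele vector."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5)


def rank_alleles(alleles, w):
    """Sort alleles by |coefficient|; the sign gives the direction of association."""
    return sorted(zip(alleles, w), key=lambda t: abs(t[1]), reverse=True)


weights, bias = train_logistic(X, y)
ranking = rank_alleles(ALLELES, weights)
for name, coef in ranking:
    print(f"{name}\t{coef:+.2f}")

# Two hypothetical unseen isolates standing in for a test set.
test_isolates = [[1, 0, 0, 0], [0, 1, 1, 1]]
predictions = [predict(weights, bias, xi) for xi in test_isolates]
print(predictions)
```

Because every feature is a 0/1 indicator, the coefficient magnitudes are roughly comparable across alleles. With a real table you would report accuracy on held-out isolates before trusting the ranking, and alleles that always co-occur will share (or arbitrarily split) credit.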
The idea feels very simple to me, and I can't find a paper describing it, which makes me think it's either far more complex than that or not very effective (or better methods exist). So I'd like to hear input from anyone here on this idea.
If this is a reasonable idea, I was also thinking you could do the same with intergenic regions (IGRs) to find IGRs carrying mutations associated with disease/carriage.
I suppose this would be somewhat like a GWAS, so maybe people just do that instead? Not sure.
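For comparison, the simplest GWAS-style analysis tests each allele (or IGR variant) on its own for association with disease status. Below is a minimal stdlib-only sketch using a two-sided Fisher's exact test on the same kind of hypothetical presence/absence table. It deliberately ignores population structure: in bacteria, alleles on the same clonal background are strongly linked, so a naive per-allele test will flag lineage markers as well as causal genes, which is what dedicated bacterial GWAS methods try to correct for. It also applies no multiple-testing correction, which a real scan over thousands of alleles would need.

```python
from math import comb

# Hypothetical toy data, made up for illustration only.
# Rows are isolates; columns are alleles (1 = present, 0 = absent).
# Labels: 1 = disease, 0 = carriage.
ALLELES = ["geneA_1", "geneB_2", "geneC_1", "geneD_3"]
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def hypergeom_pmf(k, row1, col1, n):
    """P(top-left cell == k) for a 2x2 table with fixed margins."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p = 0.0
    for k in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        pk = hypergeom_pmf(k, row1, col1, n)
        if pk <= p_obs * (1 + 1e-7):  # tolerance for float round-off
            p += pk
    return min(p, 1.0)


def allele_scan(alleles, X, y):
    """Test each allele independently.

    Table rows: allele present / absent; columns: disease / carriage.
    """
    results = []
    for j, name in enumerate(alleles):
        a = sum(1 for xi, yi in zip(X, y) if xi[j] and yi)
        b = sum(1 for xi, yi in zip(X, y) if xi[j] and not yi)
        c = sum(1 for xi, yi in zip(X, y) if not xi[j] and yi)
        d = sum(1 for xi, yi in zip(X, y) if not xi[j] and not yi)
        results.append((name, fisher_exact_two_sided(a, b, c, d)))
    return sorted(results, key=lambda r: r[1])


results = allele_scan(ALLELES, X, y)
for name, p in results:
    print(f"{name}\tp = {p:.3g}")
```

Unlike the per-allele test, a multivariable classifier considers all alleles jointly, but neither approach on its own separates causal variants from lineage effects.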