Dear all,
I want to predict which species are represented in a set of DNA sequencing reads (say 100 bp each).
I am looking for resources or advice on how to approach this. For instance, which models would be worth trying?
My best results so far (average accuracy of 0.5) came from binary classifiers (NB, RF, DT) trained on genomic sequences from several species, trimmed to 100 bp (or other lengths). Adding species-specific k-mers as features did not improve the metrics.
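For concreteness, a minimal sketch of this kind of setup might look like the following. It is an illustration only, not the exact pipeline described above: it uses overlapping k-mer counts as features and a small hand-written multinomial naive Bayes in plain Python. The toy sequences, the labels species_A and species_B, the choice k = 3, and the function names kmer_counts, train_nb, and predict_nb are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a read, skipping k-mers with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def train_nb(reads, labels, k=3):
    """Fit a multinomial naive Bayes model on pooled k-mer counts per label."""
    class_kmers = defaultdict(Counter)  # label -> pooled k-mer counts
    class_reads = Counter(labels)       # label -> number of training reads
    for read, label in zip(reads, labels):
        class_kmers[label].update(kmer_counts(read, k))
    return {"k": k, "kmers": dict(class_kmers), "reads": class_reads,
            "n_features": 4 ** k}


def predict_nb(model, read):
    """Return the label with the highest log posterior for one read."""
    counts = kmer_counts(read, model["k"])
    total_reads = sum(model["reads"].values())
    best_label, best_score = None, -math.inf
    for label, pooled in model["kmers"].items():
        # Laplace (add-one) smoothing over all 4**k possible k-mers
        denom = sum(pooled.values()) + model["n_features"]
        score = math.log(model["reads"][label] / total_reads)
        for kmer, n in counts.items():
            score += n * math.log((pooled[kmer] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical, far shorter than real 100 bp reads)
train_reads = ["ATATATATATAT", "TATATATTATAT", "GCGCGCGCGCGC", "CGCGGCGCGCCG"]
train_labels = ["species_A", "species_A", "species_B", "species_B"]

model = train_nb(train_reads, train_labels, k=3)
prediction = predict_nb(model, "ATATTATATA")
print(prediction)
```

Note that this sketch is multiclass (one label per species) rather than a set of binary classifiers. With real reads, k and the train/test split across genomes would need tuning, and reverse complements would probably have to be handled as well.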
I would appreciate any comments, resources, or help.
Thanks in advance, and best regards.