I have been using DADA2 with the Silva and PR2 databases. Both provide a training set that can be used to assign taxonomy with the assignTaxonomy command:
taxa <- assignTaxonomy(seqtab.nochim, "silva_nr_v132_train_set.fa.gz")
I would like to use the EukRibo database (https://zenodo.org/records/6896896), but unfortunately its formatting is different.
For example, in Silva the species assignment file and the training file are formatted as follows.
The training file shows the taxonomic ranks followed by the sequence:
zcat silva_nr_v132_train_set.fa.gz | head -3
>Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Candidatus_Regiella;
TTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCG
The species assignment file shows the sequence ID, genus and species:
zcat silva_species_assignment_v132.fa.gz | head -3
>AC201869.46386.47908 Regiella insecticola
AGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGCAGCGGGGAGTAGCTTGCTACTCTGC
However, in EukRibo the full-sequences file is formatted like this:
zcat 46346_EukRibo-02_full_seqs_2022-07-22.fas.gz | head -3
>AB000271 Eukaryota|Diaphoretickes|Sar|Alveolata|Myzozoa|AC-clade|Apicomplexa|CM-group|coccidiomorphea|hematozoans|HP-clade|Piroplasmorida|Theileridae|g:Theileria|Theileria+sergenti
AACCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTTAAAGATTAAGCCATGCATGTCT
This seems to be the sequence ID followed by a |-separated taxonomic path, with the genus prefixed by g: and the species written as Genus+species.
Has anyone used EukRibo with DADA2? Is there any way to convert this database for use with DADA2?
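A conversion would have to turn each EukRibo header into the two Silva-style layouts shown above: a ;-separated rank string ending at the genus for assignTaxonomy, and "ID Genus species" for the species file. Below is a minimal sketch, assuming every header looks like the example (accession, one space, then a |-separated path whose genus field starts with g: and whose last field is Genus+species). The function name eukribo_to_dada2 and the output file names are hypothetical, and gzip -dc is used instead of zcat only for portability. It does not try to make the paths the same depth, so rank columns from assignTaxonomy may still not line up across lineages.

```shell
#!/bin/sh
# Hypothetical helper, a sketch only: EukRibo FASTA -> two DADA2-style FASTA files.
# Assumed input header: >ACCESSION Level1|Level2|...|g:Genus|Genus+species
# Output 1 (training):  >Level1;Level2;...;Genus;
# Output 2 (species):   >ACCESSION Genus species
eukribo_to_dada2() {
  src=$1; train=$2; species=$3
  : > "$train"; : > "$species"          # start from empty output files
  gzip -dc "$src" | awk -v train="$train" -v spfile="$species" '
    /^>/ {
      header = substr($0, 2)            # drop the leading ">"
      sp = index(header, " ")
      if (sp == 0) { id = header; path = "" }
      else { id = substr(header, 1, sp - 1); path = substr(header, sp + 1) }
      n = split(path, lvl, "|")
      tax = ""; sp_name = ""
      for (i = 1; i <= n; i++) {
        f = lvl[i]
        if (f ~ /^g:/) {
          f = substr(f, 3)              # "g:Theileria" -> "Theileria"
        } else if (i == n && f ~ /[+]/) {
          sp_name = f                   # last field "Genus+species"
          gsub(/[+]/, " ", sp_name)     # -> "Genus species"
          continue                      # species is not a training-set rank
        }
        tax = tax f ";"
      }
      print ">" tax >> train
      keep = (sp_name != "")            # only headers with a species go to file 2
      if (keep) print ">" id " " sp_name >> spfile
      next
    }
    { print >> train; if (keep) print >> spfile }   # sequence lines
  '
}
```

For example, eukribo_to_dada2 46346_EukRibo-02_full_seqs_2022-07-22.fas.gz eukribo_train.fa eukribo_species.fa would write plain FASTA files that could then be passed to assignTaxonomy and addSpecies in R, in the same way as the Silva files.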
So, using this, were you able to assign taxonomy to your sequences with the reformatted EukRibo database? I'm looking to do the same thing but have little experience in this area.
It did work, but the results showed mismatched taxonomic ranks (as shown above), and the species assignments did not match what we expected at all. So I gave up on EukRibo and now use PR2 for eukaryote identification.
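One possible cause of the mismatched ranks, offered here as an assumption rather than a diagnosis: assignTaxonomy fills its output columns by position in the ;-separated string, so if EukRibo paths have different numbers of levels, the same column can hold a phylum-like name for one lineage and a class-like name for another. A quick way to check is to tally how many |-separated fields each header has; the function name eukribo_depths is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, a sketch only: count headers by number of "|"-separated fields
# (the Genus+species field is included in the count).
eukribo_depths() {
  gzip -dc "$1" | awk '/^>/ {
      sp = index($0, " ")
      path = (sp == 0) ? "" : substr($0, sp + 1)   # text after the accession
      print split(path, lvl, "|")                   # number of fields in the path
    }' | sort -n | uniq -c                          # "count depth" per line
}
```

Running eukribo_depths 46346_EukRibo-02_full_seqs_2022-07-22.fas.gz prints one line per distinct depth. More than one line means the paths would need to be mapped onto a fixed set of ranks before the assignTaxonomy columns can be compared across lineages.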