Hi everyone,
I have sequenced filth fly guts on a NovaSeq and assembled the contigs with MEGAHIT. I assigned taxonomy to the contigs against NCBI nr (as part of another pipeline that compares predicted amino acid sequences to the protein nr database), but A LOT of contigs remain unclassified (ca. 78% on average across 30 samples). On the one hand this is to be expected, as there are no published studies with bacterial (or other) genomes from filth fly guts; on the other hand, I was wondering if you can recommend another database that might reduce the number of unclassified contigs.
Thanks
Thanks. Do you think that will improve it? I can give it a go.
Yes. When I taxonomically classify the contigs of a genome assembly, I use NCBI nt instead of NCBI nr, because not all contigs contain protein-coding genes, and a protein-level search cannot classify those. I usually do this to remove contamination.
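
To see whether nt actually helps, you could compare the unclassified fraction before and after the switch. Here is a minimal sketch in Python, assuming the nt search was done with BLAST and saved in the standard 12-column tabular format (-outfmt 6), where column 1 is the contig ID and column 11 is the e-value. The function names, the contig IDs and the 1e-5 cutoff are only illustrative; they are not part of any pipeline mentioned in this thread.

```python
def classified_contigs(blast_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    blast_lines: iterable of lines in BLAST 12-column tabular format.
    max_evalue: illustrative cutoff (an assumption, tune it for your data).
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
            hits.add(fields[0])  # column 1 holds the contig (query) ID
    return hits


def unclassified_fraction(contig_ids, blast_lines, max_evalue=1e-5):
    """Fraction of contigs with no acceptable hit in the search results."""
    contigs = set(contig_ids)
    if not contigs:
        raise ValueError("no contig IDs given")
    classified = classified_contigs(blast_lines, max_evalue) & contigs
    return 1 - len(classified) / len(contigs)


if __name__ == "__main__":
    # Hypothetical example: four contigs, two of them with hits.
    contigs = ["k141_1", "k141_2", "k141_3", "k141_4"]
    results = [
        "k141_1\tsubjA\t98.5\t500\t7\t0\t1\t500\t10\t509\t1e-50\t900",
        "k141_2\tsubjB\t80.0\t40\t8\t0\t1\t40\t5\t44\t0.5\t30",
    ]
    print(unclassified_fraction(contigs, results))
```

Running the same calculation on the nr-based and nt-based results for each sample gives a direct comparison. Note that a hit only counts here if it passes the e-value cutoff, so the weak hit for k141_2 in the example is still treated as unclassified.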