5.1 years ago
the_dummy
Hello, I want to train feature classifiers on NCBI data, as I did with the SILVA and GreenGenes databases. I couldn't figure out which sequences to download from NCBI, since the database is complex and not as straightforward as SILVA for 16S amplicon analysis. I need to get the reference sequences and taxonomy files from NCBI somehow. Any help would be appreciated. Thank you very much.
NCBI has a collection of 16S sequences available as a pre-formatted BLAST index here. These sequences come from two BioProjects (33175 and 33317), which you can search for at NCBI.
You can recover the FASTA sequences from the BLAST indexes by converting them back to FASTA with the blastdbcmd utility included in the BLAST+ package.

Yes, thank you for the utility. I did find this data, but I thought it was not related since there was no FASTA file. GREAT HELP!
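
For anyone following the answer above, here is a rough sketch of the blastdbcmd step. It assumes the pre-formatted index has already been downloaded and unpacked into the current directory, and that the database is named 16S_ribosomal_RNA (the name is an assumption, it is not stated in the thread). The output file names and the two function names are made up for illustration. The second command writes an accession/taxid/name table that could serve as a starting point for a taxonomy file. The scientific-name column may also require NCBI's taxdb files next to the database, and turning taxids into the lineage strings a classifier expects is a separate step not shown here.

```shell
#!/bin/sh
# Sketch only: database and file names below are assumptions, adjust to your setup.
DB="16S_ribosomal_RNA"          # assumed name of the unpacked BLAST database
FASTA_OUT="ncbi_16S.fasta"      # hypothetical output file for the sequences
MAP_OUT="ncbi_16S_taxmap.tsv"   # hypothetical output file for the taxid table
TAB="$(printf '\t')"            # literal tab character for the output format

# Dump every entry in the BLAST database back to FASTA (the default output format).
extract_fasta() {
    blastdbcmd -db "$DB" -entry all -out "$FASTA_OUT"
}

# Write one line per entry: accession, NCBI taxid, scientific name.
extract_taxmap() {
    blastdbcmd -db "$DB" -entry all \
        -outfmt "%a${TAB}%T${TAB}%S" -out "$MAP_OUT"
}

# Only run if BLAST+ is installed; otherwise just report it.
if command -v blastdbcmd >/dev/null 2>&1; then
    extract_fasta
    extract_taxmap
else
    echo "blastdbcmd not found: install BLAST+ first" >&2
fi
```

If it works, the first few lines of the FASTA file (head ncbi_16S.fasta) should show ordinary FASTA headers followed by sequence.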