I can only find limited ways to search for protein matches in metagenomic or microbiome data through a basic web search form. I can use NCBI BLASTP against Metagenomic proteins (env_nr), but there is no TBLASTN option for env DNA (which could join across frameshift breaks on a good day). My current test query, human BACE1, returns some matches but no count of the records in the database searched.

I cannot find an equivalent search box for the UniProt Metagenomic and Environmental Sequences (UniMES) proteins and clusters, nor is it clear whether this data differs from env_nr. I did find a TBLASTN option at DDBJ, which indicated 4,663,980 nucleotide sequences in env, but the identical BACE1 query returned no matches at all. According to CAMERA, env should hold 19,650,359 sequences (implying DDBJ was not keeping up to date), but I could not find a workflow option there for entering a query.

I'm also unclear whether, given the new star status of the microbiome, these sequence reads, clusters and ORF predictions are going into env or somewhere else (presumably not HAMAP). The only other search option seems to be the Human Oral Microbiome portal, which requires choosing a species. Having looked at the ENA data classes and found no env, I'm still none the wiser. The NCBI deposition guidelines say metagenomic data should go to SRA, but SRA is not offered as a TBLASTN option. If anyone can clarify these points I would be grateful.
Thanks, but I did not really want the hassle of big local downloads and installs for occasional use. Also, I still need an overall picture of where this type of data is ending up and, for example, a simple BLAST form for UniMES. I will drop the NCBI help desk a line, though.
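One route that needs no local download or install is NCBI's BLAST URL API (Blast.cgi), which can run the same BLASTP search against env_nr that the web form offers. Below is a minimal Python sketch, assuming the CMD=Put / CMD=Get parameters and the "RID = ..." line on the submission page behave as described in NCBI's published URL API documentation. The helper names build_put_request, build_get_url, parse_rid and submit are hypothetical, and whether env_nr is still served under that name is an assumption. It does not address the missing TBLASTN or UniMES options.

```python
import re
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of NCBI's BLAST URL API (assumed unchanged).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query: str, program: str = "blastp",
                      database: str = "env_nr") -> str:
    """Form-encode a CMD=Put submission; query may be FASTA text or an accession."""
    return urlencode({
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,  # assumption: env_nr is still offered under this name
        "QUERY": query,
    })


def build_get_url(rid: str) -> str:
    """URL that polls for, and eventually returns, plain-text results for a request ID."""
    return BLAST_URL + "?" + urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": "Text"})


def parse_rid(page: str) -> Optional[str]:
    """Extract the request ID from a submission-page line such as '    RID = ABC123'."""
    match = re.search(r"^\s*RID = (\S+)", page, re.MULTILINE)
    return match.group(1) if match else None


def submit(query: str) -> Optional[str]:
    """POST the search to NCBI (network access required) and return its RID, if any."""
    body = build_put_request(query).encode("ascii")  # urlencode output is ASCII-safe
    with urlopen(BLAST_URL, data=body) as response:
        return parse_rid(response.read().decode("utf-8", errors="replace"))
```

Usage would be rid = submit(fasta_text), then, after waiting, fetching build_get_url(rid) with urlopen until the results page replaces the "waiting" status. NCBI asks users to space out polling of a single RID rather than requesting it repeatedly, so a loop with a generous sleep is the polite design here.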