Are There Simple Web Forms For Searching Proteins Against Metagenomic And Microbiome Data?
12.4 years ago
cdsouthan ★ 1.9k

I can find only limited ways to search for protein matches in metagenomic or microbiome data through a basic web search form. I can run NCBI BLASTP against metagenomic proteins (env_nr), but there is no TBLASTN selection for env DNA (which could join across frameshift breaks on a good day). My current test query, human BACE1, returns some matches but no count of the records in the database. I cannot find an equivalent search box for the UniProt Metagenomic and Environmental Sequences (UniMES) proteins and clusters, nor is it clear whether this data differs from env_nr. I managed to find a TBLASTN option at DDBJ, which indicated there were 4,663,980 nucleotide sequences in env, but the identical BACE1 query returned no matches at all. According to CAMERA, env should be 19,650,359 sequences (implying DDBJ was not updating), but I could not find a workflow query input option there. I'm also unclear whether, given the new star status of the microbiome, these sequence reads, clusters and ORF predictions are going into env or somewhere else (presumably not HAMAP). The only search option seems to be the Human Oral Microbiome portal, which requires a species choice. From looking at the ENA data classes and finding no env, I'm still none the wiser. The NCBI deposition guidelines say metagenomic data should go to SRA, but this is not a TBLASTN option. If folk can clarify these points I would be grateful.

metagenomics • 2.5k views
12.4 years ago
cdsouthan ★ 1.9k

I received this useful reply from the NCBI Help Desk:

Hello,

Searches against env_nr are now returning database statistics.

We removed the env_nt database early this year. Select the wgs database and limit by Organism to 'metagenome', or one of the specific metagenomes.

Best regards,

12.4 years ago

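One simple solution would be to do it locally: download the env_nr BLAST database and the BLAST software, then run blastp with your protein query. (env_nr is a protein database, so blastp is the matching program; tblastn would need a nucleotide database instead.)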

Otherwise, you could ask the NCBI staff for help: blast-help@ncbi.nlm.nih.gov
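For anyone who does try the local route, here is a minimal sketch of how the search might be scripted. It assumes BLAST+ is installed and that a preformatted env_nr database has already been downloaded into the working directory. Because env_nr holds protein sequences, the sketch uses blastp rather than tblastn. The file names bace1.fasta and bace1_vs_env_nr.tsv and the E-value cutoff are illustrative choices, not anything from the thread.

```python
import shlex


def build_blastp_command(query_fasta, db="env_nr",
                         out_path="bace1_vs_env_nr.tsv", evalue=1e-5):
    """Build the argument list for a BLAST+ blastp run with tabular output.

    Assumptions (not from the thread): the database was fetched beforehand,
    e.g. with the update_blastdb.pl script shipped with BLAST+, and the
    query is a protein FASTA file. File names here are hypothetical.
    """
    return [
        "blastp",
        "-query", str(query_fasta),   # protein query, e.g. human BACE1
        "-db", db,                    # protein database name or path
        "-evalue", str(evalue),       # illustrative cutoff
        "-outfmt", "6",               # tab-separated hit table
        "-out", str(out_path),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_blastp_command("bace1.fasta")))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) once the database is in place. For a nucleotide metagenome set, the same pattern should apply with tblastn and a nucleotide database name.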


Thanks, but I did not really want the hassle of big local downloads and installs for occasional usage. Also, I still need a global overview of where this type of data is ending up and, for example, to find a simple BLAST form for UniMES. I will ping the NCBI help desk though.
