Question

NCBI tblastx remote

2.6 years ago

ailton • 0

I tried to run a tblastx search but couldn't. I used the following command line:

tblastx -query NC024014.fasta -db nr -remote -outfmt 5 -out teste.table -entrez_query "virus [organism]"

I used a similar command line to run blastn and it worked normally, but whenever I try to run tblastx I don't get any results. Thanks in advance for the help.

NCBI remote tblastx • 1.9k views

2.6 years ago by ailton

GenoMax · Answer 1 · 2022-05-17

2.6 years ago

Mensur Dlakic ★ 28k

There are at least two things wrong with your command. First, tblastx searches a nucleotide query against a nucleotide database (translating both in all six frames), but you are using nr, which is a protein database. Second, "virus" is not a recognized organism name; the NCBI Taxonomy name is "Viruses". Maybe something like this:

tblastx -query NC024014.fasta -db nt -remote -outfmt 5 -out teste.table -entrez_query "Viruses [organism]" -evalue 0.01

Since your nucleotide sequence encodes a single protein, you can save unnecessary translation time: translate the nucleotide sequence yourself and instead search the protein against the nucleotide database with tblastn:

tblastn -query NC024014.faa -db nt -remote -outfmt 5 -out teste.table -entrez_query "Viruses [organism]" -evalue 0.01
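For readers who want to produce the .faa file themselves, here is a minimal Python sketch. It translates a nucleotide sequence in all six reading frames (the same conceptual translation tblastx applies to both query and database) and keeps the longest Met-initiated, stop-free open reading frame. It assumes the standard genetic code (NCBI translation table 1) and a single-protein query as described above. The function names and the longest-ORF heuristic are illustrative assumptions, not what BLAST does internally; the annotated protein record or a dedicated tool such as NCBI ORFfinder is the safer source for real annotation.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Non-ACGT letters (e.g. N) pass through unchanged when complementing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1 or 2).

    An incomplete trailing codon is dropped; codons containing ambiguous
    bases are translated as 'X'; stop codons appear as '*'.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def reverse_complement(seq: str) -> str:
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> list[str]:
    """Three forward and three reverse-strand translations."""
    rc = reverse_complement(seq)
    return [translate(seq, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


def longest_orf(seq: str) -> str:
    """Longest stretch starting at Met and ending before a stop, over all six frames."""
    best = ""
    for protein in six_frames(seq):
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


def read_fasta_sequence(path: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA file."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def to_fasta(header: str, protein: str, width: int = 70) -> str:
    """Format a protein as FASTA text wrapped at `width` residues per line."""
    lines = [">" + header] + [protein[i:i + width] for i in range(0, len(protein), width)]
    return "\n".join(lines) + "\n"
```

Usage sketch: read NC024014.fasta with read_fasta_sequence(), pass the result to longest_orf(), and write to_fasta("NC_024014_longest_ORF", protein) into NC024014.faa (the header name here is hypothetical). If the genome is partial or carries more than one ORF, check the result against the annotated protein before searching.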

2.6 years ago by Mensur Dlakic

Hello, how are you? Thank you for your help.

I tried using this command:

tblastx -query NC024014.fasta -db nt -remote -outfmt 5 -out teste.table -entrez_query "Viruses [organism]" -evalue 0.01

but I still didn't get a result. This was the terminal output:

$ tblastx -query NC024014.fasta -db nt -remote -outfmt 5 -out teste.table -entrez_query "Viruses [organism]" -evalue 0.01
Critical: [tblastx] External MBEDTLS version mismatch: 2.16.2 headers vs. 2.16.3 runtime.

And the following output file (teste.table, in BLAST XML format) was generated:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>tblastx</BlastOutput_program>
  <BlastOutput_version>TBLASTX 2.9.0+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>nt</BlastOutput_db>
  <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
  <BlastOutput_query-def>NC_024014</BlastOutput_query-def>
  <BlastOutput_query-len>1714</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>0.01</Parameters_expect>
      <Parameters_gap-open>11</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>L;</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NC_024014</Iteration_query-def>
  <Iteration_query-len>1714</Iteration_query-len>
<Iteration_hits>
</Iteration_hits>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>82489276</Statistics_db-num>
      <Statistics_db-len>753445621923</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>0</Statistics_eff-space>
      <Statistics_kappa>0</Statistics_kappa>
      <Statistics_lambda>0</Statistics_lambda>
      <Statistics_entropy>0</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
  <Iteration_message>internal_error: (Severe Error) Blast search error: Details: search failed. # Informational Message: [blastsrv4.REAL]: Error: CPU usage limit was exceeded, resulting in SIGXCPU (24). No hits found</Iteration_message>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>

When I ran a remote blastn search against nematode sequences with the following command, I obtained excellent results:

$ blastn -query sequence.fasta -db nt -task blastn -remote -entrez_query "nematode [organism]" -outfmt 5 -out teste.table -max_target_seqs 6

However, what I really want is to run tblastx against a database of viruses.
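The XML output above (-outfmt 5) contains no Hit elements, and the reason is stated in Iteration_message rather than on the terminal. A minimal Python sketch, using only the standard library, that summarizes each query's hit count and any server message so that failed remote searches are easy to spot. It assumes the element names seen in that output (Iteration, Iteration_query-def, Iteration_hits, Hit, Hit_def, Iteration_message). The entity replacement is a workaround under the assumption that the &auml; in BlastOutput_reference trips Python's XML parser, since it is not one of XML's predefined entities.

```python
import xml.etree.ElementTree as ET


def summarize_blast_xml(xml_text: str) -> list[dict]:
    """Return one summary per <Iteration> (i.e. per query) in BLAST XML output."""
    # BlastOutput_reference contains the HTML entity &auml;, which is not a
    # predefined XML entity; replace it so the standard-library parser accepts it.
    xml_text = xml_text.replace("&auml;", "\u00e4")
    root = ET.fromstring(xml_text)
    summaries = []
    for iteration in root.iter("Iteration"):
        hits = iteration.findall("Iteration_hits/Hit")
        summaries.append({
            "query": iteration.findtext("Iteration_query-def", default=""),
            "n_hits": len(hits),
            "top_hits": [hit.findtext("Hit_def", default="") for hit in hits[:5]],
            # Server-side problems (such as the CPU limit) are reported here.
            "message": iteration.findtext("Iteration_message", default=""),
        })
    return summaries


def summarize_blast_xml_file(path: str) -> list[dict]:
    """Read a BLAST XML file (e.g. teste.table) and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_blast_xml(handle.read())
```

For the run above, summarize_blast_xml_file("teste.table") would be expected to report zero hits together with the "CPU usage limit was exceeded" message, which points to a server-side limit rather than a genuine absence of similar sequences.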

written 2.6 years ago by ailton • updated 2.6 years ago by GenoMax

I also tried running tblastx locally on my machine with the following commands:

$ makeblastdb -in reference.fasta -title reference -dbtype nucl -out databases/reference

Building a new DB, current time: 13/05/2022 14:34:29
New DB name: /home/ailton_ubuntu/bioinformatica/ncbi-blast-2.13.0+/bin/databases/reference
New DB title: reference
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/ailton_ubuntu/bioinformatica/ncbi-blast-2.13.0+/bin/databases/reference
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9843 sequences in 0.357577 seconds.
$ tblastx -db databases/reference -query sequences.fasta -evalue 1e-9 -word_size 11 -outfmt 5 > sequences.reference
BLAST query/options error: Word-size must be less than 6 for protein comparison
Please refer to the BLAST+ user manual.

$ tblastx -db databases/reference -query sequences.fasta -evalue 1e-9 -word_size 5 -outfmt 5 > sequences.reference

It still fails: no output file is even created, and I always get the "Killed" message, even though my machine has a good configuration (16 RAM and an Intel Core i7-10750H).

written 2.6 years ago by ailton • updated 2.6 years ago by GenoMax

And it also keeps giving error, in fact it doesn't even appear a file, and always the "killer" message, and my machine has good settings, like 16 ram and an intel core i7-10750H.

If you mean 16 GB of RAM, that is typically not enough for BLAST with a large database. Those "Killed" messages are likely telling you the same thing: you don't have enough RAM for a local database search, so the operating system terminates the process. You may need to reduce the database size, or run the remote search without the double translation, i.e., use tblastn instead of tblastx.

2.6 years ago by Mensur Dlakic

tblastx is a computationally intensive search. If you are running it against the entire set of viral sequences, it is likely not going to work on the public NCBI servers, as indicated by the error message you received:

Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).

Try restricting your -entrez_query to a smaller set of viruses. That will likely work better.

2.6 years ago by GenoMax

But I need to run tblastx on my results, because I am working on the "discovery" of new viral genomes, and this tool would help a lot in my work. If I restrict my search to certain virus groups, my BLAST search would not be feasible. So I would like to run my tblastx query either locally or remotely, but I can't get either one to work. I don't know if I'm using a wrong command, because I don't understand much about bioinformatics yet; I only started working in this area a short time ago.

2.6 years ago by ailton

The error message is clear:

Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).

As GenoMax already explained, and as I pointed out indirectly above, tblastx is a slow and expensive search, so exceeding the CPU limit is not unexpected. As I told you already, translating your nucleotide sequence to protein and using tblastn instead of tblastx will allow your search to complete, because it cuts down search time significantly. I just ran it remotely from my computer and it worked. You can still search against all viruses without any restriction.

tblastn -query NC024014.faa -db nt -remote -outfmt 5 -out teste.table -entrez_query "Viruses [organism]" -evalue 0.01

Presumably you know how to translate the sequence. Just in case, this is what needs to be in file NC024014.faa:

>YP_009026407.1 RNA dependent RNA polymerase [Arhar cryptic virus-I]
MDHRWRGATRGLIRLEEIPTRRIRDERRILIDEYASEAINRYVPLHLRAELEGWARSYYTLETHLNAIMN
YDRPKLSQPSDAAWVSTMHHVREQFRQMDKVTALSHYHLDKVKWVRSSAAGYGYVGLKSDPGNYERARTT
AFTIAERLNHERDYAPEALKNSTPDVAFTRTQLCQIKIKRKVRNVWGEAFHYVLLEGLFADPLIQHFMKI
DSFYFIGQDPLLAVPYLIEDILSESDYVYMFDWSGFDSSVHEWEIRFAFELLESLLVFPSSVEQHVWRFI
IELFIYRKIASPNGVMYLKTQGIPSGSCFTNIIGSITNYVRIQYIFRRLTNRFANVFTHGDDSLAGVSAV
QFIPMENIAQVCAEFNWTINVDKSDVSRIAEAVTFLSRNVREMSHARDELTCLRMLKYPEYPVESGAVST
LRALSISKDAGLNSHYLYKIYKFLDIKYGKADSLPLHHKSWDPLEYESLRLPYSQ

2.6 years ago by Mensur Dlakic

Thanks a lot for the help. I managed to run it using your parameters. I also found other ways to run tblastx. This command made it possible to run it locally:

$ tblastx -db databases/reference -query sequences.fasta -evalue 0.0000000001 -outfmt 5 -max_target_seqs 1 -out fungi.table -num_threads 10

and I used the following command to run blastx remotely:

$ blastx -db nr -query Q -evalue 0.0001 -outfmt "7 std staxid ssciname" -max_target_seqs 10 -remote

For a remote tblastx search, restricting it to smaller taxonomic groups works:

$ tblastx -db nt -query Q -remote -entrez_query "Monodnaviria[orgn]" -evalue 0.0001 -outfmt "7 std staxid ssciname" -max_target_seqs 10

A larger group leads to a CPU time error:

$ tblastx -db nt -query Q -remote -entrez_query "viruses[orgn] NOT riboviris[orgn]" -evalue 0.001 -outfmt "7 std staxid ssciname" -max_target_seqs 10

Use the first two pages below to set your limits based on the number of sequences available, and split a group further if its number of sequences is above a certain size, as in the third page:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10239&lvl=1&p=core
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2559587&lvl=1&p=core
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2732396&lvl=1&p=core
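The splitting strategy described above can be sketched in Python. Given per-taxon sequence counts (read manually from the Taxonomy Browser pages), the sketch packs taxa into batches whose combined count stays under a chosen limit (first-fit decreasing) and builds one remote tblastx command per batch, reusing the options from the commands above. The group names, the counts and the limit in the example are hypothetical. NCBI enforces a CPU-time limit rather than a sequence-count limit, so a workable threshold has to be found by trial, and taxa that exceed it on their own are returned separately so they can be split into subgroups.

```python
import shlex


def batch_taxa(counts: dict[str, int], max_seqs: int) -> tuple[list[list[str]], list[str]]:
    """First-fit-decreasing packing of taxa into batches of at most max_seqs sequences.

    Taxa whose own count exceeds max_seqs cannot fit in any batch; they are
    returned separately so they can be split into subgroups by hand.
    """
    totals: list[int] = []
    groups: list[list[str]] = []
    too_big: list[str] = []
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if n > max_seqs:
            too_big.append(name)
            continue
        for i, total in enumerate(totals):
            if total + n <= max_seqs:  # first existing batch with room
                totals[i] += n
                groups[i].append(name)
                break
        else:  # no batch had room: open a new one
            totals.append(n)
            groups.append([name])
    return groups, too_big


def entrez_query(taxa: list[str]) -> str:
    """Combine taxa with OR and the [orgn] field tag, e.g. 'A[orgn] OR B[orgn]'."""
    return " OR ".join(f"{taxon}[orgn]" for taxon in taxa)


def tblastx_command(query_file: str, taxa: list[str], out_file: str) -> list[str]:
    """Remote tblastx argument list mirroring the options used in this thread."""
    return [
        "tblastx", "-db", "nt", "-query", query_file, "-remote",
        "-entrez_query", entrez_query(taxa),
        "-evalue", "0.0001",
        "-outfmt", "7 std staxid ssciname",
        "-max_target_seqs", "10",
        "-out", out_file,
    ]


if __name__ == "__main__":
    # Hypothetical counts for illustration only; look up real ones in the Taxonomy Browser.
    example_counts = {"GroupA": 600, "GroupB": 300, "GroupC": 200, "GroupD": 5000}
    batches, needs_split = batch_taxa(example_counts, max_seqs=1000)
    for k, batch in enumerate(batches, start=1):
        print(shlex.join(tblastx_command("Q.fasta", batch, f"batch_{k}.tsv")))
    print("split further:", needs_split)
```

Each argument list can be passed to subprocess.run(). Submitting the batches one after another, rather than all at once, keeps the load on the shared NCBI servers modest.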

2.6 years ago by ailton