I used a similar command line to run blastn and it runs normally, but whenever I try to run tblastx I don't get any result at all. Thanks in advance for the help.
There are at least two things wrong with your command. First, tblastx takes a nucleotide query and searches it against a nucleotide database, but you are using nr, which is a protein database. Second, virus is not a recognized organism type. Maybe something like this:
Since your nucleotide sequence encodes a single protein, you may want to save some unnecessary translation time. That means translating your nucleotide sequence yourself and searching the protein against a nucleotide database with tblastn instead:
I tried the command below, but it did not return any results. This is what I got:
$ tblastx -query NC024014.fasta -db nt -remote -outfmt 5 -out teste.table -entrez_query "Viruses [organism]" -evalue 0.01
Critical: [tblastx] External MBEDTLS version mismatch: 2.16.2 headers vs. 2.16.3 runtime.
And the following output file (teste.table, in XML format) was generated:
**<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>tblastx</BlastOutput_program>
<BlastOutput_version>TBLASTX 2.9.0+</BlastOutput_version>
<BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>nt</BlastOutput_db>
<BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
<BlastOutput_query-def>NC_024014</BlastOutput_query-def>
<BlastOutput_query-len>1714</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
<Parameters_expect>0.01</Parameters_expect>
<Parameters_gap-open>11</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>L;</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>NC_024014</Iteration_query-def>
<Iteration_query-len>1714</Iteration_query-len>
<Iteration_hits>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>82489276</Statistics_db-num>
<Statistics_db-len>753445621923</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0</Statistics_kappa>
<Statistics_lambda>0</Statistics_lambda>
<Statistics_entropy>0</Statistics_entropy>
</Statistics>
</Iteration_stat>
<Iteration_message>internal_error: (Severe Error) Blast search error: Details: search failed. # Informational Message: [blastsrv4.REAL]: Error: CPU usage limit was exceeded, resulting in SIGXCPU (24). No hits found</Iteration_message>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>**
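A report like the one above can carry a server error in `Iteration_message` rather than hits, so it helps to check that field before concluding there were simply no matches. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `summarize_blast_xml` and the embedded `SAMPLE` are illustrative and not part of BLAST+.

```python
import xml.etree.ElementTree as ET


def summarize_blast_xml(xml_text):
    """Return one dict per query with its hit count and any server message."""
    root = ET.fromstring(xml_text)
    summary = []
    for iteration in root.iter("Iteration"):
        summary.append({
            "query": iteration.findtext("Iteration_query-def"),
            # Each reported match is a <Hit> element under <Iteration_hits>.
            "hits": len(iteration.findall("Iteration_hits/Hit")),
            # None when the report contains no <Iteration_message>.
            "message": iteration.findtext("Iteration_message"),
        })
    return summary


# Trimmed-down stand-in for the report above (illustrative only).
SAMPLE = (
    "<BlastOutput><BlastOutput_iterations><Iteration>"
    "<Iteration_query-def>NC_024014</Iteration_query-def>"
    "<Iteration_hits></Iteration_hits>"
    "<Iteration_message>CPU usage limit was exceeded</Iteration_message>"
    "</Iteration></BlastOutput_iterations></BlastOutput>"
)

for entry in summarize_blast_xml(SAMPLE):
    print(entry["query"], entry["hits"], entry["message"])
```

For a real report, the text of the output file (here `teste.table`) would be passed in place of `SAMPLE`.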
When I ran a remote blastn search against the nematode database with the following command, I obtained excellent results:
$ blastn -query sequence.fasta -db nt -task blastn -remote -entrez_query "nematode [organism]" -outfmt 5 -out teste.table -max_target_seqs 6
But my interest is to run tblastx against a database of viruses.
Building a new DB, current time: 13/05/2022 14:34:29
New DB name: /home/ailton_ubuntu/bioinformatica/ncbi-blast-2.13.0+/bin/databases/reference
New DB title: reference
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/ailton_ubuntu/bioinformatica/ncbi-blast-2.13.0+/bin/databases/reference
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9843 sequences in 0.357577 seconds.
$ tblastx -db databases/reference -query sequences.fasta -evalue 1e-9 -word_size 11 -outfmt 5 > sequences.reference
BLAST query/options error: Word-size must be less than 6 for protein comparison
Please refer to the BLAST+ user manual.
It also keeps failing: no output file even appears, and I always get the "killer" message, even though my machine has good specifications, like 16 RAM and an Intel Core i7-10750H.
If you mean 16 GB of RAM, that is typically not enough for BLAST with a large database. Those killed messages are likely telling you the same thing: you don't have enough RAM for a local database search. You may need to reduce the database size, or do the remote search without double translation, that is, use tblastn instead of tblastx.
tblastx is an intensive search. If you are trying to run it against the entire set of viral sequences, it is unlikely to work on the public NCBI servers, as indicated by the error message you received:
Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).
You should try restricting your entrez_query to a smaller set of viruses. That will likely work better.
But I need to run tblastx on my results, as I am working on the discovery of new viral genomes, and this tool would help a lot in my work. If I restrict my search to certain virus groups, my BLAST search would be unfeasible.
So I would like to run my tblastx query locally or remotely, but I can't complete even one of these searches. I don't know if my command is wrong; I don't understand much about bioinformatics yet, as I only started in this area a short time ago.
Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).
As GenoMax already explained, and I referred to indirectly above: tblastx is a slow and expensive search, so it is somewhat expected that the CPU limit may be exceeded. As I told you already: a simple translation of your nucleotide sequence to protein and using tblastn instead of tblastx will allow you to complete your search, because it cuts down significantly on search time. I just did it remotely from my computer and it worked. You can still search against all the viruses without any limit.
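The translation step mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the standard genetic code (NCBI translation table 1) and takes the protein to be the longest methionine-initiated stretch without a stop codon across all six frames. The function names are hypothetical, and a dedicated translation tool may be preferable for real data.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(nucleotides):
    """Longest M-initiated, stop-free peptide over the six reading frames."""
    nucleotides = nucleotides.upper()
    best = ""
    for strand in (nucleotides, reverse_complement(nucleotides)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


def to_fasta(name, protein, width=60):
    """Format a protein as a FASTA record with fixed-width lines."""
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


print(to_fasta("example", longest_orf("ATGGCCAAATAA")))
```

The resulting FASTA text could be saved to a protein file and used as the tblastn query, so that only the database side needs to be translated during the search.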
Thanks a lot for the help. I managed to run it using your parameters. I also used other methods to run tblastx. This command made it possible to run locally:
$ tblastx -db databases/reference -query sequences.fasta -evalue 0.0000000001 -outfmt 5 -max_target_seqs 1 -out fungi.table -num_threads 10
And I used the following command to run remotely:
$ blastx -db nr -query Q -evalue 0.0001 -outfmt "7 std staxid ssciname" -max_target_seqs 10 -remote
For tblastx, restricting the search works for smaller taxonomic groups:
$ tblastx -db nt -query Q -remote -entrez_query "Monodnaviria[orgn]" -evalue 0.0001 -outfmt "7 std staxid ssciname" -max_target_seqs 10
A larger group will lead to a CPU time error:
$ tblastx -db nt -query Q -remote -entrez_query "viruses[orgn] NOT Riboviria[orgn]" -evalue 0.001 -outfmt "7 std staxid ssciname" -max_target_seqs 10
Refer to the first two pages below and set your limit based on the number of sequences available; split the list further if the number of sequences is above a certain size, as on the third page:
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10239&lvl=1&p=core
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2559587&lvl=1&p=core
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2732396&lvl=1&p=core
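The idea of splitting one large remote search into several smaller taxonomic groups can be sketched as a small command generator. This is a hedged illustration: the function name, the output file naming, and the taxon list are assumptions (only Monodnaviria appears in the thread), and the script only prints commands without running BLAST. The flags are the ones already used in the commands above, plus `-out`.

```python
import shlex


def remote_tblastx_commands(query, taxa, evalue="0.0001", max_target_seqs=10):
    """Build one remote tblastx command line per taxonomic group."""
    commands = []
    for taxon in taxa:
        args = [
            "tblastx", "-db", "nt", "-query", query, "-remote",
            # Restrict the remote search to a single group at a time.
            "-entrez_query", f"{taxon}[orgn]",
            "-evalue", evalue,
            "-outfmt", "7 std staxid ssciname",
            "-max_target_seqs", str(max_target_seqs),
            # Hypothetical naming scheme: one result file per group.
            "-out", f"{taxon}.tblastx.tsv",
        ]
        # shlex.join quotes arguments containing spaces or brackets.
        commands.append(shlex.join(args))
    return commands


# Illustrative group list; choose groups by size from the taxonomy pages.
GROUPS = ["Monodnaviria", "Duplodnaviria", "Varidnaviria"]

for command in remote_tblastx_commands("Q", GROUPS):
    print(command)
```

If one group still exceeds the CPU limit, it can be replaced in the list by its subgroups and the commands regenerated.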
Hello, how are you? Thank you for your help.
I also tried running tblastx locally on my machine with the following commands:
$ makeblastdb -in reference.fasta -title reference -dbtype nucl -out databases/reference
$ tblastx -db databases/reference -query sequences.fasta -evalue 1e-9 -word_size 5 -outfmt 5 > sequences.reference
The error message is clear:
Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).
Presumably you know how to translate the sequence. Just in case, this is what needs to be in file NC024014.faa: