Hi,
I have 43579 ORFs predicted with TransDecoder from one RNA-seq dataset. I blasted them against a nt database with blastn, using the command below.
time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR3632057.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt "6 qseqid sseqid sseq" -out nucl_blastn_1align_1e-30_1_trans_62_hsps.txt -evalue 1e-30 -num_threads 8 -max_target_seqs 1 -max_hsps 1 > blastn_3_trans_62_hsps.log
I'm interested only in the best hit, so I used the "-max_target_seqs 1 -max_hsps 1" options. Is this the correct way of running blast? The blastn output contained 31737 sequences. Can I treat these sequences as the blast output?
time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR363257.trim_trinity.cdhit.fasta.transdecoder.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt 6 -out nucl_blastn_1align_1e-30_1_trans_62_hsps5_normal.txt -evalue 1e-30 -num_threads 30 > blastn_3_trans_62_hsps5_normal.log
The blastn output contained 4077005 sequences. Can I treat these sequences as the blast output?
time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR3632057.trim_trinity.cdhit.fasta.transdecoder.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt "6 qseqid sseqid sseq" -out nucl_blastn_1align_1e-30_1_trans_62_hsps5.txt -evalue 1e-30 -num_threads 8 -max_target_seqs 5 > blastn_3_trans_62_hsps5.log
The blastn output contained 712268 sequences. Can I treat these sequences as the blast output?
I'm totally confused by the generated output: I'm getting more transcripts than the input (43579 ORFs).
Which blast output should I choose as a reference set? Are there any parameters/options I need to add to the command line? Please let me know.
I will use the blastn output as a reference set and perform pairwise/reciprocal blast against other non-model species (7 in total).
I downloaded the nt database from the NCBI FTP site; it includes the nt sequences of all organisms (Euk+Prok). The plan is:
1 vs. blastdb (nt) -- reference set
reference set vs. 2 -- output
output vs. 3 -- output
...
output vs. 7 -- output (final orthologs)
Suggestions please.
Take a look at this paper, and the original paper it was written in response to, to understand the effect of the -max_target_seqs parameter. There is a thread with some additional discussion: Misunderstood parameter of NCBI BLAST

You will get a warning from blast if you set the parameter to less than 5, due to the possibility of not seeing all equivalent matches.

So you had 31737 query sequences which produced one hit based on the parameters you used. Are you asking if that result is the truth? If you are trying to annotate a new transcriptome/genome, then using a proper tool like prokka (prokaryote) or maker (eukaryote) may be a better option.

I edited the post, please review it. Need suggestions.
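
On the row counts: tabular output (-outfmt 6) prints one line per HSP, not one line per query, so a run without -max_hsps 1 and with more than one target per query can easily contain far more lines than there are input ORFs. Counting distinct query IDs in the first column is a more meaningful number than counting lines. Below is a minimal Python sketch of that check. It assumes the default 12-column -outfmt 6 layout, where bitscore is the last column; the function name summarize_blast_tab and the example rows are made up for illustration. The runs above that used "6 qseqid sseqid sseq" have no bitscore column, so they would need bitscore added to the format string for the best-hit part to work.

```python
import csv
import io


def summarize_blast_tab(handle, bitscore_col=11):
    """Count HSP rows and keep the highest-bitscore row per query.

    Assumes default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    n_rows = 0
    best = {}  # qseqid -> row with the highest bitscore seen so far
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        n_rows += 1
        qseqid = row[0]
        score = float(row[bitscore_col])
        if qseqid not in best or score > float(best[qseqid][bitscore_col]):
            best[qseqid] = row
    return n_rows, best


# Made-up rows: orf1 has three HSPs against two subjects, orf2 has one.
example_rows = [
    ["orf1", "subjA", "98.0", "300", "6", "0", "1", "300", "1", "300", "1e-150", "520"],
    ["orf1", "subjB", "99.0", "300", "3", "0", "1", "300", "1", "300", "1e-160", "540"],
    ["orf1", "subjB", "95.0", "100", "5", "0", "1", "100", "400", "500", "1e-40", "150"],
    ["orf2", "subjC", "97.0", "200", "6", "0", "1", "200", "1", "200", "1e-90", "350"],
]
example = "\n".join("\t".join(r) for r in example_rows) + "\n"

n_rows, best = summarize_blast_tab(io.StringIO(example))
print("HSP rows:", n_rows)
print("queries with at least one hit:", len(best))
for qseqid, row in best.items():
    print(qseqid, "best subject:", row[1], "bitscore:", row[11])
```

For a real file, pass an open file handle instead of the io.StringIO object, e.g. open("nucl_blastn_1align_1e-30_1_trans_62_hsps5_normal.txt"). Filtering this way after the search, rather than relying only on -max_target_seqs 1, sidesteps the issue discussed in the linked thread, at the cost of a larger intermediate file.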