Question

Running blastn to get the best hit using -max_target_seqs 1 -max_hsps 1?

3.7 years ago

sunnykevin97 ▴ 990

Hi,

I have 43579 ORFs predicted with TransDecoder from one RNA-seq dataset. I blasted them against an nt database with blastn, using the command below.

time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR3632057.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt "6 qseqid sseqid sseq" -out nucl_blastn_1align_1e-30_1_trans_62_hsps.txt -evalue 1e-30 -num_threads 8 -max_target_seqs 1 -max_hsps 1 > blastn_3_trans_62_hsps.log

I'm only interested in the best hit for each query, so I used the "-max_target_seqs 1 -max_hsps 1" options. Is this the correct way to run BLAST? The blastn output produced 31737 sequences; can I consider these sequences as a blast output?

 time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR363257.trim_trinity.cdhit.fasta.transdecoder.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt 6 -out nucl_blastn_1align_1e-30_1_trans_62_hsps5_normal.txt -evalue 1e-30 -num_threads 30 > blastn_3_trans_62_hsps5_normal.log

The blastn output produced 4077005 sequences; can I consider these sequences as a blast output?

time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR3632057.trim_trinity.cdhit.fasta.transdecoder.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt "6 qseqid sseqid sseq" -out nucl_blastn_1align_1e-30_1_trans_62_hsps5.txt -evalue 1e-30 -num_threads 8 -max_target_seqs 5 > blastn_3_trans_62_hsps5.log

The blastn output produced 712268 sequences; can I consider these sequences as a blast output?

I'm totally confused by the generated output: I'm getting more transcripts than the input (43579 ORFs).

Which blast output should I choose as a reference set? If there are any parameters/options I need to add to the command line, please let me know.

I'd be using the blastn output as a reference set and performing pairwise/reciprocal blast against other non-model species (7 in total).

I downloaded the nt database from the NCBI FTP site; it includes all nt sequences from all organisms (Euk+Prok). The planned workflow is:

1 vs. blastdb(nt) -- reference set
reference set vs. 2 -- output
output vs. 3 -- output
...
output vs. 7 -- output (final orthologs)
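The pairwise/reciprocal step is often done as a reciprocal-best-hit filter: keep a pair only when each sequence is the other's best hit. Below is a minimal Python sketch of that idea, assuming the best hit per query has already been extracted into two dictionaries (query ID to subject ID), one per search direction. The function name and all IDs are made up for illustration.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )

# Made-up IDs for illustration only (not real BLAST results)
ref_vs_sp2 = {"orf1": "sp2_g7", "orf2": "sp2_g9", "orf3": "sp2_g7"}
sp2_vs_ref = {"sp2_g7": "orf1", "sp2_g9": "orf5"}

pairs = reciprocal_best_hits(ref_vs_sp2, sp2_vs_ref)
print(pairs)
```

Only orf1 and sp2_g7 point at each other here, so that is the only pair kept.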

Suggestions please.

blast alignment • 1.4k views

ADD COMMENT • link 3.7 years ago by sunnykevin97 ▴ 990

Take a look at this paper, and the original paper it was written in response to, to understand the effect of the -max_target_seqs parameter. There is a thread with some additional discussion: Misunderstood parameter of NCBI BLAST

You will get a warning from blast if you set the parameter to less than 5, because of the possibility of not seeing all equivalent matches.
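One way to work around this is to run blastn with a larger -max_target_seqs value and pick the best hit per query afterwards. Here is a minimal Python sketch of that post-filtering step. It assumes the default -outfmt 6 column order, where qseqid is the first field and bitscore is the twelfth; with a custom format such as "6 qseqid sseqid sseq" the bitscore column would have to be added first. The function name and sample rows are made up for illustration.

```python
import io

def best_hit_per_query(handle):
    """Return {qseqid: fields}, keeping the highest-bitscore row per query.

    Assumes default -outfmt 6 columns (bitscore is field 12).
    On ties, the first row encountered is kept.
    """
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, bitscore = fields[0], float(fields[11])
        if qseqid not in best or bitscore > float(best[qseqid][11]):
            best[qseqid] = fields
    return best

# Made-up rows for illustration only (not real BLAST output)
example = io.StringIO(
    "orf1\tsubjA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t520\n"
    "orf1\tsubjB\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-155\t540\n"
    "orf2\tsubjC\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-80\t300\n"
)
hits = best_hit_per_query(example)
for qseqid, fields in hits.items():
    print(qseqid, fields[1], fields[11])
```

For a real run, an open file handle on the tabular output can be passed in place of the `io.StringIO` example.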

ADD REPLY • link 3.7 years ago by GenoMax 147k

The blastn output produced - 31737 sequences, can I consider these sequences as a blast output ?

So 31737 of your query sequences produced one hit each with the parameters you used. Are you asking whether that result is the truth? If you are trying to annotate a new transcriptome/genome, then a dedicated annotation tool such as Prokka (prokaryotes) or MAKER (eukaryotes) may be a better option.
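To check this kind of number, it may help to count distinct query IDs rather than output lines. Tabular output (-outfmt 6) has one line per reported alignment, so a query with many hits contributes many lines, and the line count can exceed the number of input ORFs. Here is a minimal Python sketch, assuming the first tab-separated column is qseqid (as in the output formats used in the question). The function name and sample lines are made up for illustration.

```python
from collections import Counter

def summarize_hits(lines):
    """Return (total_lines, distinct_queries, hits_per_query) for tabular BLAST output."""
    per_query = Counter(
        line.split("\t", 1)[0] for line in lines if line.strip()
    )
    return sum(per_query.values()), len(per_query), per_query

# Made-up lines for illustration only (not real BLAST output)
sample = [
    "orf1\tsubjA\tACGT",
    "orf1\tsubjB\tACGA",
    "orf1\tsubjC\tACGG",
    "orf2\tsubjD\tTTGA",
]
total, distinct, per_query = summarize_hits(sample)
print(total, distinct)
print(per_query.most_common(1))
```

In this toy input, four lines come from only two queries. The function also accepts an open file handle on a real output file.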