Running blastn to get the best hit using -max_target_seqs 1 -max_hsps 1?
3.7 years ago
sunnykevin97 ▴ 990

Hi,

I have 43579 ORFs predicted with TransDecoder from one RNA-seq dataset. I ran blastn against an nt database with the command below.

time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR3632057.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt "6 qseqid sseqid sseq" -out nucl_blastn_1align_1e-30_1_trans_62_hsps.txt -evalue 1e-30 -num_threads 8 -max_target_seqs 1 -max_hsps 1 > blastn_3_trans_62_hsps.log

I'm only interested in the best hit, so I used the "-max_target_seqs 1 -max_hsps 1" options. Is this the correct way to run BLAST? The blastn run produced 31737 sequences; can I consider these sequences the BLAST output?

 time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR363257.trim_trinity.cdhit.fasta.transdecoder.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt 6 -out nucl_blastn_1align_1e-30_1_trans_62_hsps5_normal.txt -evalue 1e-30 -num_threads 30 > blastn_3_trans_62_hsps5_normal.log

This blastn run produced 4077005 sequences; can I consider these sequences the BLAST output?

time /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/blastn -query /home/sunn/data/softwares/evaluation/TransDecoder-TransDecoder-v5.5.0/SRR3632057.trim_trinity.cdhit.fasta.transdecoder.cds -db /home/sunn/data/softwares/ncbi/ncbi-blast-2.11.0+/bin/nucleotideDBB/nucleotideDBB -outfmt "6 qseqid sseqid sseq" -out nucl_blastn_1align_1e-30_1_trans_62_hsps5.txt -evalue 1e-30 -num_threads 8 -max_target_seqs 5 > blastn_3_trans_62_hsps5.log

This blastn run produced 712268 sequences; can I consider these sequences the BLAST output?

I'm totally confused by the generated output: I'm getting more transcripts than the input (43579 ORFs).
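One way to check what these counts mean is to compare the number of rows in a tabular (-outfmt 6) file with the number of distinct query IDs, since each row is one reported alignment rather than one input sequence. A minimal sketch, assuming tab-separated output with the query ID in the first column; the function name and the example rows are hypothetical:

```python
def count_rows_and_queries(lines):
    """Return (number of hit rows, number of distinct query IDs).

    Assumes BLAST tabular output (-outfmt 6): one alignment per line,
    tab-separated, with the query ID (qseqid) in the first column.
    """
    rows = 0
    queries = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (comments appear with -outfmt 7).
        if not line or line.startswith("#"):
            continue
        rows += 1
        queries.add(line.split("\t")[0])
    return rows, len(queries)


# Hypothetical rows: orf1 has two hits, orf2 has one.
example = [
    "orf1\tsubjA\tACGT",
    "orf1\tsubjB\tACGA",
    "orf2\tsubjC\tTTGA",
]
print(count_rows_and_queries(example))  # (3, 2)
```

If the row count is much larger than the number of distinct queries, the file lists several alignments per query. To run it on a real file, pass an open file handle, for example `count_rows_and_queries(open("nucl_blastn_1align_1e-30_1_trans_62_hsps5.txt"))`.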

Which BLAST output should I choose as the reference set? If there are any parameters/options I need to add to the command line, please let me know.

I'll use the blastn output as a reference set and perform pairwise/reciprocal BLAST against other non-model species (7 in total).

I downloaded the nt database from the NCBI FTP site; it includes the nt sequences of all organisms (Euk + Prok).

1 vs. blastdb (nt) -- reference set
reference set vs. 2 -- output
output vs. 3 -- output
...
output vs. 7 -- output (final orthologs)
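For a single pairwise step, a common way to define a reciprocal best hit is a pair (a, b) where b is the best hit of a and a is the best hit of b. The sketch below illustrates only that definition; it is not necessarily the chained procedure outlined above. It assumes the best hit per query has already been extracted into two dictionaries, and all names and IDs are hypothetical:

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return sorted (a, b) pairs that are each other's best hit.

    a_to_b maps each query ID from set A to its best subject ID in set B;
    b_to_a is the same mapping for the reverse search (B against A).
    """
    return sorted(
        (a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a
    )


# Hypothetical best-hit tables for two species.
sp1_vs_sp2 = {"orf1": "geneX", "orf2": "geneY", "orf3": "geneY"}
sp2_vs_sp1 = {"geneX": "orf1", "geneY": "orf3"}
print(reciprocal_best_hits(sp1_vs_sp2, sp2_vs_sp1))
# [('orf1', 'geneX'), ('orf3', 'geneY')]
```

Here orf2 is dropped because geneY's best hit in the reverse direction is orf3, not orf2.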

Suggestions please.

blast alignment • 1.4k views

Take a look at this paper, and the original paper it responds to, to understand the effect of the -max_target_seqs parameter. There is also a thread with some additional discussion: Misunderstood parameter of NCBI BLAST

BLAST will print a warning if you set this parameter to less than 5, because you may not see all equivalent matches.
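One possible workaround is to run with a larger -max_target_seqs value and pick the top hit per query afterwards. A minimal sketch, assuming the default 12-column -outfmt 6 layout (qseqid, sseqid, ..., evalue, bitscore) as produced by the second command in the question; the function name and example rows are hypothetical, and ties are resolved by keeping the first row seen:

```python
def best_hit_per_query(lines):
    """Map each query ID to (subject ID, bitscore) of its top-scoring row.

    Assumes default BLAST -outfmt 6: 12 tab-separated columns, with
    qseqid in column 1, sseqid in column 2 and bitscore in column 12.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        qseqid, sseqid = fields[0], fields[1]
        bitscore = float(fields[11])
        # Keep the highest bitscore; on ties the earlier row wins.
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best


# Hypothetical rows in default outfmt 6 column order.
rows = [
    "orf1\tsubjA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t520",
    "orf1\tsubjB\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-155\t538",
    "orf2\tsubjC\t95.0\t200\t10\t0\t1\t200\t5\t204\t1e-90\t310",
]
print(best_hit_per_query(rows))
# {'orf1': ('subjB', 538.0), 'orf2': ('subjC', 310.0)}
```

This does not work on files written with -outfmt "6 qseqid sseqid sseq", since those have no bitscore column.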

The blastn run produced 31737 sequences; can I consider these sequences the BLAST output?

So you had 31737 query sequences that each produced one hit with the parameters you used. Are you asking whether that result is the truth? If you are trying to annotate a new transcriptome/genome, then using a dedicated annotation tool such as Prokka (prokaryotes) or MAKER (eukaryotes) may be a better option.

I edited the post, please review it. Need suggestions.
