I am trying to run BLASTP using the command below. I am using BLAST 2.14.0 on a remote server (Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz, 24 cores / 48 threads, RHEL 8 x86_64).
/home/choh1/ncbi-blast-2.14.0+/bin/blastp \
-query ${MANE_protein_id}_.fasta \
-db /home/choh1/ncbi-blast-2.14.0+/refseq_protein_db/refseq_protein \
-out ${MANE_protein_id}_orthologs.txt \
-seqidlist /home/choh1/ncbi-blast-2.14.0+/refseq_protein_db/primates_acc_alias_blastdb.txt \
-outfmt "6 std staxids scomname ssciname"
I'm running the same command in 2 different directories on 2 different protein IDs, and I expect both output files to have the format below, where the last 2 columns hold the common name and the scientific name.
NP_001002296.1 XP_047573126.1 100.000 137 0 0 1 137 12 148 1.76e-97 284 9657 Eurasian river otter Lutra lutra
NP_001002296.1 XP_010956270.1 99.270 137 1 0 1 137 1 137 3.76e-97 282 9837;9838;419612 Bactrian camel Camelus bactrianus
However, I only get the common name and scientific name from the BLASTP run in one directory. In the other directory, those columns contain only N/A, as shown below:
NP_001034707.1 NP_001034707.1 100.000 354 0 0 1 354 1 354 0.0 698 9595;9606 N/A N/A
NP_001034707.1 XP_006718705.1 99.718 354 1 0 1 354 1 354 0.0 696 9606 N/A N/A
Does anyone happen to know how to make sure that they always have species names in the last 2 columns please? Thank you in advance!
That is odd. NP_001034707 appears to be a human protein, so you should be able to get the info you are looking for. I assume you have the taxID blast database downloaded and available in the $BLASTDB folder?

Hi yes, just to be sure: I have the files in the /home/choh1/ncbi-blast-2.14.0+/refseq_protein_db/refseq_protein directory, is that correct?

That is correct.
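
For anyone who wants to verify this on their own machine, here is a rough sketch of a check. As far as I know, BLAST fills in scomname and ssciname from the taxonomy files taxdb.btd and taxdb.bti, which it looks for in the current working directory and in the directories listed in $BLASTDB. That might also explain why one run directory gives names and the other gives N/A. The helper name check_taxdb is made up for this example, and db_dir is only assumed from the -db path in the question, so adjust both to your setup.

```shell
# Hypothetical helper: report whether the BLAST taxonomy files exist in a directory.
# Returns 0 if both files are present, 1 otherwise.
check_taxdb() {
    dir="$1"
    missing=0
    for f in taxdb.btd taxdb.bti; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: the directory that holds the refseq_protein database files,
# taken from the -db path in the question.
db_dir=/home/choh1/ncbi-blast-2.14.0+/refseq_protein_db

# Point BLASTDB at that directory only if the taxonomy files are really there.
if check_taxdb "$db_dir"; then
    export BLASTDB="$db_dir"
    echo "BLASTDB set to $BLASTDB"
fi
```

If the files are reported missing, they can normally be fetched with `update_blastdb.pl --decompress taxdb`, run from inside the database directory.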