
How to output blastx results with full accessions

22 months ago by liyong ▴ 80

Hello All,

I am running blastx with my assembled transcriptome as the query against a local database built from a FASTA file. The FASTA file contains protein sequences with long accession names, e.g.:

>tr|A0A1P8ASE7|A0A1P8ASE7_ARATH Cold-shock protein OS=Arabidopsis thaliana OX=3702 GN=AT1G34049 PE=4 SV=1

I ran blastx with the following command:

blastx -db db/prot -query transciptome.fa -out "result.outfmt6" -evalue 1e-20 -outfmt 6 -max_target_seqs 1 -num_threads 64

The result file contains only the abbreviated accessions, e.g. "evgtrinLocDN2062c1g1t1 A0A1P8ASE7 69.136 81 25 0 597 355 163 243 7.83e-29 109". I want the results to contain the full identifiers, such as tr|A0A1P8ASE7|A0A1P8ASE7_ARATH. Is there a blastx setting I can change to achieve this?

Many thanks.



Did you create the database with the -parse_seqids option?

22 months ago by GenoMax 147k

Yes, I did. Will this affect the accession names?

22 months ago by liyong ▴ 80

22 months ago by SequenceServer ▴ 140

If you use the SequenceServer graphical interface to run BLAST, it provides both a standard tabular output and an extended (full) one.


If you download the "Full tabular report" from SequenceServer, the information you're looking for is in the last four columns.


Thanks, I will skip this for now and try to change a blastx setting to do this instead.

22 months ago by liyong ▴ 80

score 2 · Accepted Answer · 2023-01-25

After contacting the NCBI team, I got the solution. I am posting it here in case someone else has the same question.

I forgot to mention that I was using BLAST+ version 2.13.0+, whose default -outfmt 6 columns are:

qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore

Following the NCBI team's suggestion, I replaced saccver in the default column list with sseqid and passed the list explicitly to -outfmt. The output now looks like this:

evgsoapLoc3t2 tr|A0A1P8ASE7|A0A1P8ASE7_ARATH 30.864 162 111 1 126 611 326 486 7.79e-25 97.8
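A minimal sketch of the full command, assuming the same database, query, and options as in the question. The -outfmt value is the default tabular column list above with saccver swapped for sseqid. The helper name blastx_full_ids and the variable BLASTX_OUTFMT are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Default -outfmt 6 columns (as listed above for BLAST+ 2.13.0+), with saccver
# replaced by sseqid so the subject column keeps the full FASTA identifier.
BLASTX_OUTFMT="6 qaccver sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

# Hypothetical helper: forwards all arguments to blastx and appends the
# custom tabular format, so callers should not pass their own -outfmt.
blastx_full_ids() {
  blastx "$@" -outfmt "$BLASTX_OUTFMT"
}
```

Usage with the options from the question (the original -outfmt 6 is dropped so that only the custom format is passed):

blastx_full_ids -db db/prot -query transciptome.fa -out "result.outfmt6" -evalue 1e-20 -max_target_seqs 1 -num_threads 64

The same quoted string can also be passed directly to -outfmt without the wrapper.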