Modifying Standalone BLAST output
7.6 years ago
glady ▴ 320

I am running standalone BLAST on Ubuntu and have downloaded the environmental metagenome (env_nt) database from NCBI. The command I am using is:

blastn -db env_nt -query file.fasta -out BLAST_output.fasta -max_target_seqs 1 -outfmt '6 qseqid qseq sallseqid stitle score bitscore qcovs evalue pident sacc staxids sscinames scomnames sblastnames'

But in this output I also need the organism name or source name of each subject hit returned by BLAST. Can anyone help me with this? What syntax should I use to obtain the organism/source name? Thank you.

blast • 2.3k views

Can you provide several lines of the result?

7.6 years ago

I tried this before but failed. There's an indirect way.

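ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz provides a mapping between accessions (sseqid) and taxids, which you can use to get the organism names. Note that this file covers protein accessions; for a nucleotide database such as env_nt, the nucleotide mapping files in the same directory (for example nucl_gb.accession2taxid.gz) are probably the ones to use.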

dummy data

$ cat t.tsv 
P05876  other-info
P27125  other-columns

get accession

$ cut -f 1 t.tsv > t.acc

get taxid

$ csvtk grep -t -f 1 -P t.acc prot.accession2taxid.gz | cut -f 1,3 | sed 1d > t.acc2taxid

get lineage

$ cat t.acc2taxid |  taxonkit lineage -i 2 > t.acc2taxid.lineage

merge taxid and lineage back to the blast result

$ csvtk join -H -t t.tsv t.acc2taxid.lineage
P05876  other-info      11731   Viruses;Retro-transcribing viruses;Retroviridae;Orthoretrovirinae;Lentivirus;Primate lentivirus group;Simian immunodeficiency virus;Simian immunodeficiency virus - agm;Simian immunodeficiency virus - agm.ver;Simian immunodeficiency virus (TYO-1 ISOLATE)
P27125  other-columns   83333   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12

You will need csvtk and taxonkit for this.
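If csvtk is not available, the "get taxid" step can probably be approximated with plain awk. This is only a sketch under stated assumptions: the file names and the dummy mapping rows below are made up for illustration, and the mapping table is assumed to have the four tab-separated columns accession, accession.version, taxid, gi with one header line.

```shell
# Scratch directory with dummy inputs (hypothetical names and contents).
workdir=$(mktemp -d)
printf 'P05876\tother-info\nP27125\tother-columns\n' > "$workdir/t.tsv"
printf 'accession\taccession.version\ttaxid\tgi\n'  > "$workdir/mapping.tsv"
printf 'P05876\tP05876.1\t11731\t0\n'              >> "$workdir/mapping.tsv"
printf 'Q00000\tQ00000.1\t9606\t0\n'               >> "$workdir/mapping.tsv"
printf 'P27125\tP27125.1\t83333\t0\n'              >> "$workdir/mapping.tsv"

# First file (t.tsv): remember the accessions we care about.
# Second file (mapping): skip the header, then print accession and taxid
# for every wanted accession. Only the small list is held in memory.
awk -F'\t' -v OFS='\t' '
    NR == FNR            { want[$1]; next }
    FNR > 1 && $1 in want { print $1, $3 }
' "$workdir/t.tsv" "$workdir/mapping.tsv" > "$workdir/t.acc2taxid"

cat "$workdir/t.acc2taxid"
```

With the real compressed file, the mapping could be streamed instead, for example by piping `zcat prot.accession2taxid.gz` into the same awk program and passing `-` as the second file argument.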

7.6 years ago
5heikki 11k

You have to set up taxdb for the taxonomy columns (sscinames, scomnames, sblastnames) to be filled in, but with env_nt I think nearly all the sequences are annotated simply as "Environmental sample", so not much will be gained.
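A quick way to see whether taxdb is already in place is to look for its two files, taxdb.btd and taxdb.bti, in the directory BLAST searches (the working directory or the one named by BLASTDB). The helper name `have_taxdb` and the scratch directory below are hypothetical, and the download command in the trailing comment assumes `update_blastdb.pl` from BLAST+ is on your PATH.

```shell
# Succeeds only if both taxdb files exist in the given directory.
have_taxdb() {
    [ -f "$1/taxdb.btd" ] && [ -f "$1/taxdb.bti" ]
}

# Demonstration on a scratch directory, using empty stand-in files.
dbdir=$(mktemp -d)
if have_taxdb "$dbdir"; then before=present; else before=missing; fi
touch "$dbdir/taxdb.btd" "$dbdir/taxdb.bti"
if have_taxdb "$dbdir"; then after=present; else after=missing; fi
echo "before: $before, after: $after"

# On a real system, something along these lines should fetch the files:
#   cd "$BLASTDB" && update_blastdb.pl --decompress taxdb
```

If the files are missing, blastn typically still runs but reports N/A in the taxonomy name columns.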
