Standalone BLAST with kingdom classification
3
0
Entering edit mode
10.3 years ago
GeJu ▴ 20

Hi all,

I have an assembled transcriptome from an organism hosting various symbionts and I'm now interested in how many of my contigs hitting to the different kingdoms. What I've found so far is the -window_masker_taxid that filters my dataset based upon what kind of taxis I put there.

​Is there a possibility to do it all in one? To tell blast to add to every hit to which kingdom it belongs and to have then the possibility to extract e.g. all metazoan hits?

And what about sequences that return ambiguous hits e.g. hitting against metazoa and fungi? How do I deal with that ones?

blast • 4.5k views
ADD COMMENT
2
Entering edit mode
10.3 years ago
Neilfws 49k

You can add the taxon ID to tab-delimited BLAST+ output using the -outfmt option, like this:

-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids"

You could then use EUtils/efetch to fetch taxonomic data for each taxon ID in XML format and parse it for the kingdom. I recently did this in Ruby, it looks something like this:

require 'bio'
require 'nokogiri' # XML parsing

Bio::NCBI.default_email = "me@me.com"
xml = Bio::NCBI::REST::EFetch.taxonomy(9606, "xml") # taxon ID 9606 = H. sapiens
doc     = Nokogiri::XML(xml)
names   = doc.xpath("//Taxon/ScientificName").children.map { |child| child.inner_text }
kingdom = names[2]
puts kingdom
# => "Eukaryota"

but you could use whatever implementation of EUtils you like e.g. EDirect.

ADD COMMENT
0
Entering edit mode

cool, I never noticed those custom formats.

ADD REPLY
0
Entering edit mode

Thanks for your suggestions.

But I already stumbled over one problem. I added the taxon ID to the output, tried it on a tiny fraction of my file and only got N/A in the taxon ID column although the sequences have clear hits.

ADD REPLY
1
Entering edit mode
10.3 years ago
5heikki 11k
  1. Get the taxdb and uncompress it where .ncbirc BLASTDB points
  2. Use -outfmt '6 std sskingdoms'
  3. Sort for best hits
  4. Time to grep
  5. You now assigned kingdom level taxonomy based on best-hit criteria
  6. One alternative way would be to feed you blast output to blast2lca
ADD COMMENT
0
Entering edit mode
10.3 years ago

I wrote a recursive XSLT stylesheet reading a BLAST XML format and appending the NCBI taxonomy:

https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/blasttaxonomy.xsl

xsltproc --novalid blasttaxonomy.xsl blastp.xml

Something like:

  <Hit_num>1</Hit_num>
  <Hit_id>gi|302699245|ref|NP_001181875.1|</Hit_id>
  <Hit_def>eukaryotic translation initiation factor 4 gamma 1 isoform 6 [Homo sapiens] >gi|302699247|ref|NP_001181876.1| eukaryotic translation initiation factor 4 gamma 1 isoform 6 [Homo sapiens]</Hit_def>
  <Hit_accession>NP_001181875</Hit_accession>
  <Hit_len>1606</Hit_len>

is replaced by:

          <Hit_num>1</Hit_num>
          <Hit_id>gi|302699245|ref|NP_001181875.1|</Hit_id>
          <Taxon>
            <TaxId>9606</TaxId>
            <ScientificName>Homo sapiens</ScientificName>
            <OtherNames>
              <GenbankCommonName>human</GenbankCommonName>
              <CommonName>man</CommonName>
              <Name>
                <ClassCDE>authority</ClassCDE>
                <DispName>Homo sapiens Linnaeus, 1758</DispName>
              </Name>
            </OtherNames>
            <ParentTaxId>9605</ParentTaxId>
            <Rank>species</Rank>
            <Division>Primates</Division>
            <GeneticCode>
              <GCId>1</GCId>
              <GCName>Standard</GCName>
            </GeneticCode>
            <MitoGeneticCode>
              <MGCId>2</MGCId>
              <MGCName>Vertebrate Mitochondrial</MGCName>

(...)

ADD COMMENT

Login before adding your answer.

Traffic: 1709 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6