Hi,
I am very new to the bioinformatics field. I have a BLAST XML output file and I need to parse it to generate a species distribution. Could someone show me how this can be done with Biopython, with some examples?
Thank you,
Sadly the BLAST XML does not (yet) include the taxonomy id as a nice field. Depending on the database used, you may get the species names in the hit descriptions. Other than that, you would need to separately map from the hit gene/protein ID to its species.
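If the species names do appear in your hit descriptions, something like the following might be a starting point. It is only a sketch under a few assumptions: the database uses NCBI-style descriptions ending in a bracketed organism name (e.g. "... [Homo sapiens]"), you only want the top hit per query, and your Biopython version still provides `Bio.Blast.NCBIXML`. The function names `species_from_description` and `species_distribution` are my own, not part of Biopython.

```python
import re
from collections import Counter

# Assumption: descriptions end with the organism in square brackets,
# as in NCBI nr protein hits ("... [Homo sapiens]").
_SPECIES = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_from_description(description):
    """Return the bracketed organism name of the first defline, or None."""
    # Merged nr entries join several deflines with " >"; keep only the first.
    first_defline = description.split(" >")[0]
    match = _SPECIES.search(first_defline)
    return match.group(1) if match else None


def species_distribution(xml_path):
    """Count the species of the top hit for each query in a BLAST XML file."""
    # Imported here so the helper above can be used without Biopython installed.
    from Bio.Blast import NCBIXML

    counts = Counter()
    with open(xml_path) as handle:
        for record in NCBIXML.parse(handle):
            if not record.alignments:
                continue  # this query had no hits
            top_hit = record.alignments[0]
            species = species_from_description(top_hit.hit_def)
            counts[species or "unknown"] += 1
    return counts
```

You would then call `species_distribution("my_blast.xml")` (the filename is a placeholder) and look at `.most_common()` on the result. If your descriptions do not carry a bracketed organism name, everything will land under "unknown" and you will need the separate ID-to-species mapping instead.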
If you use the BLAST tabular output, you can get the species, kingdom, taxid, etc. as dedicated columns (new in BLAST+ 2.2.28, see http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html for background). This would probably be the easiest route.
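As a rough illustration of the tabular route, suppose you ran BLAST+ with `-outfmt "6 qseqid sseqid staxids sscinames"`. That column choice and order is my assumption, so adjust the index below to match your own command. As far as I know, the scientific name column is only filled in when the NCBI taxonomy database files are available to BLAST. Counting the top hit per query then needs only the standard library; `count_top_hit_species` is a made-up name for this sketch.

```python
import csv
from collections import Counter


def count_top_hit_species(rows, name_column=3):
    """Count the scientific name of the first (best) hit for each query.

    `rows` is any iterable of tab-separated lines, e.g. an open file.
    Assumes column 0 is the query ID and `name_column` holds sscinames.
    """
    counts = Counter()
    seen_queries = set()
    reader = csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines, comment lines (outfmt 7) and short rows.
        if not fields or fields[0].startswith("#") or len(fields) <= name_column:
            continue
        query = fields[0]
        if query in seen_queries:
            continue  # BLAST lists the best hit first; ignore the rest
        seen_queries.add(query)
        counts[fields[name_column]] += 1
    return counts
```

With a file on disk this could be used as `with open("hits.tsv") as handle: print(count_top_hit_species(handle).most_common())`, where `hits.tsv` is a placeholder name.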
See also: Blobology aka assemblage from the Blaxter Lab in Edinburgh, http://www.nematodes.org/bioinformatics/blobology/index.shtml and https://github.com/blaxterlab/blobology
This is not Biopython, but you can simply load the XML file into the Blast2GO tool and generate a species distribution.
You can perform the taxonomy assignment, including the BLAST step, in QIIME. It gives you an OTU table, from which you can also produce taxonomy plots in several forms.
see also
Taxonomy of blast hits
I don't remember: does the taxon ID appear in any BLAST XML output? Please show us a snippet of the XML.