Hi, I have a FASTQ file obtained from untargeted metagenomic sequencing. I performed a taxonomic analysis of it with Kraken2, and my output looks like this:
The fields of the output, from left-to-right, are as follows:
-Percentage of fragments covered by the clade rooted at this taxon
-Number of fragments covered by the clade rooted at this taxon
-Number of fragments assigned directly to this taxon
-A rank code, indicating (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. Taxa that are not at any of these 10 ranks have a rank code that is formed by using the rank code of the closest ancestor rank with a number indicating the distance from that rank. E.g., "G2" is a rank code indicating a taxon is between genus and species and the grandparent taxon is at the genus rank.
-NCBI taxonomic ID number
-Indented scientific name
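
For reference, here is a minimal sketch of how the species-level rows could be pulled out of such a report in Python. It assumes the standard six tab-separated columns listed above (no extra minimizer columns). The function name `species_from_report`, the example rows, and the taxonomic IDs in them are made up for illustration.

```python
import io

def species_from_report(handle):
    """Yield (taxid, name, clade_fragments) for each species-rank line.

    Assumes the six tab-separated columns described above:
    percentage, clade fragments, direct fragments, rank code, taxid, name.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _pct, clade_n, _direct_n, rank, taxid, name = fields[:6]
        # "S" is the species rank code; codes such as "S1" are sub-species levels
        if rank.strip() == "S":
            # the name column is indented with spaces, so strip it
            yield int(taxid), name.strip(), int(clade_n)

# Made-up rows in the six-column layout (hypothetical taxids and names)
example = (
    " 50.00\t10\t0\tG\t111\t        Examplegenus\n"
    " 50.00\t10\t10\tS\t222\t          Examplegenus examplespecies\n"
)

for taxid, name, count in species_from_report(io.StringIO(example)):
    print(taxid, name, count)
```

This gives one (taxonomic ID, name, fragment count) tuple per species, which is the list I would like to annotate.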
I'd like to understand the nature of all the different species detected. In particular, I'd like to know their isolation source, whether they are pathogenic, and so on.
What is the best way to do that?
Someone suggested that I use Entrez Direct (EDirect) like this:
$ esearch -db biosample -query SAMN10026047 | efetch
1: Corallococcus genome_CA054A
Identifiers: BioSample: SAMN10026047; Sample name: Corallococcus CA054A
Organism: Corallococcus terminator
Attributes:
/strain="CA054A"
/isolation source="soil"
/collection date="2016-09-28"
/geographic location="United Kingdom"
/sample type="Bacterial Isolate"
/identified by="Aberystwyth University"
/type-material="type strain of Corallococcus terminator"
Accession: SAMN10026047 ID: 10026047
But I do not know the BioSample accession numbers for my species; I only have their names and their NCBI taxonomic IDs.