Question

Conflict of abundances on different taxonomic levels using centrifuge

0


6.4 years ago

grant.hovhannisyan ★ 2.6k

I have posted this on the Centrifuge GitHub, but it doesn't seem to be very responsive.

I want to analyze microbial abundances at the genus level using Centrifuge. However, I see that genus-level abundances (and read counts) are lower than the abundances of species within the corresponding genus, which seems a bit counterintuitive.

For example, if I do grep Homo report.tsv, I get

Homo 9605 genus 0 12 0 0.0
Homo sapiens 9606 species 3238442024 738484 395792 1.26543e-06

Shouldn't the software summarize abundances in a hierarchical manner? Please let me know if I'm missing something.

Cheers,

metagenomics centrifuge • 1.7k views

ADD COMMENT • link updated 6.4 years ago by Rob 6.9k • written 6.4 years ago by grant.hovhannisyan ★ 2.6k

score 2 · Answer 1 · 2018-08-09

2


6.4 years ago

Rob 6.9k

I think you're misunderstanding how the assignment works. Centrifuge attempts to assign each read to the _most specific_ taxon possible. Only if the read cannot be assigned to a taxon unambiguously does it assign the read further up the tree. Your output shows 738484 reads assigned to Homo sapiens at the species level and 12 reads assigned to Homo at the genus level. Those 12 reads may have been compatible with H. sapiens, but matched equally well with one or more other species within the Homo genus. The counts up the taxonomy are not cumulative: if you wanted to know all reads coming from Homo at the genus level or lower, you would sum the counts at the genus itself (node 9605) with all of the counts on nodes that are descendants of Homo.
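The summation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cumulative_counts` and the `parent_of` mapping are hypothetical, and the parent mapping is assumed to come from an NCBI taxonomy source (for example nodes.dmp), since the report itself does not list parent nodes. If a read is reported under more than one taxon, a plain sum like this would count it more than once.

```python
from collections import defaultdict


def cumulative_counts(direct_counts, parent_of):
    """Return, for each taxon, the reads assigned to it or to any descendant.

    direct_counts: taxID -> reads assigned directly to that node
                   (the read-count column of the report).
    parent_of:     taxID -> parent taxID (assumed to be loaded separately
                   from a taxonomy dump; hypothetical input for this sketch).
    """
    totals = defaultdict(int)
    for taxid, n_reads in direct_counts.items():
        node = taxid
        seen = set()
        # Walk up towards the root, crediting every ancestor with these reads.
        # The 'seen' set stops the walk if the root lists itself as its parent.
        while node is not None and node not in seen:
            totals[node] += n_reads
            seen.add(node)
            node = parent_of.get(node)
    return dict(totals)


# Direct counts taken from the grep output in the question.
direct = {9605: 12, 9606: 738484}
# Toy parent map for illustration only: Homo sapiens (9606) -> Homo (9605).
parents = {9606: 9605}

totals = cumulative_counts(direct, parents)
print(totals[9605])  # 738496 = 12 direct genus reads + 738484 species reads
```

With a full parent map, the same call would give cumulative totals for every rank at once, which is roughly what a hierarchical report shows.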


0


Thanks for clarifying! This is actually exactly how I was interpreting the assignments until I saw the Kraken report format, where everything is summarized in a hierarchical manner, and that's why I thought it would most probably be the case here as well.

"you would sum the counts at the genus itself (node 9605) with all of the counts on nodes that are descendants of Homo": is there a way to do this in Centrifuge (or another tool), or do I have to do it manually? Thanks

ADD REPLY • link 6.4 years ago by grant.hovhannisyan ★ 2.6k

0


According to the Centrifuge documentation, you can create a Kraken-style report from Centrifuge's classification output with the centrifuge-kreport script.

ADD REPLY • link 6.4 years ago by Rob 6.9k