Question

filter OTU table based on absolute abundance

4.5 years ago

annaA ▴ 10

Hello,

Currently I am working with 16S rRNA metabarcoding data. I have a question about filtering the OTU table prior to downstream analysis. From my reading, I am not sure whether I should filter the OTU table based on absolute abundance or relative abundance.

I know it's a very basic question, but it's the first time I am dealing with this kind of data.

Thanks in advance, A.

metabarcoding microbiome OTU • 2.9k views

updated 4.5 years ago by antonioggsousa 3.2k • written 4.5 years ago by annaA ▴ 10

Assuming you have, e.g., filtered out singleton reads prior to clustering, why do you feel you need to filter your OTU table at all?

4.5 years ago by 5heikki 11k

score 0 · Answer 1 · 2020-07-01

Hi,

There are very divergent opinions in the field. I'm far from being an expert, but my opinion is that you should perform some filtering, particularly for beta-diversity analysis.

In my opinion, it is important to filter out low read counts, which may represent a sequencing artefact, e.g., one arising from the different sequencing depths obtained for different samples.

Let's say that you have two samples, with "OTU_1001" having 5 reads in Sample A and 0 reads in Sample B. Are you sure that if you sequenced sample B more deeply you wouldn't find "OTU_1001" there? Of course this is difficult to tell, but these kinds of differences arise even between sample replicates, though they may also just reflect biological variability.
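To put a rough number on that uncertainty, here is a small sketch that assumes reads are drawn independently at random from the community (a simplification that ignores PCR and extraction biases). Under that assumption, an OTU with relative abundance p gets zero reads in a sample of depth N with probability (1 - p)^N. The function name, the abundance and the depths below are hypothetical, chosen only for illustration.

```python
def prob_undetected(rel_abundance: float, depth: int) -> float:
    """Probability that an OTU yields zero reads in a sample of `depth` reads.

    Assumes each read independently comes from the OTU with probability
    `rel_abundance` (simple binomial sampling; real sequencing is messier).
    """
    return (1.0 - rel_abundance) ** depth


# Hypothetical OTU making up 0.05% of the community in sample B
p = 0.0005
for depth in (1_000, 5_000, 20_000):
    print(f"depth={depth:>6}: P(0 reads) = {prob_undetected(p, depth):.3f}")
```

With these made-up numbers, a zero at 1,000 reads is more likely than not (about 0.61), while at 20,000 reads it would be very unlikely, so a 0 in a shallowly sequenced sample says little about absence.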

My point here is that there is higher uncertainty in these low-read-count OTUs, and since they have a great impact on distance/dissimilarity-based analyses such as beta-diversity, people often remove them, e.g., OTUs that do not reach at least 5 reads in at least x samples. For this kind of filtering, you should use absolute read counts and not relative abundance, because 1% relative abundance can represent a totally different number of reads in two samples with different total numbers of reads (e.g., 20 reads out of 2,000 versus 500 reads out of 50,000).
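A minimal sketch of such an absolute-count filter, assuming the OTU table is held as a plain Python dict mapping OTU IDs to per-sample raw read counts. `filter_otus` is a hypothetical helper, and the defaults (5 reads, 2 samples) simply stand in for the "5 reads" and "x samples" above; the right thresholds depend on your data.

```python
from typing import Dict, List


def filter_otus(
    otu_table: Dict[str, List[int]], min_reads: int = 5, min_samples: int = 2
) -> Dict[str, List[int]]:
    """Keep OTUs with at least `min_reads` raw reads in at least `min_samples` samples."""
    kept = {}
    for otu, counts in otu_table.items():
        # Count the samples in which this OTU reaches the absolute read threshold
        n_samples_ok = sum(1 for c in counts if c >= min_reads)
        if n_samples_ok >= min_samples:
            kept[otu] = counts
    return kept


# Hypothetical table: raw read counts for samples A, B, C
table = {
    "OTU_1": [120, 98, 143],
    "OTU_1001": [5, 0, 0],
    "OTU_2002": [7, 6, 1],
}
print(sorted(filter_otus(table)))  # ['OTU_1', 'OTU_2002']
```

The thresholds are applied to raw counts on purpose: the same rule means the same number of reads in every sample, which a percentage cutoff would not guarantee.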

Another important notion is normalisation. Different samples have different total numbers of reads. Of course, when you transform the data into relative abundances you mitigate this, but that creates a statistical challenge, because some statistical methods cannot handle compositional data. Therefore, some apply rarefaction, i.e., randomly subsampling all samples to an even sequencing depth, in order to normalise the data while keeping absolute counts (though the data is still compositional in nature: https://academic.oup.com/bioinformatics/article/34/16/2870/4956011 ). Others, however, say that this is inadmissible (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531 ). There are a lot of papers about this issue if you just google it.
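Rarefaction as described above can be sketched as sampling reads without replacement down to a common depth. This is a hypothetical stdlib-only helper, not the implementation of any particular package; the sample counts are made up, and in practice samples with fewer reads than the target depth are usually dropped. `relative_abundance` is included only to contrast with the proportions route.

```python
import bisect
import itertools
import random
from typing import List, Optional


def rarefy(counts: List[int], depth: int, seed: Optional[int] = None) -> List[int]:
    """Subsample reads without replacement so the sample has exactly `depth` reads."""
    total = sum(counts)
    if depth > total:
        raise ValueError("sample has fewer reads than the target depth")
    rng = random.Random(seed)
    # Cumulative boundaries: read index r belongs to OTU i if cum[i-1] <= r < cum[i]
    cum = list(itertools.accumulate(counts))
    rarefied = [0] * len(counts)
    # Draw distinct read indices, then map each one back to its OTU
    for r in rng.sample(range(total), depth):
        rarefied[bisect.bisect_right(cum, r)] += 1
    return rarefied


def relative_abundance(counts: List[int]) -> List[float]:
    """Convert raw counts to proportions (the data becomes explicitly compositional)."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical raw counts for the same three OTUs in two samples of unequal depth
sample_a = [500, 300, 200]     # 1,000 reads
sample_b = [2400, 1500, 1100]  # 5,000 reads
depth = min(sum(sample_a), sum(sample_b))
print(rarefy(sample_b, depth, seed=42))
print(relative_abundance(sample_b))
```

Note that rarefying discards reads from the deeper sample and the result depends on the random seed, which is part of why the approach is debated.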

I hope this helps,

António