Hi everyone,
I am currently in the process of removing unwanted taxa (Kingdom="Eukaryota", Family="Mitochondria", and Order="Chloroplast") from a phyloseq object I created. This phyloseq object was created using my outputs from DADA2 (OTU table, taxonomy table, and metadata file).
I have saved a taxonomy table in CSV format at every step where I create a new phyloseq object (i.e. after removing (1) eukaryotes, (2) mitochondria, and (3) chloroplasts, all unwanted taxa, so 3 new taxonomy tables in total) to verify for myself that the correct number of ASVs was being removed. When I removed the eukaryotes (first step) and printed the new phyloseq object, the total number of taxa dropped from 2551 to 2506 (2551 - 2506 = 45 ASVs removed). However, when I use the "Ctrl + F" feature in Excel to count the cells with the value "Eukaryota" in the taxonomy table, only 40 cells/taxa are found. Likewise, when I create a phyloseq object that contains only the ASVs that are eukaryotes, only 40 are found.
Why are 45 ASVs removed when apparently only 40 should be? Am I missing something?
I'm very new to this platform, so I've attached my code and outputs (indicated via # sign) below, but do let me know if I can supply something else. I am using R (4.1.1 - Kick Things) and phyloseq (version 1.38.0).
original phyloseq object (ps):
ps <- phyloseq(otu_table(st, taxa_are_rows=FALSE), sample_data(samdf), tax_table(taxtab))
ps
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 2551 taxa and 95 samples ]
# sample_data() Sample Data: [ 95 samples by 5 sample variables ]
# tax_table() Taxonomy Table: [ 2551 taxa by 6 taxonomic ranks ]
# refseq() DNAStringSet: [ 2551 reference sequences ]
removing eukaryotes (ps.euk):
ps.euk <- subset_taxa(ps, Kingdom !="Eukaryota")
ps.euk
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 2506 taxa and 95 samples ]
# sample_data() Sample Data: [ 95 samples by 5 sample variables ]
# tax_table() Taxonomy Table: [ 2506 taxa by 6 taxonomic ranks ]
# refseq() DNAStringSet: [ 2506 reference sequences ]
keeping only eukaryotes (ps.euk2):
ps.euk2 <- subset_taxa(ps, Kingdom == "Eukaryota")
ps.euk2
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 40 taxa and 95 samples ]
# sample_data() Sample Data: [ 95 samples by 5 sample variables ]
# tax_table() Taxonomy Table: [ 40 taxa by 6 taxonomic ranks ]
# refseq() DNAStringSet: [ 40 reference sequences ]
image from Excel showing that only 40 cells were found:
Any help would be so appreciated; thanks
-H
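It seems there are 5 ASVs that are not assigned at the Kingdom rank, so their Kingdom value is NA. In R, NA != "Eukaryota" evaluates to NA rather than TRUE, and subset_taxa drops any taxon for which the condition is NA, so those 5 are removed along with the 40 eukaryotes (40 + 5 = 45).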
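To see why unassigned ASVs disappear, here is a minimal base R sketch. It uses a made-up toy table (the `tax` data frame and its counts are hypothetical, not your data) and plain `subset()`, which treats NA the same way when filtering rows, rather than phyloseq itself.

```r
# Toy taxonomy table (hypothetical data): 3 eukaryotes, 4 bacteria, 2 unassigned
tax <- data.frame(
  Kingdom = c(rep("Eukaryota", 3), rep("Bacteria", 4), NA, NA),
  stringsAsFactors = FALSE
)

# Count every Kingdom value, including the NAs that Ctrl+F for "Eukaryota" misses
print(table(tax$Kingdom, useNA = "ifany"))

# A comparison with NA yields NA, not TRUE or FALSE
print(NA != "Eukaryota")

# subset() drops rows where the condition is NA, so both filters lose the NAs
n_total   <- nrow(tax)
n_not_euk <- nrow(subset(tax, Kingdom != "Eukaryota"))
n_euk     <- nrow(subset(tax, Kingdom == "Eukaryota"))
n_na      <- sum(is.na(tax$Kingdom))

# Rows removed by the "!=" filter = eukaryotes + unassigned
print(n_total - n_not_euk == n_euk + n_na)

# Keeping the unassigned rows requires handling NA explicitly
n_keep_na <- nrow(subset(tax, is.na(Kingdom) | Kingdom != "Eukaryota"))
```

On your own object, something like `sum(is.na(as(tax_table(ps), "matrix")[, "Kingdom"]))` should return 5 if this explanation is right. If you want to keep the unassigned ASVs, `subset_taxa(ps, is.na(Kingdom) | Kingdom != "Eukaryota")` is one way to do it; whether they should be kept is a separate decision for your analysis.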