I have an abundance matrix for hundreds of families (bacterial, viral, etc.) across several samples, and I was wondering whether I could still apply DESeq2's negative binomial analysis to it. The way my data was collected fits all the criteria for a negative binomial analysis. DESeq2 essentially identifies significantly differentially expressed genes, so couldn't I run it the same way to identify significantly differentially abundant taxa?
My only problem is that DESeq2 requires raw counts. That makes sense for RNA data, where you should use raw counts to get valid results. However, abundance data tends to contain a lot of zeros (many absences), and this is causing DESeq2 to fail: it treats every gene as having a zero, when in fact every sample simply has some taxa absent, which leaves a lot of zeros in my data frame.
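As far as I understand it, the failure comes from DESeq2's default size-factor step (median of ratios) rather than from the negative binomial model itself: each taxon's counts are compared to that taxon's geometric mean across samples, and a single zero makes the geometric mean zero, so when every row contains a zero, no row is usable. DESeq2 also offers `estimateSizeFactors(dds, type = "poscounts")`, which builds the geometric mean from the positive counts only. Below is my own simplified Python sketch of both ideas, for illustration only; it is not DESeq2's code, and the function names and toy counts are hypothetical.

```python
import math
from statistics import median


def log_geo_means(counts):
    """Default-style log geometric mean per taxon (row).

    A single zero makes the geometric mean 0 (log = -inf), so that row is unusable.
    """
    out = []
    for row in counts:
        if any(c == 0 for c in row):
            out.append(-math.inf)
        else:
            out.append(sum(math.log(c) for c in row) / len(row))
    return out


def log_geo_means_pos(counts):
    """'poscounts'-style log geometric mean: sum the logs of the positive counts
    only, but still divide by the total number of samples."""
    out = []
    for row in counts:
        pos = [c for c in row if c > 0]
        out.append(sum(math.log(c) for c in pos) / len(row) if pos else -math.inf)
    return out


def size_factors(counts, log_gm, normalize=False):
    """Median-of-ratios size factors, one per sample (column)."""
    usable = [i for i, g in enumerate(log_gm) if math.isfinite(g)]
    if not usable:
        raise ValueError("every taxon has at least one zero; no usable geometric means")
    sfs = []
    for j in range(len(counts[0])):
        # log(count / geometric mean) for usable taxa observed in this sample
        log_ratios = [math.log(counts[i][j]) - log_gm[i]
                      for i in usable if counts[i][j] > 0]
        sfs.append(math.exp(median(log_ratios)))
    if normalize:
        # Rescale so the size factors have a geometric mean of 1
        center = sum(math.log(s) for s in sfs) / len(sfs)
        sfs = [s / math.exp(center) for s in sfs]
    return sfs


if __name__ == "__main__":
    # Hypothetical toy matrix: rows = taxa, columns = samples; every row has a zero.
    counts = [
        [0, 5, 10],
        [3, 0, 6],
        [8, 4, 0],
        [1, 2, 0],
    ]
    try:
        size_factors(counts, log_geo_means(counts))
    except ValueError as err:
        print("default-style:", err)
    print("poscounts-style:", size_factors(counts, log_geo_means_pos(counts), normalize=True))
```

On this toy matrix the default-style calculation has no usable taxa and raises, while the positive-counts variant still returns one size factor per sample.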
When an RNA count matrix contains many zeros, a pseudocount of 1 is sometimes added to alleviate this, but in my case adding 1 would skew the data because many of my taxa may be present only once. Alternatively, I could transform the whole matrix so that it contains no zeros (with 1s mapped to something else, and so on). Is that okay? Has anyone attempted something like this before? Or should DESeq2 analysis stick to RNA data only?
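To make my pseudocount concern concrete, here is a small Python sketch (the function name is hypothetical) comparing log2 fold changes for true doublings with and without a pseudocount of 1, plus the absent-to-present-once case:

```python
import math


def log2_fc(a, b, pseudo=0.0):
    """log2 fold change of count b over count a after adding `pseudo` to both."""
    return math.log2((b + pseudo) / (a + pseudo))


# Each pair is a true doubling (log2 FC = 1).
doublings = [(1, 2), (10, 20), (100, 200)]
raw = [log2_fc(a, b) for a, b in doublings]
shifted = [log2_fc(a, b, pseudo=1.0) for a, b in doublings]

# Absent -> present once has no finite raw fold change; with +1 it becomes 2/1.
absent_to_single = log2_fc(0, 1, pseudo=1.0)

for (a, b), r, s in zip(doublings, raw, shifted):
    print(f"{a:>3} -> {b:>3}: raw {r:.3f}, +1 pseudocount {s:.3f}")
print(f"  0 ->   1: +1 pseudocount {absent_to_single:.3f}")
```

With the pseudocount, a 1 to 2 doubling shrinks to about 0.58, while 0 to 1 (absent, then seen once) scores exactly 1, the same as a genuine doubling at high counts. That is the kind of distortion I am worried about for taxa that are mostly seen once.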