Hi,
I wonder if anyone can help. I am trying to analyse my NGS tag sequencing data. We have a list of aligned genes with read counts, read counts per million (CPM) and the fold changes in gene expression between two different cell types (let's call them X and Y). We are mainly interested in cell type X: we want to know which genes are enriched in X compared with Y.
However, the data we get back from the GeneProf programme contain a considerable number of zero tag counts for aligned genes, in both the X and Y cell types. GeneProf can't compute a fold change when either tag count is zero, so a large number of my fold changes are blank. Is there a standard way of dealing with this? If I set all the zeros to a constant of 1, will this skew some of my fold changes? Alternatively, I thought I could set the zeros to a really small number such as 0.0000000001, but then the fold changes I get will be huge, so I'm not sure that is any good either.
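A minimal Python sketch of the pseudocount approach, to show how the choice of constant behaves. It assumes a pseudocount `c` is added to both values before taking log2((X + c) / (Y + c)); the gene names and CPM values are hypothetical, and `log2_fold_change` is an illustrative helper, not a GeneProf function.

```python
import math

def log2_fold_change(count_x, count_y, pseudocount=1.0):
    """log2((X + c) / (Y + c)) for one gene, with pseudocount c added to both sides."""
    return math.log2((count_x + pseudocount) / (count_y + pseudocount))

# Hypothetical CPM values (X, Y) for three genes; not real data.
genes = {"geneA": (50.0, 0.0), "geneB": (0.0, 3.0), "geneC": (200.0, 100.0)}

# Compare a pseudocount of 1 with a tiny constant.
for c in (1.0, 1e-10):
    for name, (x, y) in genes.items():
        print(f"c={c:g} {name}: log2 FC = {log2_fold_change(x, y, c):.2f}")
```

With c = 1, geneA (50 vs 0) gets a log2 fold change of about 5.7 and geneC (200 vs 100) stays close to 1. With c = 1e-10, geneA jumps to about 38.9, a value set almost entirely by the arbitrary constant. A pseudocount of 1 does shift fold changes, but mostly for low-count genes, pulling them towards zero, which is usually the less misleading direction.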
Can anyone help?
Thank you for your reply, Istvan.
I was under the impression that statistics cannot be done if you only have one replicate for each cell type or condition. Is that right? I was hoping to take the fold changes and draw kernel density plots of their distribution, to try to identify a fold change that is "significant" compared with the rest, as I understand that DESeq and Cuffdiff are meaningless without technical or biological replicates.
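If you do go down the density-plot route, here is a rough Python sketch using only the standard library: a Gaussian kernel density estimate with the normal-reference rule-of-thumb bandwidth, plus an empirical upper percentile as a cutoff. The helper names and the log2 fold change values are hypothetical. A tail cutoff chosen this way only ranks genes; it is a descriptive threshold, not a test of statistical significance.

```python
import math
import statistics

def gaussian_kde(values, bandwidth=None):
    """Return a function that estimates the density of `values` with a Gaussian kernel."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb: 1.06 * sample sd * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observed value
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

def tail_cutoff(values, percentile=95):
    """Empirical upper percentile of `values` (a descriptive cutoff, not a p-value)."""
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[percentile - 1]

# Hypothetical log2 fold changes (X vs Y), one per gene
log2_fc = [-0.4, 0.1, 0.3, -0.2, 0.0, 0.5, 1.8, -0.1, 2.9, 0.2]

density = gaussian_kde(log2_fc)
for x in (-1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"density at {x:+.1f}: {density(x):.3f}")
print("95th percentile cutoff:", round(tail_cutoff(log2_fc), 3))
```

With only one replicate per cell type, the width of this distribution mixes real differences with sampling noise, so genes in the tail are candidates to follow up rather than confirmed hits.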
However, if you have any other strategy for this analysis then I'm very open to suggestions!
OK, that is a different issue altogether. The first question is what to do with zero counts; the second is what to do if you don't have replicates. The second is unrelated to the first, and it actually calls the data into question even more. Without replicates the fold change is even less reliable: the errors on the two counts add up quadratically (their variances add), and you have no way to mitigate that.
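To put a number on "errors add up quadratically", here is a small Python sketch using the delta method. It assumes the only noise is Poisson sampling noise on the raw (not CPM) counts, in which case Var(ln N) ≈ 1/N; because the two counts are independent their variances add, giving SE(ln(x/y)) ≈ sqrt(1/x + 1/y). Real biological variability comes on top of this and cannot be estimated from a single replicate, so treat the result as a lower bound on the uncertainty. The function name is illustrative.

```python
import math

def log2_fc_standard_error(count_x, count_y):
    """Approximate SE of log2(count_x / count_y) from Poisson sampling noise only.

    Delta method: Var(ln N) ~= 1/N for a Poisson count N, and the variances of
    the two independent counts add, so SE(ln ratio) ~= sqrt(1/x + 1/y).
    Dividing by ln 2 converts from natural-log units to log2 units.
    """
    if count_x <= 0 or count_y <= 0:
        raise ValueError("the approximation needs non-zero raw counts")
    return math.sqrt(1.0 / count_x + 1.0 / count_y) / math.log(2)

# Same 4-fold change at three different sequencing depths (hypothetical counts)
for x, y in [(4, 1), (40, 10), (4000, 1000)]:
    fc = math.log2(x / y)
    se = log2_fc_standard_error(x, y)
    print(f"counts {x}/{y}: log2 FC = {fc:.2f} +/- {se:.2f}")
```

All three pairs have the same fold change (log2 FC = 2), but the approximate standard error falls from about 1.6 log2 units at 4 vs 1 reads to about 0.05 at 4000 vs 1000 reads. Low-count genes, which include all the ones you would rescue with a pseudocount, have the least trustworthy fold changes.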
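The reason so many tools don't work without replicates is that replicates are the only way to estimate the variability between samples; without them, one cannot infer anything useful about differences between conditions.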
The one potential way to use data with no replicates is to validate hypotheses derived by other means. So instead of this data driving hypothesis discovery, it becomes the means of validating a hypothesis derived in a different way.
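As an illustration of that validation role, a minimal Python sketch: take a list of candidate genes that some independent source (for example earlier work or a different assay) predicts should be enriched in X, and check how many of them show a log2 fold change above a chosen threshold in this data. The gene names, values, threshold and helper name are all hypothetical, and the result is a consistency check rather than a significance test.

```python
def direction_agreement(log2_fc, expected_up_in_x, threshold=1.0):
    """Check pre-specified candidate genes against observed log2 fold changes (X vs Y).

    Returns the fraction of candidates with a fold change above `threshold`,
    the genes that agree, and the candidates with no fold change at all
    (for example, blank values caused by zero counts).
    """
    found = [g for g in expected_up_in_x if g in log2_fc]
    missing = [g for g in expected_up_in_x if g not in log2_fc]
    agreeing = [g for g in found if log2_fc[g] > threshold]
    fraction = len(agreeing) / len(found) if found else float("nan")
    return fraction, agreeing, missing

# Hypothetical observed values and an independently derived candidate list
observed = {"geneA": 2.3, "geneB": 0.4, "geneC": 1.7, "geneD": -0.8}
candidates = ["geneA", "geneC", "geneD", "geneE"]

fraction, agreeing, missing = direction_agreement(observed, candidates)
print(f"{fraction:.0%} of testable candidates up in X: {agreeing}; no data: {missing}")
```

The key design point is that the candidate list is fixed before looking at this data, so the unreplicated fold changes are only used to confirm or contradict a prediction, not to generate one.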