I am working with RNA-seq data from KDR-GFP wild-type zebrafish embryos at 50 hours post-fertilization. The three samples are replicates of the same embryo, named KDR1, KDR2, and KDR3. When I ran differential gene expression analysis with the DESeq2 pipeline, the PCA plot showed the first axis explaining 100% of the variance, which produces a somewhat bizarre plot. My question is: if the samples are exact replicates, can the first axis normally explain all of the variance like this? For more context, I also analyzed three mutant zebrafish replicates named mir991, mir992, and mir993 (mir- indicates microRNA mutants; 1, 2, 3 indicate the replicates). In their PCA plot, the first axis explains 98% of the variance and the second axis 2%.
Would you mind adding some plots and code? It is hard to follow with only text.
98% or 100% variance explained by the first principal component is extremely uncommon for RNA-seq data. Could there be something wrong with the data? For instance, a lot of genes with 0 counts, or counts that are not integers.
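To make those checks concrete, here is a rough sketch in Python with NumPy rather than DESeq2 itself. The helper names `count_matrix_checks` and `pca_variance_explained` and the `toy` matrix are made up for illustration, and `log2(count + 1)` is only a simple stand-in for a proper variance-stabilizing transformation. The toy data is built so that two of the three samples are identical, which is one situation where the first component would be expected to take essentially all of the variance.

```python
import numpy as np


def count_matrix_checks(counts):
    """Basic sanity diagnostics for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    return {
        # Fraction of genes with zero counts in every sample
        "frac_all_zero_genes": float(np.mean(np.all(counts == 0, axis=1))),
        # True if any value is not a whole number (e.g. TPM/FPKM passed as counts)
        "has_non_integer": bool(np.any(counts != np.round(counts))),
        "has_negative": bool(np.any(counts < 0)),
    }


def pca_variance_explained(counts, n_top=500):
    """Fraction of variance per principal component (samples as observations).

    Assumption: log2(count + 1) replaces a real variance-stabilizing
    transform, and only the n_top most variable genes are kept.
    """
    logged = np.log2(np.asarray(counts, dtype=float) + 1.0)
    gene_var = logged.var(axis=1, ddof=1)
    top = np.argsort(gene_var)[::-1][:n_top]
    x = logged[top].T                 # samples x genes
    x = x - x.mean(axis=0)            # center each gene
    singular_values = np.linalg.svd(x, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0:
        return np.zeros_like(variances)
    return variances / total


# Hypothetical toy matrix: 5 genes x 3 samples, samples 1 and 2 identical
toy = np.array([
    [10, 10, 50],
    [0, 0, 0],
    [5, 5, 1],
    [100, 100, 20],
    [7, 7, 7],
])

checks = count_matrix_checks(toy)
ratios = pca_variance_explained(toy)
print(checks)
print(ratios)
```

With only three samples there are at most two components with non-zero variance after centering, so the split between the first and second axis is the thing to look at. If the first ratio comes out at essentially 1.0 on real data, it may be worth checking whether two of the samples are near-duplicates or whether one sample is an outlier.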