Hello, I am trying to understand Figure 2A in this study: mbio.asm.org/content/6/4/e00749-15.abstract
I am trying to understand what the principal components represent in relation to the large amount of RNA-seq data. Each gene's expression varies relative to the normalized PAO1 strain. The data points comprise a large set: 150 transcriptomes of clinical isolates and 50 of the lab strain under different environmental conditions.
So how is an entire transcription profile reduced to a single point, and what do the 3 principal components represent?
Any help would be greatly appreciated. Thanks!!!!
@Persistent Labs gives a good explanation of the PCA procedure, but doesn't quite describe what these PCs represent...
Each transcriptome is plotted as a single point, but that point is not a single number: it has three coordinates (its values on PC1, PC2, and PC3). In fact, a full description would need up to (# of strains) - 1 coordinates per transcriptome. However, we cannot plot in a roughly 200-dimensional space (see @Persistent Labs' answer below), so we use the three coordinates that capture the most variation among the 202 strains.
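As a rough illustration of how each transcriptome ends up with a small set of PC values, here is a minimal sketch using NumPy. The data are random numbers, and the matrix size (200 samples by 5000 genes) and variable names are assumptions made for the example, not values or methods taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 transcriptomes (rows) x 5000 genes (columns),
# standing in for normalized expression values. Not real data.
n_samples, n_genes = 200, 5000
expression = rng.normal(size=(n_samples, n_genes))

# Center each gene so PCA describes variation around the mean profile.
centered = expression - expression.mean(axis=0)

# PCA via singular value decomposition of the centered matrix.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Scores: the coordinates of every transcriptome on every PC.
scores = U * S

# Keep the first three columns: one (PC1, PC2, PC3) triple per transcriptome.
coords_3d = scores[:, :3]

# Fraction of the total variance carried by each PC.
explained = S**2 / np.sum(S**2)

print(coords_3d.shape)  # one row per transcriptome, three coordinates each
print(explained[:3])    # share of variance captured by PC1, PC2, PC3
```

After centering, the last singular value is numerically zero, which is why at most (number of samples) - 1 PCs carry any information. With purely random input like this the first three PCs capture very little variance; in real expression data, co-regulated genes make them capture much more.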
Strictly speaking, the first 3 PCs represent "hidden/pseudo" variables that account for the most variation between the strains. In a biological sense, that statement alone is not very useful... To get an idea of what biological phenomenon each PC reflects, we'd have to look at which genes contribute most strongly to it. Each PC likely represents one or more classes of genes being co-expressed/co-regulated. Ideally, this would be visible as a separation of the strains by origin (i.e. clinical, environmental condition, etc...)
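One way to check which genes drive a PC is to inspect its loadings (the weight each gene receives in that PC). The sketch below uses made-up data in which a hypothetical module of 20 genes is switched on in half of the samples; the gene names, group sizes, and effect size are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 60 samples x 500 genes of background noise.
n_samples, n_genes = 60, 500
gene_names = np.array([f"gene_{i}" for i in range(n_genes)])
expression = rng.normal(scale=0.5, size=(n_samples, n_genes))

# Assumed co-regulated module: genes 0-19 are induced in samples 30-59
# (imagine these samples share an origin, e.g. one group of isolates).
expression[30:, :20] += 3.0

# PCA via SVD on the gene-centered matrix.
centered = expression - expression.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Loadings of PC1: one weight per gene.
pc1_loadings = Vt[0]

# Genes with the largest absolute weight dominate PC1.
top_idx = np.argsort(np.abs(pc1_loadings))[::-1][:20]
top_genes = gene_names[top_idx]

# PC1 scores: where each sample falls along this axis.
pc1_scores = U[:, 0] * S[0]

print(sorted(top_genes))
print(pc1_scores[:30].mean(), pc1_scores[30:].mean())
```

In this toy case the top-loading genes on PC1 are the induced module, and the two sample groups fall on opposite sides of PC1. With real data the same inspection would be followed by asking what the top-loading genes have in common (shared regulator, pathway, and so on).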
I'm just looking for a ballpark explanation. I was told it might represent protein clusters....??