I have seen some old posts regarding my question but have found that many have expired links. I am performing a biological experiment and need to know what is the reason behind my clustering of PC1 and PC2. I am fairly new to R, DESeq2, and gene expression analysis. This is a snippet of my code:
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ condition)
dds
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds <- DESeq(dds)
vsd <- vst(dds, blind=FALSE)
plotPCA(vsd, intgroup = "condition", ntop = 500, returnData = FALSE)
How can I have R export the top 500 genes in PC1 and PC2? I tried using this but It is just my gene list vsd normalized.
> pca<- assay(vsd, intgroup = "condition", ntop = 500, returnData = FALSE)
> write.csv(as.data.frame(pca), file="PCA_results.csv")
I have seen other posts saying you can make your own function but I'm confused about how I would even change the code to call for my PC1 and PC2, as well as, how would my data be output? Example Post
Thank you, this worked perfectly. I now have a gene list with two columns PC1 and PC2. So both PC1 and PC2 are composed of the same gene list. How do I know which one leans more towards one of the other? For example, gene 20 shows a -0.03 value for PC1 and 8.24 for PC2. Does that mean gene 20 weighs more towards PC2 than PC1?
Generally speaking, yes. The higher the loading value for a gene in a dimension the more that gene contributed to the separation of the samples (aka variance explained). It's also important to look at the relative rank of the gene among all genes in that dimension, and to follow up with further analysis if any of the genes become of interest.
I appreciate your help tremendously!