I am working with six RNA libraries from zebrafish embryos: three wild-type (KDR1, KDR2, KDR3) and three miRNA mutants (let71, let72, let7-3). I am using the DESeq2 pipeline for differential gene expression analysis. When I plot the PCA results, I get only five data points instead of six. The comparison is mutant vs. wild type. The code I used is:
library(DESeq2)
library(pheatmap)
# Set working Directory
# setwd(dir)
# specify the name input directory that contains filtered count files
directory <- "E:\\KDR-mir24 16.02.2021"
# specify which files to read in using list.files
# select those files which contain the string 'filtered.txt' using grep
samplefiles <- grep('filtered.txt', list.files(directory), value=TRUE)
# To keep files in the same order
# (note: samplefiles1 is not used below; the sample table is built from samplefiles)
samplefiles1 <- sort(samplefiles, decreasing = TRUE)
# print the value of the variable to check that it is correct
samplefiles
# View() function
View(samplefiles)
# use sub to remove 'filtered.txt' from the filenames to obtain the sample names
samplenames <- sub('filtered.txt','',samplefiles)
samplenames
# input the sample conditions manually
sampleconditions <- c('wt','wt','wt','mut','mut','mut')
sampleconditions
# create the sampleTable using the data.frame function
# data frame has three variables (columns): samplename, filename, and condition
sampletable <- data.frame(samplename = samplenames,
                          filename = samplefiles,
                          condition = sampleconditions)
View(sampletable)
# build the DESeqDataSet using the following function
dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampletable,
                                  directory = directory,
                                  design = ~ condition)
dds
# set the factor levels using the relevel function, and specifying the reference level
dds$condition <- relevel(dds$condition, ref = 'wt')
# standard differential expression analysis steps are wrapped into a single function, DESeq
dds <- DESeq(dds, betaPrior = TRUE)
dds
res <- results(dds)
res
plotMA(res, ylim=c(-2, 2))
rld <- rlog(dds, blind=FALSE)
plotPCA(rld, intgroup=c("condition"))
With this option I got the following results.
However, there are still only five data points. Do you have any insight into that?
This comment belongs under @ATpoint's comment in the answer below. Please don't use SUBMIT ANSWER unless you are adding a new answer to the original question.
As I suspected, two samples have identical coordinates. It is now up to you to find out why. Check the sample sheet; maybe there is a name error somewhere, such as accidentally labelling the same sample with two different names.
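One way to check for this is to compare the count columns directly. The snippet below is only a minimal sketch: `find_identical_samples` is a hypothetical helper (it is not part of DESeq2), and the toy matrix uses made-up numbers. It reports every pair of samples whose values are exactly identical, which is what would make two points land on top of each other in the PCA plot.

```r
# Hypothetical helper (not part of DESeq2): list every pair of columns
# in a numeric matrix whose values are exactly identical.
find_identical_samples <- function(mat) {
  n <- ncol(mat)
  pairs <- character(0)
  if (n < 2) return(pairs)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # drop row names so only the values are compared
      if (identical(unname(mat[, i]), unname(mat[, j]))) {
        pairs <- c(pairs,
                   paste(colnames(mat)[i], colnames(mat)[j], sep = " == "))
      }
    }
  }
  pairs
}

# Toy count matrix with made-up numbers: s3 is a copy of s2
toy <- cbind(s1 = c(10, 0, 5), s2 = c(3, 7, 1), s3 = c(3, 7, 1))
dup_pairs <- find_identical_samples(toy)
print(dup_pairs)

# With the objects from the question, the same check would be:
# find_identical_samples(counts(dds))
# The PCA coordinates per sample can also be inspected as a table:
# plotPCA(rld, intgroup = "condition", returnData = TRUE)
```

If a pair shows up for `counts(dds)`, the two file names in the sample table most likely point to the same underlying data, so the count files themselves are the next thing to compare.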