PCA plotting using DESEq2 pipeline
1
0
Entering edit mode
3.8 years ago
aranyak111 • 0

I am working with 6 RNA libraries known as KDR1, KDR2, KDR3 wild-type zebrafish embryos, and three mutant embryos let71,let72,let7-3 mi -RNA mutant zebrafish embryos. I am using the DEseq2 pipeline for differential gene expressions. When I am trying to plot the PCA results I am getting only 5 data points. Conditions used are mutant v wild type. The code snippet used is

library(DESeq2)

library(pheatmap)

# library(pheatmap)

# Set working Directory
# setwd(dir)

# specify the name input directory that contains filtered count files

directory <-  "E:\\KDR-mir24 16.02.2021"

# specify which files to read in using a list. files 
# select those files which contain the string 'filtered.txt' using grep

samplefiles <- grep('filtered.txt', list.files(directory), value=TRUE)
# To keep files in the same order
samplefiles1 <-sort(samplefiles,decreasing = TRUE)
# print the value of the variable to check that it is correct

sample files

# View() function

View(samplefiles)

# use sub to remove 'filtered.txt' from the filenames to obtain the sample names

samplenames <- sub('filtered.txt','',samplefiles)
samplenames

# input the sample conditions manually

sampleconditions <- c('wt','wt','wt','mut','mut','mut')
sampleconditions

# create the sampleTable using the data.frame function
# data frame has three variables (columns): samplename, filename, and condition


sampletable <- data.frame(samplename = samplenames,
                          filename = samplefiles,
                          condition = sampleconditions)
View(sampletable)

# build the DESeqDataSet using the following function

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampletable,
                                  directory = directory,
                                  design = ~ condition)
dds

# set the factor levels using the relevel function, and specifying the reference level

dds$condition <- relevel(dds$condition, ref = 'wt')

# standard differential expression analysis steps are wrapped into a single function, DESeq

dds <- DESeq(dds, betaPrior = TRUE)

dds

res <- results(dds)

res

plotMA(res, ylim=c(-2, 2))

rld <- rlog(dds, blind=FALSE)

plotPCA(rld, intgroup=c("condition")).

enter image description here

RNA-Seq • 4.4k views
ADD COMMENT
0
Entering edit mode

Using this option I found the following results.

 PC1         PC2 group condition   name
KDR1   -6.698822  2.27079850    wt        wt   KDR1
KDR2   -6.951940 -1.05877286    wt        wt   KDR2
KDR3   -6.951940 -1.05877286    wt        wt   KDR3
let7-1  7.320111  1.14917993   mut       mut let7-1
let7-2  6.131613 -1.36572122   mut       mut let7-2
let7-3  7.150977  0.06328851   mut       mut let7-3.

However, the data points are still five. Do you have any insight on that?

ADD REPLY
0
Entering edit mode

This comment belongs under @ATpoint's comment in answer below.

Please don't use SUBMIT ANSWER unless you are adding a new answer for the original question.

ADD REPLY
0
Entering edit mode
KDR2   -6.951940 -1.05877286    wt        wt   KDR2
KDR3   -6.951940 -1.05877286    wt        wt   KDR3

As I was suspecting, two times identical coordinates. It is now on you to find out why. Check the sample sheet, maybe a name error somewhere, accidentally labelling the same sample with two different names.

ADD REPLY
1
Entering edit mode
3.8 years ago
ATpoint 86k

That gets reported once in a while, usually two points are exactly overlapping and you should check whether you did not accidentally processed the same sample as two indepedent ones. Set returnData=TRUE, then you easily see whether 6 samples are in the PCA and they simply have the same coordinates.

Tip when posting questions: Try to reduce the code to the bare minimum that is relevant for the question. Most of the code you show is not relevant here for this particular issue. People might get a tl;dr feeling.

ADD COMMENT
0
Entering edit mode

also, beware that plotPCA only uses the top 500 most variable genes by default

ADD REPLY
0
Entering edit mode

where should I incorporate the returnData=TRUE in the code block? I have provided the whole code block to give the complete perspective.

ADD REPLY
0
Entering edit mode

plotPCA(rld, intgroup=c("condition"), returnData=TRUE) is what you need.

ADD REPLY

Login before adding your answer.

Traffic: 1002 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6