Question

Error massage when instantiating DEseq Dataset

0

Entering edit mode

4.1 years ago

Jakpa ▴ 50

Hello Everyone, I am trying to perform differential expression. I started by importing the featurecounts data down to removing unwanted columns and performing the Matrix. After running the code for the DESeqDataSet, I got an error massage: "Warning message: In DESeqDataSet(se, design = design, ignoreRank) : all genes have equal values for all samples. will not be able to perform differential analysis"

I a kind of ignored it because I could not really understood the meaning. I went ahead to do sizefactors and i got a value of 1 for all the samples. Below is a view of what i did:

library(DESeq2)
  library(Biobase)
#load the data file from featureCounts.txt
countData<- read.table("/home/mlsi/RNASeq/countTable/featureCounts.txt", head=T,row.names = 1)

#delete column 1-5
deleteColumnCountdata<- countData[-c(1,2,3,4:5)]

colnames(deleteColumnCountdata)
# romove .bam or .sam from filename
colnames(deleteColumnCountdata) <- gsub ("\\X.home.mlsi.RNASeq.mapping.","",colnames(deleteColumnCountdata))  
colnames(deleteColumnCountdata) <- gsub ("\\.UHR_[123].bam","",colnames(deleteColumnCountdata))
colnames(deleteColumnCountdata) <- gsub ("\\.HBR_[123].bam","",colnames(deleteColumnCountdata))
colnames(deleteColumnCountdata)
class(deleteColumnCountdata)
head(deleteColumnCountdata)

# convert 'deleColumnCountdata' to matrix
newCountsData<-as.matrix(deleteColumnCountdata)
head(newCountsData)

group<- factor(c(rep("UHR",3), rep("HBR",3)))
con<- factor(c(rep("cancer",3), rep("normal",3)))

# contruct a data frame
countDataDataFrame<- data.frame(row.names = colnames(newCountsData), group , con)
head(countDataDataFrame)
#instantiate the DESeq dataset
dds<- DESeqDataSetFromMatrix(countData =newCountsData, colData =countDataDataFrame, design = ~ con)

"Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  all genes have equal values for all samples. will not be able to perform differential analysis"

dds<-estimateSizeFactors(dds)
sizeFactors(dds)
HBR_! HBr_2 HBR_3 UHR_1 UHR_2 UHR_3    all the samples have 1

is this correct? because I read that sizefactors are almost 1

then, I went ahead to do Pre-filtering: dds<- dds [rowSums(counts(dds)) > 1, ]

I tried rlogTransformation because I want to have a heatmap/clustering of the data:

rld<- rlogTransformation(dds)

I got this erroe:

Error in estimateDispersionsFit(object, fitType, quiet = TRUE) :
  all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT
I also tried rlog, log but same error. I tried to follow the suggestion outlined in the error massage, but I dont think I will achieve my goal at the end.
I even tried DESeq(dds);  i got this error: estimating size factors
estimating dispersions
Error in .local(object, ...) :
  all genes have equal values for all samples. will not be able to perform differential analysis

Where am I getting wrong?

I will appreciate solutions and suggestions.

regards, Anthony

differential Expression sizefactors DESeq R • 2.1k views

ADD COMMENT • link updated 4.1 years ago by ATpoint 87k • written 4.1 years ago by Jakpa ▴ 50

0

Entering edit mode

What is the output of head(countData, 10)?

all genes have equal values for all samples

This error suggests that all values (counts) are the same all across the count matrix, this needs to be investigated before going any further.

There is a markdown code option (the 10101 button), please use it from now on to highlight code.

ADD REPLY • link 4.1 years ago by ATpoint 87k

0

Entering edit mode

enter code here

>head(countData) Chr U2 22 FRG1FP 22;22;22;22;22;22;22;22;22 CU104787.1 22;22 BAGE5 22;22 ACTR3BP6 22;22 5_8S_rRNA 22 Start U2 10736171 FRG1FP 10939388;10940597;10941691;10944967;10947304;10949212;10950049;10959067;10961283 CU104787.1 11065974;11067335 BAGE5 11066501;11067985 ACTR3BP6 11124337;11124508 5_8S_rRNA 11249809 End U2 10736283 FRG1FP 10939423;10940707;10941780;10945053;10947418;10949269;10950174;10959136;10961338 CU104787.1 11066015;11067346 BAGE5 11066515;11068089 ACTR3BP6 11124379;11125705 5_8S_rRNA 11249959 Strand Length X.home.mlsi.RNASeq.mapping.HBR_1.HBR_1.bam U2 - 113 0 FRG1FP -;-;-;-;-;-;-;-;- 749 0 CU104787.1 -;- 54 0 BAGE5 +;+ 120 0 ACTR3BP6 +;+ 1241 0 5_8S_rRNA - 151 0 X.home.mlsi.RNASeq.mapping.HBR_2.HBR_2.bam X.home.mlsi.RNASeq.mapping.HBR_3.HBR_3.bam U2 0 0 FRG1FP 0 0 CU104787.1 0 0 BAGE5 0 0 ACTR3BP6 0 0 5_8S_rRNA 0 0 X.home.mlsi.RNASeq.mapping.UHR_1.UHR_1.bam X.home.mlsi.RNASeq.mapping.UHR_2.UHR_2.bam U2 0 0 FRG1FP 0 0 CU104787.1 0 0 BAGE5 0 0 ACTR3BP6 0 0 5_8S_rRNA 0 0 X.home.mlsi.RNASeq.mapping.UHR_3.UHR_3.bam U2 0 FRG1FP 0 CU104787.1 0 BAGE5 0 ACTR3BP6 0 5_8S_rRNA 0enter code here

ADD REPLY • link 4.1 years ago by Jakpa ▴ 50

0

Entering edit mode

@ ATpoint, Thanks for the response. Thats the output of head(countData)

ADD REPLY • link 4.1 years ago by Jakpa ▴ 50

0

Entering edit mode

Looks like featureCounts output. Now you have to find out whether all entries are indeed zero and if so why. Maybe the GTF you used is different from the reference genome you aligned against in terms of chromosome names?

ADD REPLY • link 4.1 years ago by ATpoint 87k

0

Entering edit mode

@ ATpoint, i ran the below command on ubuntu to see the last 10 reads and as you can see all the entries are not zero

enter code here
enter code here

ase) mlsi@ubuntu:~/RNASeq/countTable$ tail featureCounts.txt ERCC-00157 ERCC-00157 1 1019 + 1019 9 9 9 9 9 9 ERCC-00158 ERCC-00158 1 1027 + 1027 0 0 0 0 0 0 ERCC-00160 ERCC-00160 1 743 + 743 7 7 7 7 7 7 ERCC-00162 ERCC-00162 1 523 + 523 26 26 26 26 26 26 ERCC-00163 ERCC-00163 1 543 + 543 4 4 4 4 4 4 ERCC-00164 ERCC-00164 1 1022 + 1022 0 0 0 0 0 0 ERCC-00165 ERCC-00165 1 872 + 872 32 32 32 32 32 32 ERCC-00168 ERCC-00168 1 1024 + 1024 2 2 2 2 2 2 ERCC-00170 ERCC-00170 1 1023 + 1023 1 1 1 1 1 1 ERCC-00171 ERCC-00171 1 505 + 505 864 864 864 864 864 864

ADD REPLY • link 4.1 years ago by Jakpa ▴ 50

0

Entering edit mode

Also, after deleting the unwanted columns and converting to matrix, this the output:

> tail(newCountsData) HBR_1 HBR_2 HBR_3 UHR_1 UHR_2 UHR_3 ERCC-00163 4 4 4 4 4 4 ERCC-00164 0 0 0 0 0 0 ERCC-00165 32 32 32 32 32 32 ERCC-00168 2 2 2 2 2 2 ERCC-00170 1 1 1 1 1 1 ERCC-00171 864 864 864 864 864 864

ADD REPLY • link 4.1 years ago by Jakpa ▴ 50

0

Entering edit mode

If you want to check if there are any genes that don't have a count of 0 across all samples.

library("tidyverse")

countData %>%
  rownames_to_column("gene") %>%
  rowwise %>%
  filter(any(c_across(ends_with(".bam")) != 0))

ADD REPLY • link 4.1 years ago by rpolicastro 13k

0

Entering edit mode

Hi rpolicastro, Thanks for your response. I tried to install the package "tidyverse" but the installation failed. I tried severally it failed to install. I was thinking I have old version of R maybe thats was the reason the installation failed but I have the new version. However, I was able to read the last 10 lines from ubuntu command line as you can see above

ADD REPLY • link 4.1 years ago by Jakpa ▴ 50