NA values of count matrix in class DESeqDataSet
7.2 years ago
maleknias

Dear all

Hi

I downloaded a dataset of class "RangedSummarizedExperiment" from "https://jhubiostatistics.shinyapps.io/recount/". I want to find differentially expressed genes. My code is:

library(SummarizedExperiment)
library(DESeq2)
library(data.table)

load("~/Downloads/rse_gene.Rdata")

class(rse_gene)

**[1] "RangedSummarizedExperiment"**

**attr(,"package")**

**[1] "SummarizedExperiment"**

data <- colData(rse_gene)

write.table(data, file = "~/Downloads/colData.txt", sep = "\t", row.names = FALSE)

data1 <- fread("~/Downloads/colData.txt")

colData(rse_gene) <- DataFrame(data1)

colData(rse_gene)$disease.status <- as.factor(colData(rse_gene)$disease.status)

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

**converting counts to integer mode**

**Error in validObject(.Object) : invalid class “DESeqDataSet” object: NA values are not allowed in the count matrix**

**In addition: Warning message: In mde(x) : NAs introduced by coercion to integer range**
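The warning "NAs introduced by coercion to integer range" suggests the NAs may not be in the downloaded matrix at all: R emits it when a double larger than `.Machine$integer.max` (2147483647) is converted to integer, which is what DESeq2 does at the "converting counts to integer mode" step. A minimal base-R sketch of that behaviour; the matrix `counts` and its values are invented for illustration and are not the recount data:

```r
# Toy matrix of doubles; the values are invented for illustration.
# 3e9 is larger than the biggest 32-bit integer R can store.
counts <- matrix(c(10, 250, 3e9, 42), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("s1", "s2")))

.Machine$integer.max          # 2147483647

anyNA(counts)                 # FALSE: no NAs before coercion

# Coercing to integer (as DESeq2 does) turns 3e9 into NA and raises
# "NAs introduced by coercion to integer range"; the warning is captured here
msg <- tryCatch(as.integer(counts), warning = function(w) conditionMessage(w))
msg

counts_int <- suppressWarnings(as.integer(counts))
counts_int                    # 10 250 NA 42

# Number of entries above the integer limit; the same check could be
# run on the real object as sum(assay(rse_gene) > .Machine$integer.max)
sum(counts > .Machine$integer.max)
```

If the counts come from recount, the recount documentation describes the rse_gene counts as base-pair coverage counts and recommends converting them with `scale_counts()` from the recount package before a differential expression analysis, which should bring the values back into integer range; that call is not part of the sketch above.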

I tried two solutions for this problem, but neither of them worked:

1- Keep only rows with non-zero counts:

rse_gene <- rse_gene[rowSums(assay(rse_gene)) != 0, ]

2- Replace the NA values with -9:

countdata <- assay(rse_gene)

countdata <- replace(countdata, is.na(countdata), -9)

coldata <- colData(rse_gene)

ddsMat <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ disease.status)
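Two details of the replacement attempt are worth checking: a test like `countdata==0` matches zeros rather than NAs, and `replace()` returns a new matrix instead of modifying `countdata` in place. A small base-R sketch on an invented toy matrix; `is_valid_counts()` is a hypothetical helper (not part of DESeq2) that mirrors DESeq2's requirement of non-negative whole-number counts without NAs, so a `-9` placeholder would still be rejected even when it is assigned correctly:

```r
# Toy count matrix with one missing value (values invented for illustration)
countdata <- matrix(c(0, 5, NA, 12), nrow = 2)

# replace() returns a modified copy; countdata itself is not changed
replaced <- replace(countdata, is.na(countdata), -9)
anyNA(countdata)              # TRUE
anyNA(replaced)               # FALSE

# Hypothetical helper: counts must have no NAs, be non-negative and be
# whole numbers, so a -9 placeholder fails this check
is_valid_counts <- function(m) {
  !anyNA(m) && all(m >= 0) && all(m == round(m))
}
is_valid_counts(replaced)                           # FALSE
is_valid_counts(matrix(c(0, 5, 3, 12), nrow = 2))   # TRUE
```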

I would appreciate it if anyone could help me!

RNA-Seq recount

Try this:

library(DESeq2)
load("rse_gene.Rdata")
ddsse <- DESeqDataSet(rse_gene, design = ~ disease.status)

Btw, what is the accession number of the data?

7.2 years ago

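Your rowSums() filter won't remove rows with NA values: rowSums() returns NA for any row that contains an NA, so the != 0 test gives NA rather than FALSE and the filter does not drop those rows. What I normally do is something like this: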

test <- data.frame(c(1,2,3,4,5), c(1,2,3,4,5), c(1,NA,3,4,5), c(NA,NA,NA,4,5), c(1,2,3,4,5))
colnames(test) <- c("a","b","c","d","e")
test
  a b  c  d e
1 1 1  1 NA 1
2 2 2 NA NA 2
3 3 3  3 NA 3
4 4 4  4  4 4
5 5 5  5  5 5

test[apply(test, 1, function(x) sum(is.na(x))) == 0, ]
  a b c d e
4 4 4 4 4 4
5 5 5 5 5 5

I originally used this kind of filtering to remove transcripts that had zero counts in 5 or more samples, with something like this (na.rm = TRUE stops rows that contain NA from evaluating to NA):

test[apply(test, 1, function(x) sum(x == 0, na.rm = TRUE)) < 5, ]
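The same row filters can also be written with the vectorised `rowSums()` instead of `apply()`, which is typically faster on a large count matrix because the loop runs in compiled code. A sketch on the same toy data, rebuilt here with named columns so it runs on its own:

```r
# Same toy data frame as above
test <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 2, 3, 4, 5),
                   c = c(1, NA, 3, 4, 5), d = c(NA, NA, NA, 4, 5),
                   e = c(1, 2, 3, 4, 5))

# Rows with no missing values: equivalent to the apply() filter
no_na <- rowSums(is.na(test)) == 0
test[no_na, ]

# Rows with zero counts in fewer than 5 samples; na.rm = TRUE keeps an NA
# from turning that row's result into NA
few_zeros <- rowSums(test == 0, na.rm = TRUE) < 5
test[few_zeros, ]
```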
Instead of

test[apply(test, 1, function(x) sum( is.na(x) ))==0,]

you could use the built-in function complete.cases(), e.g.:

test[complete.cases(test), ]
Thanks e.rempel - I knew that the function existed but could not remember the name at the time of writing!

7.2 years ago
maleknias

Dear all

I tried all of your suggestions, but unfortunately each of them still gives the same error!

first:

assay(rse_gene)=assay(rse_gene)[apply(assay(rse_gene), 1, function(x) sum( is.na(x) ))==0,]

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

second:

assay(rse_gene)=assay(rse_gene)[complete.cases(assay(rse_gene)),]

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

I would appreciate it if anyone could help me!


Try to remove the NA values before you run that function. I'm not sure that the code you have above is going to behave in the way that you expect.

I don't see why you need to call assay() before DESeqDataSet().

Can you not just remove NA values from your raw counts matrix before you do anything with DESeq?

> rse_gene <- data.frame(c(1,2,3,4,5), c(1,2,3,4,5), c(1,NA,3,4,5), c(NA,NA,NA,4,5), c(1,2,3,4,5))
> colnames(rse_gene) <- c("a","b","c","d","e")
> rse_gene[is.na(rse_gene)] <- 0
> rse_gene
  a b c d e
1 1 1 1 0 1
2 2 2 0 0 2
3 3 3 3 0 3
4 4 4 4 4 4
5 5 5 5 5 5
> dds <- DESeqDataSetFromMatrix(countData = as.matrix(rse_gene), colData = coldata, design = ~ disease.status)
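DESeqDataSet() expects a SummarizedExperiment; for a plain count table like the toy one above, DESeq2 provides DESeqDataSetFromMatrix(), which takes a non-negative integer count matrix plus a sample table with one row per column of that matrix. A base-R sketch of preparing those two inputs; the disease.status labels are invented for illustration, and the DESeq2 call is left as a comment so the snippet runs without Bioconductor:

```r
# Toy counts with the NAs already set to 0 (same values as the reply above)
counts <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 2, 3, 4, 5),
                     c = c(1, 0, 3, 4, 5), d = c(0, 0, 0, 4, 5),
                     e = c(1, 2, 3, 4, 5))

# DESeqDataSetFromMatrix() expects a matrix of non-negative integers
count_mat <- as.matrix(counts)
storage.mode(count_mat) <- "integer"

# Hypothetical sample table: one row per column of the count matrix
coldata <- data.frame(
  disease.status = factor(c("control", "control", "case", "case", "case")),
  row.names = colnames(count_mat)
)

# The count columns must line up with the sample-table rows, in order
stopifnot(identical(colnames(count_mat), rownames(coldata)))

# With DESeq2 installed, the object would then be built like this:
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = count_mat,
#                                       colData = coldata,
#                                       design = ~ disease.status)
```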