Question

NA values of count matrix in class DESeqDataSet

0

7.2 years ago

maleknias ▴ 40

Dear all,

I downloaded a data set of class "RangedSummarizedExperiment" from "https://jhubiostatistics.shinyapps.io/recount/". I want to find differentially expressed genes. My code is:

load("~/Downloads/rse_gene.Rdata")

class(rse_gene)

**[1] "RangedSummarizedExperiment"

attr(,"package")

[1] "SummarizedExperiment"**

data=colData(rse_gene)

names= names(colData(rse_gene))

write.table(data,file="colData.csv", col.names=names,sep="\t",row.names=FALSE)

data1=fread("~/Downloads/colData.txt")

colData(rse_gene) =DataFrame(data1)

colData(rse_gene)$disease.status = as.factor(colData(rse_gene)$disease.status)

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

**converting counts to integer mode

Error in validObject(.Object) :

invalid class “DESeqDataSet” object: NA values are not allowed in the count matrix

In addition: Warning message:

In mde(x) : NAs introduced by coercion to integer range**

I tried two solutions for this problem, but neither of them worked:

1- Keep only rows with non-zero counts:

rse_gene <- rse_gene[rowSums(assay(rse_gene)) != 0, ]

2- Replace the NA values with -9:

countdata <- assay(rse_gene)

replace(countdata,countdata==0,-9)

coldata <- colData(rse_gene)

ddsMat <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ disease.status)

I would appreciate it if anyone could help me!

RNA-Seq recount • 11k views

ADD COMMENT • link 7.2 years ago by maleknias ▴ 40

0

Try this:

library(DESeq2)
load("rse_gene.Rdata")
ddsse=DESeqDataSet(rse_gene,design=~disease.status)

Btw, what is the accession number of the data?

ADD REPLY • link 7.2 years ago by cpad0112 21k

score 0 · Answer 1 · 2017-09-26

0

7.2 years ago

Kevin Blighe 88k

Your command using rowSums won't work. What I normally do is something like this:

test <- data.frame(c(1,2,3,4,5), c(1,2,3,4,5), c(1,NA,3,4,5), c(NA,NA,NA,4,5), c(1,2,3,4,5))
colnames(test) <- c("a","b","c","d","e")
test
  a b  c  d e
1 1 1  1 NA 1
2 2 2 NA NA 2
3 3 3  3 NA 3
4 4 4  4  4 4
5 5 5  5  5 5

test[apply(test, 1, function(x) sum( is.na(x) ))==0,]
  a b c d e
4 4 4 4 4 4
5 5 5 5 5 5

I originally used this way of filtering for removing transcripts that had zero counts across 5 or more samples, using something like this:

apply(test, 1, function(x) sum(x==0))<5
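Here is a minimal sketch of how either filter can be used as a row subset, based on the toy `test` data frame above. Note that `x == 0` returns NA for missing entries, so `na.rm = TRUE` is assumed here; the names `keep_complete`, `keep_expressed` and `filtered` are only illustrative.

```r
# Toy data from the example above (not real counts)
test <- data.frame(a = c(1, 2, 3, 4, 5),
                   b = c(1, 2, 3, 4, 5),
                   c = c(1, NA, 3, 4, 5),
                   d = c(NA, NA, NA, 4, 5),
                   e = c(1, 2, 3, 4, 5))

# TRUE for rows that contain no NA at all
keep_complete <- rowSums(is.na(test)) == 0

# TRUE for rows with fewer than 5 zero entries; na.rm = TRUE stops an NA
# in the row from turning the whole sum into NA
keep_expressed <- apply(test, 1, function(x) sum(x == 0, na.rm = TRUE)) < 5

# Keep only rows that pass both filters
filtered <- test[keep_complete & keep_expressed, ]
print(filtered)
```

The logical vectors are computed first and applied in a single subsetting step, which makes it easy to inspect how many rows each filter would drop.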

ADD COMMENT • link 7.2 years ago by Kevin Blighe 88k

1

Instead of

test[apply(test, 1, function(x) sum( is.na(x) ))==0,]

you could use the built-in function complete.cases, e.g.

test[complete.cases(test), ]

ADD REPLY • link 7.2 years ago by e.rempel ★ 1.1k

0

Thanks e.rempel - I knew that the function existed but could not remember the name at the time of writing!

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

score 0 · Answer 2 · 2017-10-02

0

7.2 years ago

maleknias ▴ 40

Dear all

I tried all of your suggestions, but unfortunately they all give the same error as before!

first:

assay(rse_gene)=assay(rse_gene)[apply(assay(rse_gene), 1, function(x) sum( is.na(x) ))==0,]

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

second:

assay(rse_gene)=assay(rse_gene)[complete.cases(assay(rse_gene)),]

dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

I would appreciate it if anyone could help me!

ADD COMMENT • link 7.2 years ago by maleknias ▴ 40

0

Try to remove the NA values before you run that function. I'm not sure that the code you have above is going to behave in the way that you expect.

I don't know why you need to use the assay() function before DESeqDataSet()

Can you not just remove NA values from your raw counts matrix before you do anything with DESeq?
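One way to check the raw counts before building the object is sketched below. It assumes the NAs may not be in the raw matrix at all: the warning in the original post mentions coercion to integer range, and in R any value above `.Machine$integer.max` becomes NA when converted to integer. The matrix `counts` is made-up toy data, not the recount download.

```r
# Toy count matrix (hypothetical values); one entry exceeds the integer range
counts <- matrix(c(10, 250, 3e9, 42), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("s1", "s2")))

# 1. Are there NAs in the raw matrix itself?
n_na_raw <- sum(is.na(counts))

# 2. Are any values too large to be stored as integers?
too_big <- counts > .Machine$integer.max

# 3. Coercing to integer is the step where such values turn into NA
as_int <- suppressWarnings(as.integer(counts))
n_na_after <- sum(is.na(as_int))

cat("NAs before coercion:", n_na_raw, "\n")
cat("Values above integer max:", sum(too_big), "\n")
cat("NAs after coercion:", n_na_after, "\n")
```

If step 2 reports any large values, filtering on NA or zero will not help, because the NAs only appear during the conversion; the counts would need to be rescaled first. For recount downloads, the recount package's scale_counts() is meant for this as far as I know, though I have not run it on this data set.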

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

> rse_gene <- data.frame(c(1,2,3,4,5), c(1,2,3,4,5), c(1,NA,3,4,5), c(NA,NA,NA,4,5), c(1,2,3,4,5))
> colnames(rse_gene) <- c("a","b","c","d","e")
> rse_gene[is.na(rse_gene)] <- 0
> rse_gene
  a b c d e
1 1 1 1 0 1
2 2 2 0 0 2
3 3 3 3 0 3
4 4 4 4 4 4
5 5 5 5 5 5
> dds <- DESeqDataSet(rse_gene, design = ~ disease.status)

ADD REPLY • link 7.2 years ago by ioannis ▴ 50