Question

Filtering genes with low counts in RNA-seq using a function from edgeR

0

16 months ago

Chris ▴ 340

Hi all, I am trying to filter out genes with low counts from a raw count matrix.

I ran:

d <- DGEList(counts=counts,group=factor(conditions))
keep <- filterByExpr(d)
bcv <- 0.2
et <- exactTest(keep, dispersion=bcv^2)

Error in exactTest(d, dispersion = bcv^2) : Currently only supports DGEList objects as the object argument.

d <- estimateTagwiseDisp(d)

Error in .compressDispersions(y, dispersion) : dispersions must be finite non-negative values

After using filterByExpr(), I get these errors. If I don't use filterByExpr(), I don't get them.

Do you have any suggestions? Unfortunately, I don't have replicates, so I am trying edgeR because it supports designs without replicates. I know the results will not be rigorous, but they can still serve as a rough reference; is that correct? Thank you so much.

If I don't run this filter:

counts <- counts[which(rowSums(counts)>50),]

but only read in the data:

counts <- read.delim('counts.csv', header = T,row.names = 1, sep = ',')

I get this:

d <- DGEList(counts=counts,group=factor(conditions))
Error: NA counts not allowed

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.40.2 limma_3.54.2

loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2    Rcpp_1.0.11    grid_4.2.2     locfit_1.5-9.8 lattice_0.21-8

edgeR RNA-seq • 2.1k views

ADD COMMENT • link updated 16 months ago by petebio ▴ 100 • written 16 months ago by Chris ▴ 340

1

Please mention the package you're using. From Googling around, I can guess you're referring to edgeR::filterByExpr() but you're the only person that knows for sure so edit your post and mention the package.

ADD REPLY • link 16 months ago by Ram 44k

1

Is d a matrix? You need to create a DGEList object in order to run those functions.

ADD REPLY • link 16 months ago by biofalconch ★ 1.3k

0

Yes, I ran DGEList(), but if I use filterByExpr(), I get the error.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Edit your post and add the package information. Ideally, you should also add sessionInfo() and the package as a tag.

ADD REPLY • link 16 months ago by Ram 44k

0

Chris, this is starting to feel like pulling teeth. Is that ALL of the output you see from sessionInfo()?

ADD REPLY • link 16 months ago by Ram 44k

0

I am sorry. The sessionInfo() output was very long last time. I removed all unrelated packages and added. Is there anything I can help you with here?

ADD REPLY • link 16 months ago by Chris ▴ 340

0

I removed all unrelated packages and added.

That defeats the purpose of adding sessionInfo(). Please just paste the entire output so people know what exactly you're working with.

ADD REPLY • link 16 months ago by Ram 44k

0

I ran multiple R scripts and installed many packages, which is why it was so long, but those are all the packages I use for this task. Adding packages like DiffBind seems irrelevant, right?

ADD REPLY • link 16 months ago by Chris ▴ 340

1

Not at all. While providing a minimal environment that reproduces the error is ideal, quite a few errors are caused by the specific set of packages and environment settings on your machine. sessionInfo() at the time of the error is extremely helpful.

In any case, if your error is resolved, you don't need to edit your post further.
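
For anyone who wants to attach the complete output without trimming it by hand, here is a minimal base R sketch. The file name is a placeholder chosen for illustration, not something from this thread.

```r
# Capture the full sessionInfo() output as text, exactly as it would print.
info <- capture.output(sessionInfo())

# Write it to a file that can be pasted into or attached to a post.
# "sessionInfo.txt" is a hypothetical name; tempdir() keeps the sketch tidy.
out_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(info, out_file)

# Quick look at the first lines that were captured.
head(info, 3)
```

Running this right after the error occurs records the packages that were actually loaded at that moment.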

ADD REPLY • link 16 months ago by Ram 44k

0

Sorry for the wrong assumption. Yes, the error is resolved. Let me know if I can help with anything.

ADD REPLY • link 16 months ago by Chris ▴ 340

1

16 months ago

petebio ▴ 100

You are using the d and keep variables incorrectly. Try:

keep <- filterByExpr(d)
d <- d[keep, ]
bcv <- 0.2
et <- exactTest(d, dispersion = bcv^2)
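
As a rough end-to-end illustration of the same steps, the sketch below runs them on simulated data. The counts, gene names, and the four condition labels are made up for the example; the BCV of 0.2 is the value assumed in this thread, not something estimated from data. It also uses keep.lib.sizes = FALSE when subsetting, which recomputes library sizes after filtering.

```r
library(edgeR)

# Simulated counts for illustration only: 500 genes x 4 samples, one sample
# per condition (no replicates). Half the genes are lowly expressed on purpose.
set.seed(42)
n_genes <- 500
mu <- rep(c(1, 100), each = n_genes / 2)
counts <- matrix(
  rnbinom(n_genes * 4, mu = mu, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("cond", 1:4))
)
conditions <- c("A", "B", "C", "D")  # hypothetical condition labels

d <- DGEList(counts = counts, group = factor(conditions))

# filterByExpr() returns a logical vector with one entry per gene (row).
keep <- filterByExpr(d)

# Subset rows (genes) and recompute library sizes from the retained genes.
d <- d[keep, , keep.lib.sizes = FALSE]

# Without replicates the dispersion cannot be estimated from the data,
# so a BCV is assumed. The dispersion passed to exactTest() is BCV squared.
bcv <- 0.2
et <- exactTest(d, pair = c("A", "B"), dispersion = bcv^2)
topTags(et)
```

Note that exactTest() receives the filtered DGEList d, never the logical vector keep; passing keep is what triggers the "only supports DGEList objects" error.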

ADD COMMENT • link 16 months ago by petebio ▴ 100

1

keep filters genes, not samples. The comma is in the wrong place.

ADD REPLY • link 16 months ago by ATpoint 85k

0

Thank you for your help! I get this when I run your suggestion:

d <- d[,keep]

Error in object[[a]][i, j, drop = FALSE] : 
  (subscript) logical subscript too long

ADD REPLY • link updated 16 months ago by Ram 44k • written 16 months ago by Chris ▴ 340

Ram · Accepted Answer · 2023-08-01

3

16 months ago

ATpoint 85k

keep is a logical vector that tells which genes fulfill the filtering criteria. Hence:

d <- d[keep,]
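
To see why the position of the comma matters, here is a small base R sketch on a plain matrix. The counts are invented, and the rowSums() rule is only a stand-in for filterByExpr() so the example runs without edgeR.

```r
# Toy count matrix: 5 genes (rows) x 4 samples (columns). Values are made up.
counts <- matrix(
  c(100, 250,   0,   0,
      0,   1,   0,   2,
     30,  45,  28,  51,
      0,   0,   0,   0,
    500, 480, 510, 530),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)

# Hypothetical stand-in for filterByExpr(): one TRUE/FALSE per gene (row).
keep <- rowSums(counts) > 50
length(keep) == nrow(counts)   # keep lines up with rows, not columns

# The row index goes before the comma, so this keeps the selected genes.
filtered <- counts[keep, ]
dim(filtered)

# Using keep as a column index fails: 5 logical values cannot index 4 columns.
res <- try(counts[, keep], silent = TRUE)
inherits(res, "try-error")
```

The failed column subset is the same "logical subscript too long" situation reported above.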

ADD COMMENT • link 16 months ago by ATpoint 85k

1

By the way, the edgeR manual covers this... ;-)

ADD REPLY • link 16 months ago by ATpoint 85k

0

Thanks ATpoint! If I don't run:

counts <- counts[which(rowSums(counts)>50),]

I will get the error:

d <- DGEList(counts=counts,group=factor(conditions))
Error: NA counts not allowed

So what should I do in this case? If I do run it, I end up filtering twice. I have a gene with around 100 and 250 reads in the first two conditions and 0 reads in the other two conditions; will this gene be filtered out?

ADD REPLY • link updated 16 months ago by Ram 44k • written 16 months ago by Chris ▴ 340
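
One way to investigate the "NA counts not allowed" error is to look for missing values directly, rather than relying on the rowSums() filter to hide them. The sketch below uses base R and an invented data frame; it assumes the NAs come from the input file itself, which cannot be confirmed from this thread.

```r
# Toy count table with one missing value. All values are made up.
counts <- data.frame(
  s1 = c(100, 0, NA, 30),
  s2 = c(250, 1, 12, 45),
  s3 = c(0, 0, 7, 28),
  s4 = c(0, 2, 9, 51),
  row.names = paste0("gene", 1:4)
)

# Is there any NA at all, and in which genes?
anyNA(counts)
na_rows <- rowSums(is.na(counts)) > 0
rownames(counts)[na_rows]

# rowSums() is NA for a row containing NA, and which() drops NA comparisons.
# That is why counts[which(rowSums(counts) > 50), ] happened to remove such rows.
rowSums(counts)
which(rowSums(counts) > 50)

# Drop only the rows with missing values, leaving expression filtering to a
# single later step such as filterByExpr().
clean <- counts[!na_rows, ]
```

Inspecting the rows flagged by na_rows in the original file is usually worthwhile before dropping them, since missing values can point to a parsing problem.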