DGEList: Ensembl IDs recognized as NA values
5.8 years ago
spymerkouris ▴ 10

Hi, I am quite new to R and I am analysing RNA-seq data. This is the format of my data frame:

transcript_id    C1    C2    C3    B4    B5    B6    E4    E5    E6
ENSG00000000003  2019  1619  1597  1343  1026  1010  871   1164  1115
ENSG00000000005  1     2     1     1     1     2     0     0     0
ENSG00000000419  1936  1469  1769  2604  2244  2132  2301  2332  2184
ENSG00000000457  790   826   858   693   561   489   456   615   533
ENSG00000000460  320   372   362   368   285   282   254   342   265

When I use the command data <- DGEList(counts) I get the error "Error: NA counts not allowed". I realize that it is because of the transcript_id column, because when I remove it, it works fine. Any suggestions? Thank you.

DGElist RNA-Seq • 3.2k views

It is a bit hard to understand what your R object looks like; can you try to reformat your question? Errors about NAs in R are often caused by values missing from the input rather than by their format (unless the function expects a numeric value and gets a character).
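A quick way to tell the two cases apart is to look at the column classes and check for missing values. This is only a sketch on a hypothetical two-row toy data frame that mimics the layout in the question; the variable names `col_classes` and `has_missing` are made up for illustration.

```r
# Hypothetical toy data frame mirroring the layout in the question.
counts <- data.frame(
  transcript_id = c("ENSG00000000003", "ENSG00000000005"),
  C1 = c(2019, 1),
  C2 = c(1619, 2),
  stringsAsFactors = FALSE
)

# Class of each column: a character column among the counts is the
# likely culprit when a function expects purely numeric input.
col_classes <- sapply(counts, class)
print(col_classes)

# Genuinely missing values in the input would be flagged here instead.
has_missing <- anyNA(counts)
print(has_missing)
```

If a character column shows up and no values are missing, the issue is probably the format of the input rather than absent data.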


Problem solved: it was due to the character values in the first column! Thank you.

5.8 years ago
h.mon 35k

The problem is that edgeR wants a matrix consisting only of integer counts, with the gene identifiers as row names and the sample identifiers as column names. So first assign the transcript_id column to the row names, then remove this column:

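# Use the identifiers in the first column as row names
rownames(counts) <- counts[ ,1]
# Drop the identifier column so only the counts remain
counts <- counts[ , -1]
data <- DGEList(counts)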

As it is, your counts object holds both numbers and strings, so edgeR assigns NAs to the strings.
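For reference, here is a self-contained sketch of the same idea on a small subset of the data from the question (the first three genes and the columns C1, C2 and B4). The name `count_matrix`, the explicit `as.matrix()` conversion and the sanity check are additions for illustration, not part of the original answer, and the `DGEList()` call only runs if edgeR happens to be installed.

```r
# Toy subset of the data frame from the question (three genes, three samples).
counts <- data.frame(
  transcript_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  C1 = c(2019L, 1L, 1936L),
  C2 = c(1619L, 2L, 1469L),
  B4 = c(1343L, 1L, 2604L),
  stringsAsFactors = FALSE
)

# Keep only the count columns and convert them to a matrix,
# then use the identifiers as row names.
count_matrix <- as.matrix(counts[, -1])
rownames(count_matrix) <- counts$transcript_id

# Sanity check: the matrix should be numeric and free of NAs.
stopifnot(is.numeric(count_matrix), !anyNA(count_matrix))

# Build the DGEList only if edgeR is available in this R session.
if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = count_matrix)
  print(dim(dge))
}
```

Converting to a matrix up front makes it obvious when a stray character column is still present, because the resulting matrix would then be of type character rather than numeric.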


That worked indeed, thank you!
