3.3 years ago
Nai
I have a count matrix from featureCounts, with gene names, for each sample. I get an error when I use DESeqDataSetFromMatrix with the following commands:
count <- read.csv("matrix.count", header = TRUE, row.names = 1, sep ='\t')
info <- read.table("coldata2.txt", header = TRUE, sep ='\t')
library(DESeq2)
library(apeglm)
ds <- DESeqDataSetFromMatrix(count, info, ~condition)
Error: DESeqDataSet(se, design = design, ignoreRank) : NA values are not allowed in the count matrix.
any(is.na(count))
TRUE
all(is.numeric(count))
FALSE
I tried the following too:
ds <- DESeqDataSetFromMatrix(count, info, ~condition, tidy=TRUE)
ds <- DESeqDataSetFromMatrix(count, info, ~condition, tidy = FALSE)
Then I got duplicated IDs in row.names.
Could you please suggest a solution?
You should figure out why there are NA values in your count matrix. You can figure out which genes have NA values using the following code.
Once you find rows with NA values, get back to us with what you find. It's also important to go through the methods that generated the count matrix to find out where NA could have been introduced.
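A minimal sketch of such a check, assuming the counts were read into a data frame named count as in the question. The toy values and sample names are made up for illustration. Note that is.numeric() applied to a whole data frame returns FALSE even when every column is numeric, so the type check is done column by column here.

```r
# Toy stand-in for the real count table (hypothetical values and sample names)
count <- data.frame(
  sampleA = c(10, NA, 0, 5),
  sampleB = c(3, 7, 0, NA),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# Rows (genes) that contain at least one NA
na_rows <- count[rowSums(is.na(count)) > 0, , drop = FALSE]
print(na_rows)

# Number of NA values in each column (sample)
na_per_column <- colSums(is.na(count))
print(na_per_column)

# Check each column separately for a numeric type
numeric_columns <- vapply(count, is.numeric, logical(1))
print(numeric_columns)
```

If a column turns out to be non-numeric, that usually points to a parsing problem when the file was read, which is worth fixing before looking at the NA values themselves.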
When I look at the data there are no NA values, only a lot of 0s. NA appears only in the gene names. I replaced it, but I still get this error.
Hi, running count[rowSums(is.na(count)) > 0, ] showed only the column names.
The problem, as you have inferred, is this:
Please try to understand why there would be NA values in your count data. You may have to review the upstream data processing steps. If the NA values just need to be 0, then you can impute these via:
For further help, please show the output of:
PS - I am not sure that you need to load apeglm
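If the NA values really do stand for zero counts, one way to do the replacement in base R is sketched below. The toy table is hypothetical, and whether zero is the right value depends on how the matrix was generated.

```r
# Toy stand-in for the real count table (hypothetical values and sample names)
count <- data.frame(
  sampleA = c(10, NA, 0),
  sampleB = c(3, 7, NA),
  row.names = c("gene1", "gene2", "gene3")
)

# Replace every NA with 0; is.na() on a data frame gives a logical matrix
# that can be used directly as an index
count[is.na(count)] <- 0

# Confirm that no NA values remain
print(any(is.na(count)))
print(count)
```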
Dear All,
Thank you for your support. I have completed the DEG analysis by following each step as you guided. I would like to get the expression value for each sample, but my output only has baseMean, log2FoldChange, padj, lfcSE, and stat. Also, how can I reduce the number of genes? The commands:
Then I got 36,000 genes without any per-sample information.
Which per sample information do you want? If the normalised counts, these are accessible via:
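As a rough sketch of how per-sample normalised counts can be pulled from a DESeq2 object: the example below uses makeExampleDESeqDataSet() to simulate data, so the object name dds and its dimensions are assumptions, not the poster's data. Size factors must be estimated (or DESeq() run) before normalised counts are available.

```r
suppressPackageStartupMessages(library(DESeq2))

# Simulated data set standing in for the real one (100 genes, 6 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 6)

# Size factors are required before normalised counts can be extracted
dds <- estimateSizeFactors(dds)

# Matrix of normalised counts: genes in rows, samples in columns
norm_counts <- counts(dds, normalized = TRUE)
print(head(norm_counts))
```

The resulting matrix has one column per sample, which is the per-sample information that the results table itself does not carry.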
OK, thank you. With this I got 36,000 genes. I want to filter for the significant genes. What does alpha = 0.05 mean, and how can I filter the normalized counts per sample?
Hi, please take a look at the subset() function, e.g.:
Then, take the list of genes from the output of the above command (they may be set as rownames()), and use these to further subset the normalised counts.
It helps if you share some sample data; otherwise, we can only hypothesise about how your data appears.
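The two steps described above can be sketched in base R as follows. The results table and normalised count matrix here are small hypothetical stand-ins (names res and norm_counts are assumptions); a DESeq2 results object can be converted with as.data.frame() first, and the thresholds are only examples.

```r
# Stand-in for a results table (hypothetical values)
res <- data.frame(
  log2FoldChange = c(-1.7, 0.2, 2.4, NA),
  padj = c(0.004, 0.80, 0.03, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Stand-in for normalised counts: genes in rows, samples in columns
norm_counts <- matrix(
  c(82, 150, 20, 0,
    25, 148, 95, 1),
  nrow = 4,
  dimnames = list(rownames(res), c("sample1", "sample2"))
)

# Keep genes passing both thresholds; subset() drops rows where the
# condition evaluates to NA
sig <- subset(res, padj < 0.05 & abs(log2FoldChange) > 1)

# Use the surviving gene names to pull the per-sample normalised counts
sig_counts <- norm_counts[rownames(sig), , drop = FALSE]
print(sig_counts)
```

Using drop = FALSE keeps the result a matrix even when only a single gene passes the filter.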
res_ddsDE_new has 36,000 rows:
res_ddsDE_new
             baseMean log2FoldChange
            <numeric>      <numeric>
DDX11L1      1.779144     -1.4955939
WASH7P     152.518293     -0.0505911
MIR6859-1   20.653876      0.5689275
MIR1302-2HG  0.255387     -1.9691031
FAM138A      0.353478      0.1574042
When I use subset(res_ddsDE_new, padj < 0.05 & abs(log2FoldChange) > 1), I get only one row (rownames: 1):
ZWINT 82.3486 -1.74934 0.330334 -5.29568 1.18575e-07 0.0043631
I want the significant genes per sample. After counts(dds_new, normalized = TRUE), every gene has expression values but there is no p-value. How can I shortlist them?
Using subset() to filter on padj and log2FoldChange, I did not get a significant result. How can I get the TPM values for group 1 and group 2 for each gene?