21 months ago
Mamatha Y S
# duplicated genes and number of duplicates
duplicated_genes <- names(table(df$hgnc_symbol)[table(df$hgnc_symbol) > 1])
gene_counts <- table(df$hgnc_symbol)[duplicated_genes]
# zero expression of each gene
zero_counts <- sapply(unique(duplicated_genes), function(gene) {
  sum(rowSums(df[df$hgnc_symbol == gene, -ncol(df)]) == 0)
})
This is the code I'm running. I want to identify the duplicated genes in my data frame and their frequency. In a third column, I want to know how many copies of each duplicated gene have a row sum of zero (gene expression is zero for all samples). For example, if a gene is duplicated 7 times, how many of those 7 rows sum to zero?
The first two lines give the correct result, but for the zero-expression count I get NA for all the genes, and I don't understand why. Please help me with this.
Is `hgnc_symbol` the last column in your `df`? Is that why you're using `-ncol(df)` for the `rowSums` function?

You're getting `NA` because some values in your `df` are `NA`. You could use the `na.rm = TRUE` parameter in the `sum` function, as long as you understand what it's doing. The fact that you're expecting 0 and there's also `NA` in there indicates that there must be either a gap in your expectations or a difference in what 0 and `NA` mean.
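
To make the `NA` point concrete, here is a minimal sketch on a made-up data frame (the sample columns `s1`, `s2`, `s3` and all values are hypothetical, not from the question). It assumes `hgnc_symbol` is the only non-expression column. Note that an `NA` in `hgnc_symbol` itself can also cause the problem: `df[df$hgnc_symbol == gene, ]` returns an all-`NA` row for every `NA` symbol, so every gene's count becomes `NA`. That is one possible explanation for getting `NA` for all genes, not a confirmed diagnosis of your data.

```r
# Toy data (hypothetical): three samples plus hgnc_symbol as the last column.
df <- data.frame(
  s1 = c(0, 5, 0, NA, 2, 0),
  s2 = c(0, 1, 0, 0, 3, 0),
  s3 = c(0, 0, 0, 0, 1, 0),
  hgnc_symbol = c("A", "A", "A", "B", "B", NA),
  stringsAsFactors = FALSE
)

# Select expression columns by name rather than by position.
expr <- df[, setdiff(names(df), "hgnc_symbol"), drop = FALSE]

# Diagnostics: where are the NAs coming from?
n_na_symbols <- sum(is.na(df$hgnc_symbol))  # NA gene symbols
n_na_values  <- colSums(is.na(expr))        # NA expression values per sample

# Per-row flag: is total expression zero?
# Caution: na.rm = TRUE treats NA expression values as 0 in the sum.
is_zero_row <- rowSums(expr, na.rm = TRUE) == 0

# Frequency of each symbol; table() drops NA symbols by default.
gene_counts <- table(df$hgnc_symbol)
duplicated_genes <- names(gene_counts)[gene_counts > 1]

# Count zero-sum rows per duplicated gene.
# which() drops the NA comparisons produced by NA symbols.
zero_counts <- sapply(duplicated_genes, function(gene) {
  sum(is_zero_row[which(df$hgnc_symbol == gene)])
})

result <- data.frame(
  gene = duplicated_genes,
  n_duplicates = as.integer(gene_counts[duplicated_genes]),
  n_zero_rows = as.integer(zero_counts)
)
print(result)
```

In this toy data, the row for gene B with `s1 = NA` is counted as a zero-expression row only because `na.rm = TRUE` ignores the missing value. Whether that is what you want depends on what `NA` means in your data.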