Error in duplicate gene identification
21 months ago
# duplicated genes and number of duplicates
duplicated_genes <- names(table(df$hgnc_symbol)[table(df$hgnc_symbol) > 1])
gene_counts <- table(df$hgnc_symbol)[duplicated_genes]

# for each duplicated gene, number of rows with zero expression in every sample
zero_counts <- sapply(unique(duplicated_genes), function(gene) {
  sum(rowSums(df[df$hgnc_symbol == gene, -ncol(df)]) == 0)
})

This is the code I'm running. I want to identify the duplicated genes in my data frame and how often each one occurs. In a third column, I want to know how many of each gene's duplicate rows have a row sum of zero (zero expression in all samples). For example, if a gene is duplicated 7 times, how many of those 7 rows are all zeros?
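
A minimal sketch of the intended three-column result, on made-up toy data. It assumes, like the code above, that hgnc_symbol is the last column of df and every other column holds numeric expression values; the sample names S1/S2 and the output column names n_copies and n_zero_rows are hypothetical.

```r
# Made-up toy data: two numeric sample columns, gene symbol in the last column
df <- data.frame(
  S1 = c(0, 5, 0, 0, 3, 2),
  S2 = c(0, 1, 0, 4, 0, 7),
  hgnc_symbol = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# How many rows carry each symbol; keep only symbols seen more than once
counts <- table(df$hgnc_symbol)
duplicated_genes <- names(counts)[counts > 1]

# Row totals over the numeric columns (every column except the last)
expr <- df[, -ncol(df)]
is_zero_row <- rowSums(expr) == 0

# For each duplicated gene, count its rows that are zero in every sample;
# which() keeps only rows where the comparison is TRUE (NA symbols are skipped)
zero_counts <- sapply(duplicated_genes, function(gene) {
  sum(is_zero_row[which(df$hgnc_symbol == gene)])
})

result <- data.frame(
  hgnc_symbol = duplicated_genes,
  n_copies = as.integer(counts[duplicated_genes]),
  n_zero_rows = as.integer(zero_counts),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(result)
```

Computing the row totals once, before looping over genes, avoids re-subsetting df for every gene.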

The first two lines give the correct result, but the zero-expression count comes out as NA for every gene, and I don't understand why. Please help me with this.

r RNA-seq • 385 views

Is hgnc_symbol the last column of your df? Is that why you're using -ncol(df) inside rowSums()?
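
If hgnc_symbol is not the last column, -ncol(df) drops a sample column instead and keeps the character column, so rowSums() stops with an error. A sketch of selecting the numeric columns by type rather than by position, on hypothetical toy data:

```r
# Hypothetical toy data where the symbol column is NOT last
df <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_A", "GENE_B"),
  S1 = c(0, 3, 0),
  S2 = c(0, 1, 2),
  stringsAsFactors = FALSE
)

# Here -ncol(df) would drop S2 and keep the character column, and
# rowSums() would then fail. Selecting by type avoids depending on column order.
numeric_cols <- vapply(df, is.numeric, logical(1))
row_totals <- rowSums(df[, numeric_cols, drop = FALSE])
print(row_totals)
```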

You're getting NA because some values in your df are NA: rowSums() returns NA for any row that contains an NA, and sum() then returns NA as well. You could add na.rm = TRUE to the sum() call, as long as you understand what it does. Note also that you are expecting 0s but the data contain NAs, which means either there is a gap in your expectations of the data or 0 and NA mean different things in it.
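
A small sketch on made-up data showing how a single NA propagates, and one possible fix. Besides NA counts, an NA in hgnc_symbol itself also produces NA for every gene: df$hgnc_symbol == gene is NA on that row, and indexing a data frame with an NA selects a row filled with NA. Whether skipping those rows with which() and na.rm = TRUE is correct depends on what NA means in your data.

```r
# Made-up toy data: one missing count (row 2) and one missing symbol (row 5)
df <- data.frame(
  S1 = c(0, NA, 0, 2, 0),
  S2 = c(0, 0, 0, 1, 0),
  hgnc_symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", NA),
  stringsAsFactors = FALSE
)

# First, find out where the NAs are
na_per_column <- colSums(is.na(df))
print(na_per_column)

counts <- table(df$hgnc_symbol)  # table() drops NA symbols by default
duplicated_genes <- names(counts)[counts > 1]

# The question's approach: both genes come out NA, because the NA symbol
# makes `==` return NA, which selects an extra all-NA row for every gene
original <- sapply(duplicated_genes, function(gene) {
  sum(rowSums(df[df$hgnc_symbol == gene, -ncol(df)]) == 0)
})
print(original)

# which() keeps only rows where the comparison is TRUE; na.rm = TRUE in sum()
# skips rows whose total is unknown, so only rows known to be all zero count
fixed <- sapply(duplicated_genes, function(gene) {
  rows <- which(df$hgnc_symbol == gene)
  sum(rowSums(df[rows, -ncol(df), drop = FALSE]) == 0, na.rm = TRUE)
})
print(fixed)
```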
