How to get RPKM from count matrix
13 months ago
Chris ▴ 340

Hi Biostars,

I have a count matrix with mouse gene names as row names and need to compute RPKM. I know it is not a good metric, but biologists are used to it.

library(rtracklayer)  # readGFF()
library(edgeR)        # DGEList(), calcNormFactors(), rpkm()

gtf <- readGFF("/reference_genome/mm39.ncbiRefSeq.gtf")
gtf_exon <- gtf[gtf$type == "exon", ]
gtf_exon$width <- gtf_exon$end - gtf_exon$start + 1
# Sum exon widths per gene. Exons shared by several transcripts are counted
# more than once, so this overestimates the length of multi-isoform genes.
gene_length <- tapply(gtf_exon$width, gtf_exon$gene_name, sum)
# Reorder the lengths to match the rows of the count matrix
# (genes absent from the GTF get NA).
gene_length <- gene_length[match(rownames(counts_matrix), names(gene_length))]
y <- DGEList(counts = counts_matrix, genes = data.frame(Length = as.numeric(gene_length)))
y <- calcNormFactors(y)
RPKM <- rpkm(y)  # picks up the Length column in y$genes
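
To make clear which quantity I am after, here is a minimal base-R sketch of the RPKM arithmetic on toy data. The gene names, sample names, counts and lengths are all made up for illustration. It uses the raw library sizes, so the numbers will not exactly match `rpkm()` after `calcNormFactors()`, which, as far as I understand, scales the library sizes by the normalization factors by default.

```r
# Toy count matrix: 3 hypothetical genes x 2 hypothetical samples
counts <- matrix(c(10, 20, 70,
                   30, 30, 40),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("s1", "s2")))

# Hypothetical gene lengths in bases, named by gene
gene_length <- c(GeneA = 1000, GeneB = 2000, GeneC = 500)

# Total mapped reads per sample (raw library size)
lib_size <- colSums(counts)

# Reads per kilobase: each row is divided by its gene length in kb
# (the length vector is recycled down the columns of the matrix)
rpk <- counts / (gene_length[rownames(counts)] / 1e3)

# RPKM: each column is divided by its library size in millions
rpkm_manual <- sweep(rpk, 2, lib_size / 1e6, "/")
print(rpkm_manual)
```

Indexing `gene_length` by `rownames(counts)` is what keeps the lengths aligned with the matrix rows; a gene missing from the length vector would give `NA` for that row.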

I looked for a GTF file to get the gene lengths, but none of the GTF files I found use gene names. Do you have a suggestion? Thank you so much! https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/genes/

Update: many genes at the beginning of the matrix look like 1700012P22Rik, which makes me think the row names are not in gene symbol format.
