Most efficient way to convert Counts to RPKM
4.0 years ago
dk0319 ▴ 70

I obtained gene counts in order to perform differential expression analysis. Now I want to generate an MA plot comparing fold change to RPKM. Is there a streamlined way to do this directly from a file containing gene IDs and counts, without having to work with the BAM file from which the counts were generated? Cheers

R rna-seq • 4.9k views
4.0 years ago
h.mon 35k

It is better to use TPM than FPKM; see "Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples". This point has been made repeatedly here, and related threads will probably appear under Similar posts on the right.

However, for an MA plot even TPMs are unnecessary: edgeR (maPlot) and DESeq2 (plotMA) have functions to draw MA plots directly from the counts.
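To make concrete what an MA plot puts on its axes, here is a minimal sketch in plain Python. It is not the computation performed by maPlot or plotMA, which work on their packages' own normalized values; it only illustrates that A (mean log2 abundance) and M (log2 fold change) can be derived from counts alone. The function name `ma_points` and the `prior` pseudocount are made up for this illustration, and the scaling is by library size only.

```python
import math

def ma_points(counts_a, counts_b, prior=0.5):
    """Return a list of (A, M) pairs, one per gene, for two samples.

    counts_a, counts_b: raw counts per gene, same gene order in both.
    prior: hypothetical pseudocount added before taking logs so that
           zero counts do not produce log2(0).
    """
    lib_a = sum(counts_a)
    lib_b = sum(counts_b)
    points = []
    for a, b in zip(counts_a, counts_b):
        # Scale each count to counts per million of its own library.
        log_a = math.log2((a + prior) / lib_a * 1e6)
        log_b = math.log2((b + prior) / lib_b * 1e6)
        mean_abundance = (log_a + log_b) / 2   # A, x-axis
        log_fold_change = log_b - log_a        # M, y-axis
        points.append((mean_abundance, log_fold_change))
    return points

if __name__ == "__main__":
    for a_val, m_val in ma_points([100, 900], [200, 800]):
        print(f"A={a_val:.3f}  M={m_val:.3f}")
```

Note that no gene length enters the calculation, which is why RPKM is not needed for this plot.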


Is there a workflow for generating TPM from counts?

I am attempting to design a reporter construct using a list of significantly differentially expressed genes, but with consideration of the number of transcripts present in my control samples compared to my test samples. My hope is that this information would identify which genes have low enough expression in the test relative to the control to make a reporter that is highly specific and sensitive for monitoring protein function.


TPM itself is simple, see "Raw counts to TPM in R". However, you seem to be working on something not-so-standard (at least it sounds that way), so if you seek guidance with that or want opinions on whether your strategy makes sense, you would need to explain better what you are actually doing, what the setup is, and what kinds of data you have. You also seem to be mixing transcript-level and gene-level counts: you say transcripts but also talk about differential genes, and those are not the same.
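For reference, the arithmetic behind TPM can be sketched in a few lines. This is plain Python rather than the R code from the linked post, and `counts_to_tpm` and its `lengths_bp` argument are names chosen for this illustration. It assumes you have one length per feature in base pairs, which a counts-only file does not contain and has to come from the annotation.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for one sample to TPM.

    counts: raw counts per feature.
    lengths_bp: feature lengths in base pairs, same order as counts
                (an assumed input; take it from your annotation).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: divide each count by the feature length (reads per base).
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")
    # Step 2: rescale so the values of the sample sum to one million.
    return [r / total * 1e6 for r in rates]

if __name__ == "__main__":
    print(counts_to_tpm([100, 200, 300], [1000, 2000, 3000]))
```

Because the length division happens before the per-sample rescaling, TPM values always sum to one million within a sample, which is the property that RPKM lacks.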


Salmon can output counts and TPMs, and is really fast - it will run a few dozen samples in less than one hour.

TPMs estimated from gene counts are bad estimates, see a good explanation here: DESeq2: Is it possible to convert read counts to expression values via TPM and return these values?.

4.0 years ago
ATpoint 85k

You can use (log)CPMs from edgeR starting from a count matrix; it is a one-liner, see "Basic normalization, batch correction and visualization of RNA-seq data". Don't use naive metrics that only scale by library size: that is typically not sufficient because it fails to correct for library composition. edgeR does have an rpkm function, which is simply its normalized counts divided by gene length, but I would not use it, since the differential testing does not consider gene length and the MA plot is meant to visualize the DE results. So just use cpm() as described in the linked post.
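To show how these quantities relate, here is a rough sketch in plain Python. It is not edgeR's implementation: the functions `cpm` and `rpkm` below are written for this illustration, and `norm_factor` stands in for a composition-correcting factor (such as a TMM factor) that you would obtain from edgeR rather than compute here. With `norm_factor=1.0` the result is exactly the naive library-size scaling warned about above.

```python
def cpm(counts, norm_factor=1.0):
    """Counts per million for one sample.

    norm_factor: assumed to be supplied from elsewhere (e.g. a TMM
    factor); the effective library size is lib_size * norm_factor.
    """
    effective_lib_size = sum(counts) * norm_factor
    if effective_lib_size <= 0:
        raise ValueError("effective library size must be positive")
    return [c / effective_lib_size * 1e6 for c in counts]

def rpkm(counts, lengths_bp, norm_factor=1.0):
    """CPM divided by gene length in kilobases."""
    return [v / (length / 1000.0)
            for v, length in zip(cpm(counts, norm_factor), lengths_bp)]

if __name__ == "__main__":
    counts = [100, 300, 600]
    lengths = [1000, 2000, 500]   # hypothetical gene lengths in bp
    print(cpm(counts))
    print(rpkm(counts, lengths))
```

The only difference between the two functions is the division by length, which is the step that has no counterpart in the differential testing.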
