How to deal with FPKM values for the isoforms of a particular gene in RNA-seq
10.3 years ago

This might be a trivial question, but being new to RNA-seq data I am confused about how to assign an FPKM value to a gene that has 3 or 4 isoforms.

I have downloaded analysed RNA-seq data from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52450 and noticed that transcripts belonging to the same gene (isoforms) have different FPKM values, which is expected. For my analysis, is it OK to sum the FPKM values of all isoforms to represent that gene's expression, or should I keep the values as they are?

example:

geneA isoform1 2.98
geneA isoform2 5.98
geneA isoform3 2.43

Can I make it as geneA 11.39 (2.98+5.98+2.43)?

isoforms RNA-Seq fpkm • 6.2k views
10.3 years ago

In principle that should work, but divide the sum by the number of isoforms (or by the length of the isoforms, depending on what you want) to get a normalized value; the result would then be an averaged gene expression. I checked the files you are using. There is generally another file named gene_exp.diff, generated by the Tuxedo suite, which holds the expression value per gene, so you don't have to calculate it yourself.
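To make the two options concrete, here is a minimal Python sketch using the three geneA values from the question. The function name `summarize_by_gene` is made up for illustration. It reports both the plain sum and the per-isoform mean, so you can see what each choice gives; as far as I know, the gene-level FPKM that Cufflinks/Cuffdiff reports corresponds to the sum over a gene's transcripts, so the sum is the number to compare against gene_exp.diff.

```python
from collections import defaultdict

# Isoform-level FPKM values copied from the question: (gene, isoform, FPKM).
isoform_fpkm = [
    ("geneA", "isoform1", 2.98),
    ("geneA", "isoform2", 5.98),
    ("geneA", "isoform3", 2.43),
]


def summarize_by_gene(rows):
    """Group isoform FPKMs by gene and report their sum and mean.

    `summarize_by_gene` is a hypothetical helper, not part of any package.
    """
    per_gene = defaultdict(list)
    for gene, _isoform, fpkm in rows:
        per_gene[gene].append(fpkm)
    return {
        gene: {
            "sum": sum(values),                 # total expression of the gene
            "mean": sum(values) / len(values),  # average per isoform
            "n_isoforms": len(values),
        }
        for gene, values in per_gene.items()
    }


summary = summarize_by_gene(isoform_fpkm)
for gene, stats in summary.items():
    print(gene, round(stats["sum"], 2), round(stats["mean"], 2))
```

Note that the mean shrinks as a gene gains more annotated isoforms even if its total expression stays the same, which is worth keeping in mind when comparing genes.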

For a more detailed answer, check this thread: "How do I get one FPKM value per gene?"

There is also some raw code (a tar.gz archive) provided by user mgogol. Use it with discretion and only after reading everything, as it assumes you have certain output files from cufflinks/cuffdiff.


+1 for using gene_exp.diff, except I think genes.fpkm_tracking is the file you would typically look for if you ran cufflinks on your own (to get FPKM values for each sample).
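If you do have a genes.fpkm_tracking file, pulling out one FPKM per gene could look roughly like the sketch below. It assumes a tab-separated file with a header row containing `gene_short_name` and `FPKM` columns; check the header of your own file, since the exact columns depend on the Cufflinks version and how it was run. The sample rows (including the geneB line and the XLOC IDs) are made up for illustration and are not from GSE52450.

```python
import csv
import io

# Made-up stand-in for a genes.fpkm_tracking file (only a few columns shown).
sample = (
    "tracking_id\tgene_id\tgene_short_name\tFPKM\tFPKM_status\n"
    "XLOC_000001\tXLOC_000001\tgeneA\t11.39\tOK\n"
    "XLOC_000002\tXLOC_000002\tgeneB\t0.00\tOK\n"
)


def read_gene_fpkm(handle, name_col="gene_short_name", fpkm_col="FPKM"):
    """Return {gene name: FPKM} from a tab-separated tracking table.

    Column names are assumptions; if a gene name occurs on several rows,
    the last row wins, so check for duplicates in real data.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[name_col]: float(row[fpkm_col]) for row in reader}


# With a real file: open("genes.fpkm_tracking", newline="") instead of StringIO.
gene_fpkm = read_gene_fpkm(io.StringIO(sample))
print(gene_fpkm)
```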


Yes, Charles, you are right. gene_exp.diff is the output cuffdiff produces during the differential expression analysis, though it also reports raw FPKM values.


Hi, thank you for your suggestions. I have downloaded the differential expression testing file, in which value_1 and value_2 correspond to the expression values at 2 different stages. Again, the entries seem to be transcripts rather than genes; I annotated the RefSeq IDs with gene names and checked. As you said it would be fine to take the average of the FPKM values, I think it is better for me to go ahead with that.
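For a transcript-level table with value_1 and value_2, collapsing to one row per gene could be sketched as below. The row layout, the `collapse_to_genes` name, and the placeholder transcript IDs and numbers are assumptions for illustration only; the `how` argument lets you switch between the average discussed above and the plain sum.

```python
from collections import defaultdict

# Made-up transcript-level rows: a gene name plus FPKM at the two stages.
rows = [
    {"gene": "geneA", "transcript": "tx1", "value_1": 2.0, "value_2": 4.0},
    {"gene": "geneA", "transcript": "tx2", "value_1": 1.5, "value_2": 0.5},
    {"gene": "geneB", "transcript": "tx3", "value_1": 3.0, "value_2": 6.0},
]


def collapse_to_genes(rows, how="mean"):
    """Collapse transcript rows to one (value_1, value_2) pair per gene.

    how="mean" averages over transcripts; how="sum" adds them up.
    """
    if how not in ("mean", "sum"):
        raise ValueError("how must be 'mean' or 'sum'")
    grouped = defaultdict(lambda: {"value_1": [], "value_2": []})
    for row in rows:
        for col in ("value_1", "value_2"):
            grouped[row["gene"]][col].append(row[col])
    result = {}
    for gene, cols in grouped.items():
        result[gene] = {}
        for col, values in cols.items():
            total = sum(values)
            result[gene][col] = total / len(values) if how == "mean" else total
    return result


gene_means = collapse_to_genes(rows, how="mean")
gene_sums = collapse_to_genes(rows, how="sum")
print(gene_means["geneA"], gene_sums["geneA"])
```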
