Question

Differential gene expression analysis from an FPKM file

23 months ago

Taha • 0

Hi,

I am using publicly available data for my research. It is increasingly common for papers to report gene expression as FPKM values rather than raw counts (I am also very curious why they do that). The raw FASTQ files are available; however, obtaining raw counts from them requires a significant amount of effort, so I would prefer to use the provided gene-level FPKM values.

Are there any tutorials or recommended tools for identifying differentially expressed genes from FPKM data?

I would be grateful for any assistance.

rnaseq fpkm

updated 23 months ago by dsull ★ 6.9k • written 23 months ago by Taha • 0

Please have a look at this link, where you can find a very concise and clear explanation of the different normalization methods for RNA-seq. Note that FPKM normalization is not appropriate for DE analysis; it is only meant for comparing the expression of genes within the same sample. For DE analysis, the best approach is the median-of-ratios method, the default normalization used by DESeq2 (TMM normalization in edgeR is another good option). Both methods are applied to raw counts, not to FPKM values.
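
To make the median-of-ratios idea concrete, here is a minimal Python sketch of how DESeq2-style size factors can be computed from a raw count matrix. It is a simplified illustration, not DESeq2's actual code: the function names and the toy matrix are hypothetical, and it only estimates size factors (no dispersion estimation or testing).

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (simplified DESeq2-style sketch).

    counts: list of rows, one per gene, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Genes with a zero in any sample have a geometric mean of zero,
        # so they are left out of the pseudo-reference.
        if min(row) == 0:
            continue
        # Log of the gene's geometric mean across samples (the pseudo-reference).
        log_ref = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_ref)
    # Each sample's size factor is the median of its ratios to the reference.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's raw counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
raw = [[10, 20], [100, 200], [30, 60], [0, 5]]
sf = size_factors(raw)
print(sf)
print(normalize(raw, sf))
```

Because sample 2 has exactly twice the counts of sample 1 for every gene used, its size factor is twice as large and the normalized counts agree. In practice you would let DESeq2 do this (via `estimateSizeFactors`) as part of a full count-based analysis.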

23 months ago by Marco Pannone ▴ 810

Thank you very much for the link; it is a very good summary, and now all those normalization methods make sense.

23 months ago by Taha • 0

23 months ago

dsull ★ 6.9k

The raw FASTQ files are available; however, obtaining raw counts from them requires a significant amount of effort, so I would prefer to use the provided gene-level FPKM values.

This is something I never understood. Obtaining raw counts can be done with 3 or 4 short commands on the command line (and you could even write out those commands in the methods section of your manuscript, as you should). And given that it's public data, tools such as BioJupies (https://maayanlab.cloud/biojupies/) should be able to do a proper analysis for you. Sure, it's easier to download an Excel file with numbers, but as scientists, we should do better than that.
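
As a rough illustration of those "3 or 4 short commands", the Python sketch below assembles one possible pipeline with the SRA Toolkit's `fasterq-dump` and `salmon`. The choice of tools, the accession `SRR0000000`, and the file names (`transcripts.fa`, `salmon_index`, `fastq/`, `quant/`) are hypothetical placeholders, and it assumes paired-end reads. By default it only prints the commands; pass `dry_run=False` to actually run them.

```python
import shlex
import subprocess


def raw_count_commands(accession, transcripts_fa, index_dir="salmon_index",
                       fastq_dir="fastq", quant_root="quant"):
    """Return the commands (as argument lists) for one paired-end SRA run.

    All paths are hypothetical; fasterq-dump's default split mode writes
    <accession>_1.fastq and <accession>_2.fastq for paired-end runs.
    """
    r1 = f"{fastq_dir}/{accession}_1.fastq"
    r2 = f"{fastq_dir}/{accession}_2.fastq"
    return [
        # 1. Download the reads from SRA into fastq_dir.
        ["fasterq-dump", "-O", fastq_dir, accession],
        # 2. Build a salmon index of the transcriptome (only needed once).
        ["salmon", "index", "-t", transcripts_fa, "-i", index_dir],
        # 3. Quantify; "-l A" lets salmon infer the library type.
        ["salmon", "quant", "-i", index_dir, "-l", "A",
         "-1", r1, "-2", r2, "-o", f"{quant_root}/{accession}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(raw_count_commands("SRR0000000", "transcripts.fa"))
```

salmon reports transcript-level estimated counts (the `NumReads` column of `quant.sf`); these are usually summarized to gene-level counts with the tximport package in R before running DESeq2 or edgeR.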

It is increasingly common for papers to report gene expression as FPKM values rather than raw counts.

I don't think that's true. It might have been true half a decade ago, but it's actually increasingly common for papers to report gene expression data using either TPMs (e.g., those obtained with RSEM) or, better yet, properly normalized counts (such as those output by differential gene expression programs).
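
Since TPMs come up here: within a single sample, FPKM and TPM are proportional to each other, so a sample's FPKM values can be rescaled to TPM with TPM_i = FPKM_i / Σ_j FPKM_j × 10^6. A minimal Python sketch with made-up values follows. The result is still a within-sample relative abundance, not raw counts, so it does not make the data suitable for DESeq2 or edgeR; it is also only meaningful if the table contains all genes, because the sum runs over whatever genes are present.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so they sum to one million (TPM)."""
    total = sum(fpkm)
    return [value / total * 1e6 for value in fpkm]


sample_fpkm = [10.0, 30.0, 60.0]  # made-up FPKM values for three genes
tpm = fpkm_to_tpm(sample_fpkm)
print(tpm)
```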

Although it may take 3 or 4 short commands for you, it is usually not easy for people who are not experienced with the terminal and coding, and it is not always possible to get help from someone who knows how. Moreover, when you don't have access to a core facility or a computer with the necessary CPU, RAM, and disk space, getting raw counts from hundreds of raw FASTQ files becomes even more challenging.

Thank you for recommending BioJupies; it can be a handy tool for my use case. And I appreciate your answer and comments in general.

23 months ago by Taha • 0

No problem, and good point! I've always worked at institutions that have a core facility, bioinformatics support, or computational biology training, so thanks for pointing out that not everyone has access to those. Hopefully we at Biostars can provide the support needed to do robust and accurate analysis :) The last thing we would want is an incorrect analysis leading to incorrect conclusions being published in the scientific literature.

Tools are getting better at processing hundreds of RNA-seq datasets on a MacBook with minimal disk/CPU/RAM usage. Hopefully we can make them more accessible and user-friendly to both computational biologists and purely wet-lab biologists (BioJupies is one such tool, and hopefully more tools like it get published in the future).

23 months ago by dsull ★ 6.9k

score 3 · Accepted Answer · 2022-12-09

23 months ago

swbarnes2 14k

Given how much work someone will presumably do on any DE genes you report, you really ought to be generating your DE list properly. That means starting from scratch if you must. You can't do proper DE gene assessment with FPKM.
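
A toy example of what FPKM throws away: the same FPKM can come from 5 reads in a shallow library or 500 reads in a library 100 times deeper, but those two measurements are not equally precise. The sketch assumes a hypothetical 2 kb gene and hypothetical library sizes, and uses the Poisson relative standard error, 1/√n, as the simplest count model (DESeq2 and edgeR use negative binomial models, which add extra biological variability on top of this).

```python
import math


def fpkm(count, length_bp, total_mapped_reads):
    """Fragments per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


gene_length = 2_000  # hypothetical gene length in bp

# Hypothetical samples: (reads on the gene, total mapped reads).
samples = {"shallow": (5, 1_000_000), "deep": (500, 100_000_000)}

for name, (count, library_size) in samples.items():
    value = fpkm(count, gene_length, library_size)
    # Poisson model: the relative standard error of a count n is 1/sqrt(n).
    rel_se = 1 / math.sqrt(count)
    print(f"{name}: FPKM = {value:.2f}, relative SE = {rel_se:.3f}")
```

Both samples report an FPKM of 2.50, yet the shallow estimate has ten times the relative standard error. A DE tool given only the FPKM table cannot tell these situations apart, which is why count-based methods need the raw counts.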