Hey all,
This is my first time posting, so I hope this question isn't too open-ended; sorry if it's a bit long. Anyways, I'm a current bioinformatics master's student, and I've just joined a cancer research lab as an intern. They have RNA-seq data that they received from outsourcing their sequencing, and from it, they'd like me to get them a list of the most significant differentially expressed genes by fold change and p-value.
The problem is that they don't have the raw data. The facility they outsourced their sequencing to did some of the data analysis for them, so what I have to work with is a data frame for each trial: the control and three separate tests. Each data frame contains the gene ID, the transcript ID(s), the length, the expected count, and the FPKM.
This type of analysis is new to me, and in reading how to complete this task using tools such as edgeR, it seems important to have the raw read counts, which unfortunately I don't have and don't think I can get. I don't believe the expected_count is the same thing, is it? They do supply an equation for the FPKM as FPKM = (10^6 * C) / (N * L / 10^3), where C is the number of fragments uniquely aligned to the gene, N is the total number of fragments uniquely aligned to all genes, and L is the number of bases in the gene. Would C in this equation be equal to the raw read count? It appears to be approximately equal to the expected count.
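For what it's worth, that equation is algebraically invertible: if you also knew N (the per-sample library size, which the facility may or may not report), you could back-calculate C from the FPKM. A minimal sketch in Python, with made-up numbers for illustration:

```python
# Sketch: the facility's FPKM formula, and its inversion to recover C.
# The numbers below are invented purely for illustration.

def fpkm(C, N, L):
    """FPKM = (10^6 * C) / (N * L / 10^3)."""
    return (1e6 * C) / (N * L / 1e3)

def fragments_from_fpkm(f, N, L):
    """Invert the formula to recover C, the uniquely aligned fragment count."""
    return f * (N * L / 1e3) / 1e6

C, N, L = 250, 30_000_000, 2_000  # fragments, library size, gene length (bp)
f = fpkm(C, N, L)
print(round(fragments_from_fpkm(f, N, L)))  # recovers the original 250
```

The catch is that N usually isn't in the per-gene tables, so in practice the expected_count column is the more direct route to counts.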
Any ideas on how to solve this problem are much appreciated!
Don't waste your time with this. Your group paid for the sequencing; whoever did it will happily give you the fastq files.
Yeah I'm going to attempt to get the fastq files. I was just wondering if there was any use to what I have currently. I believe that it's RSEM output.
¯\_(ツ)_/¯
In all seriousness, do you know how they produced the expected counts? Because if you do, you might be able to use tximport to produce counts and afterwards use edgeR :). HOWEVER! Take into account that if you want to publish, it is likely you will be asked to upload the raw data to a publicly available repository.
So looking into the problem a bit more, it looks like this is the direct output from RSEM, with the expected counts defined in the docs as:
"'expected_count' is the sum of the posterior probability of each read comes from this transcript over all reads. Because 1) each read aligning to this transcript has a probability of being generated from background noise; 2) RSEM may filter some alignable low quality reads, the sum of expected counts for all transcript are generally less than the total number of reads aligned."
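If it really is RSEM output, then expected_count is already the count estimate that downstream tools use: tximport with type = "rsem" takes that column from each sample's .genes.results file. A rough Python equivalent of that import step, assuming standard RSEM output columns and hypothetical file names:

```python
# Sketch of what tximport does for RSEM output (type = "rsem"): it reads the
# expected_count column from each sample's *.genes.results file as the count
# estimate. File names and the exact column layout are assumptions here,
# based on standard RSEM output.
import csv

def read_expected_counts(path):
    """Map gene_id -> expected_count from one RSEM .genes.results file."""
    counts = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            counts[row["gene_id"]] = float(row["expected_count"])
    return counts

# Hypothetical usage: build a per-sample count matrix for edgeR.
# samples = {"control": "control.genes.results", "test1": "test1.genes.results"}
# matrix = {name: read_expected_counts(path) for name, path in samples.items()}
# edgeR wants integer-like counts, so round the expected counts before export.
```

Note that expected counts are fractional (they're posterior expectations, as the quote says), which is why they get rounded or handled via tximport's offset machinery before the edgeR model is fit.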