How does gexpr function in ballgown package calculate the gene expression level from transcript expression levels?
3.0 years ago
Apprentice ▴ 170

I have FASTQ files from 30 patients, obtained by RNA-seq. The reads were mapped to the reference genome (GRCh38) with HISAT2, and transcript levels were then quantified with StringTie. Next, I quantified the expression level of each gene from the StringTie GTF output files using the gexpr function in the ballgown R package.

As a result, the FPKM value of one gene X in one sample Y was 2.994256. I then checked the expression levels of the transcripts of gene X in sample Y. The transcript FPKM values were 2.09907, 1.593538, and 0.030281. These FPKMs sum to 3.722889, which does not match the gexpr result.

I thought that the gexpr function in ballgown calculates the expression level of each gene by summing the FPKM values of the gene's transcripts. Is that not correct? If not, how does the function calculate the expression level of each gene?

ballgown FPKM R • 858 views

FPKM is a length-normalized value, so it depends on the length of the feature: check this explanatory video. You can't just sum the FPKMs of the transcripts to get the gene-level FPKM. Ballgown, as can be seen in its code, extracts the information for each exon and then does the normalization using the gene length.
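To make the length dependence concrete, here is a minimal Python sketch of that kind of calculation. It assumes, going by the description above, that the gene-level value is each transcript's FPKM weighted by its transcript length, summed, and divided by the length of the gene's merged (non-overlapping) exons. The function names and the exon coordinates are hypothetical and only for illustration; only the three FPKM values come from the question. This is not ballgown's actual code.

```python
def merged_length(exons):
    """Total length covered by a set of (start, end) inclusive intervals,
    counting overlapping bases only once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_fpkm(transcripts):
    """transcripts: list of (fpkm, exons) pairs for one gene in one sample,
    where exons is a list of (start, end) inclusive intervals."""
    all_exons = [exon for _, exons in transcripts for exon in exons]
    gene_len = merged_length(all_exons)
    # FPKM * length is proportional to the fragments assigned to a transcript.
    weighted = sum(
        fpkm * sum(end - start + 1 for start, end in exons)
        for fpkm, exons in transcripts
    )
    return weighted / gene_len


# FPKM values from the question; exon coordinates are made up (hypothetical).
example = [
    (2.09907, [(1, 1000), (2001, 3000)]),
    (1.593538, [(1, 1000), (4001, 5000)]),
    (0.030281, [(2001, 3000)]),
]
plain_sum = sum(fpkm for fpkm, _ in example)
length_weighted = gene_fpkm(example)
print(plain_sum, length_weighted)
```

With these made-up coordinates every transcript is shorter than the merged gene, so the length-weighted value comes out below the plain sum. That is the same direction as the gap between 2.994256 and 3.722889 in the question, although the real number depends on the actual exon structure of gene X.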


The hisat2-stringtie-ballgown workflow is old and, imho, deprecated. It was also developed for transcript-level analysis rather than gene-level differential expression. I suggest a modern salmon-tximport-DESeq2 workflow, as described here: https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
