I have FASTQ files from 30 patients obtained by RNA-seq. The reads were mapped to the reference genome (GRCh38) with HISAT2, and transcript levels were then quantified with StringTie. Next, I quantified expression levels for each gene from the StringTie GTF output files using the gexpr function in the ballgown R package.
As a result, the FPKM value of gene X for sample Y was 2.994256. I checked the expression levels of the transcripts of gene X in sample Y: their FPKM values were 2.09907, 1.593538, and 0.030281. These FPKMs sum to 3.722889, which does not match the gexpr result.
I thought the gexpr function in ballgown calculates the expression level of each gene by summing the FPKM values of its transcripts. Is that not correct? If not, how does the function calculate the expression level of each gene?
FPKM is a normalized value, and it depends on the feature length: check this explanatory video. You can't simply sum the FPKMs of the transcripts to get the gene-level FPKM, because each transcript's FPKM is divided by that transcript's own length, whereas a gene-level FPKM is divided by a single gene length. Ballgown, as can be seen in its code, extracts information for each exon and then does the normalization using the gene length.
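To illustrate why the two numbers differ, here is a minimal sketch, not ballgown's actual code. It assumes the gene-level value is computed as sum(FPKM_t × length_t) / gene_length, where gene_length is the length of the union of all exons of the gene. Since FPKM_t = fragments_t × 1e9 / (length_t × total_fragments), multiplying by length_t recovers a quantity proportional to fragments, which can be summed and then renormalized by one gene length. The transcripts, coordinates and FPKMs below are hypothetical, and StringTie's use of effective lengths may mean this won't reproduce 2.994256 exactly.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def interval_length(exons: List[Interval]) -> int:
    """Total length of a transcript's exons (assumed non-overlapping)."""
    return sum(end - start + 1 for start, end in exons)


def union_length(exons: List[Interval]) -> int:
    """Length of the union of possibly overlapping exons (the assumed 'gene length')."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_fpkm(fpkm: Dict[str, float], exons: Dict[str, List[Interval]]) -> float:
    """Hypothetical length-weighted aggregation of transcript FPKMs to a gene FPKM."""
    # FPKM * transcript length is proportional to the fragments assigned to it.
    weighted = sum(fpkm[t] * interval_length(exons[t]) for t in fpkm)
    all_exons = [iv for t in fpkm for iv in exons[t]]
    # Renormalize by a single gene length (union of exons).
    return weighted / union_length(all_exons)


if __name__ == "__main__":
    # Two hypothetical transcripts sharing their first exon.
    exons = {"t1": [(100, 199), (300, 399)], "t2": [(100, 199), (500, 599)]}
    fpkm = {"t1": 2.0, "t2": 1.0}
    print(sum(fpkm.values()))      # naive sum: 3.0
    print(gene_fpkm(fpkm, exons))  # length-weighted: 2.0
```

The naive sum (3.0) overstates the gene-level value (2.0) because the shared exon is counted in both transcript lengths, while the gene length counts it only once.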
HISAT2-StringTie-ballgown is old and, IMHO, deprecated; it was also developed for transcript-level analysis rather than gene-level differential expression. I suggest a modern workflow, Salmon-tximport-DESeq2, as described here: https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
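For comparison, here is a rough sketch of how gene-level summarization works in that workflow, based on the tximport documentation: estimated counts are summed over transcripts (counts are additive, unlike FPKM), TPMs are summed, and the gene length is the average of transcript effective lengths weighted by abundance. This is a simplified illustration for one sample, not tximport's code; the function name and input values are hypothetical. When a gene has zero abundance, this sketch returns NaN for the length, whereas tximport applies its own fallback, which is not reproduced here.

```python
import math
from typing import Dict, Tuple


def summarize_to_gene(tx: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """tx maps transcript id -> (estimated counts, TPM, effective length) for one sample."""
    counts = sum(c for c, _, _ in tx.values())  # counts simply add up
    tpm = sum(a for _, a, _ in tx.values())     # abundances add up
    # Abundance-weighted average transcript length (used as an offset downstream).
    length = sum(a * l for _, a, l in tx.values()) / tpm if tpm > 0 else math.nan
    return counts, tpm, length


if __name__ == "__main__":
    tx = {"t1": (400.0, 20.0, 1000.0), "t2": (100.0, 5.0, 500.0)}
    print(summarize_to_gene(tx))  # (500.0, 25.0, 900.0)
```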