I am currently using the protein-coding genes from GENCODE as a stable source of gene annotations, but I am confused about how to find gene expression data for these genes.
There are RNA-seq data for each cell line, such as K562, which has a bigWig file called "Transcription of K562 cells from ENCODE". It looks like an expression level on some scale, but I cannot find detailed information about how it was calculated.
Forgive me if this is a simple question; I am completely new to RNA-seq. Should I start from the alignment BAM files for each RNA-seq replicate, count how many reads fall within each gene body, and then normalize by the total number of reads in the replicate (and by gene length) to get the RPKM?
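To make this first option concrete, here is a minimal sketch of the calculation I have in mind. It assumes the usual RPKM definition (reads per kilobase of gene per million mapped reads), which divides by gene length as well as by library size. The function name and the example numbers are hypothetical, and the read count is assumed to have been obtained already from the BAM file.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one gene.

    read_count: reads overlapping the gene (assumed already counted)
    gene_length_bp: gene (or summed exon) length in base pairs
    total_mapped_reads: mapped reads in the replicate
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene, 20 million mapped reads
example_value = rpkm(500, 2000, 20_000_000)
print(example_value)  # 12.5
```

Each replicate would be computed separately with its own total read count.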
Or can I simply take the values from the bigWig file, sum those that fall within a gene body, and then apply some additional normalization?
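For this second option, the summation I am picturing looks roughly like the sketch below. It assumes the bigWig signal has already been extracted as half-open `(start, end, value)` intervals on one chromosome; that tuple layout, the function names, and the example numbers are all hypothetical. I do not know whether the ENCODE values are raw coverage or already normalized, so the result is only a signal summary, not an RPKM.

```python
def signal_sum_over_gene(intervals, gene_start, gene_end):
    """Sum per-base signal over [gene_start, gene_end).

    intervals: iterable of (start, end, value) tuples, half-open,
    where `value` applies to every base in [start, end).
    """
    total = 0.0
    for start, end, value in intervals:
        # Number of bases this interval shares with the gene body
        overlap = min(end, gene_end) - max(start, gene_start)
        if overlap > 0:
            total += value * overlap
    return total


def mean_signal_over_gene(intervals, gene_start, gene_end):
    """Average per-base signal, so genes of different lengths are comparable."""
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    return signal_sum_over_gene(intervals, gene_start, gene_end) / length


# Hypothetical signal and gene coordinates
signal = [(100, 200, 2.0), (200, 300, 4.0), (500, 600, 10.0)]
print(signal_sum_over_gene(signal, 150, 250))   # 300.0
print(mean_signal_over_gene(signal, 150, 250))  # 3.0
```

Dividing by gene length here plays the role of the "per kilobase" part of RPKM; whether a further library-size normalization is needed depends on how the bigWig values were produced.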
Thanks!