I have recently been working with RNA-seq data from a breast cancer cell line panel that was generated with the ALEXA-seq pipeline.
I was fascinated by the expressed (0/1) flag that is available for every gene, so I had a look at the 'Alternative expression analysis by RNA sequencing' paper and its supplementary information (Figures 5 and 6). As far as I understand, the method decides whether a gene is expressed below or above intergenic and locus-specific (intragenic) noise by comparing the measured expression levels of exon regions, silent intron regions, and silent intergenic regions.
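To make my reading concrete, here is a minimal sketch of how I imagine such a 0/1 call could work. It assumes the noise cutoff is a high percentile of the expression values seen in the silent regions; the 95th percentile, the nearest-rank percentile, and the function names are my own assumptions for illustration and are not taken from the paper or from ALEXA-seq.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (q is an integer 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def expressed_flags(gene_expr, silent_expr, q=95):
    """Flag each gene 1 if its expression exceeds the noise cutoff, else 0.

    gene_expr   -- dict mapping gene id to an expression value
    silent_expr -- expression values measured in silent (negative control) regions
    q           -- percentile used as cutoff (assumed value, not from the paper)
    """
    cutoff = percentile(silent_expr, q)
    return {gene: int(value > cutoff) for gene, value in gene_expr.items()}
```

Passing silent intergenic values would give a call against intergenic noise; passing the silent intron values of a single locus would give a locus-specific call.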
I wonder whether this method can be adapted so that it can be used generically with any RNA-seq pipeline.
The key question is whether a downloadable reference annotation (e.g. the Homo_sapiens.GRCh37.75.gtf.gz file on the ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/ server) contains all of the genomic region types mentioned above (exon, silent intron, and silent intergenic). And further, how can one distinguish between these region types?
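My current guess is that the GTF lists exons explicitly, while introns and intergenic regions are not listed and would have to be derived as the gaps between exons of a gene and between genes. Below is a rough sketch of that idea. The function names are hypothetical, strand and chromosome ends are ignored, and it says nothing about which of the derived regions are actually 'silent', which presumably needs additional evidence.

```python
import re
from collections import defaultdict

def merge(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def gaps(intervals):
    """Return the gaps between consecutive merged intervals."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(intervals, intervals[1:])]

def regions_from_gtf(lines):
    """Derive intron and intergenic intervals from the exon records of a GTF."""
    exons = defaultdict(list)  # (chrom, gene_id) -> list of exon intervals
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            key = (fields[0], match.group(1))
            exons[key].append((int(fields[3]), int(fields[4])))

    introns = {}               # gene_id -> gaps between its merged exons
    spans = defaultdict(list)  # chrom -> gene spans (first exon start, last exon end)
    for (chrom, gene), intervals in exons.items():
        merged = merge(intervals)
        # Note: these gaps may still overlap exons of other genes.
        introns[gene] = gaps(merged)
        spans[chrom].append((merged[0][0], merged[-1][1]))

    # Intergenic = gaps between merged gene spans on each chromosome.
    intergenic = {chrom: gaps(merge(s)) for chrom, s in spans.items()}
    return introns, intergenic
```

For the real file this could be called as `regions_from_gtf(gzip.open("Homo_sapiens.GRCh37.75.gtf.gz", "rt"))` after `import gzip`.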
Any insight is welcome! Thank you,
Elmar
Malachi, thank you for this detailed answer. It gets me quite a bit further.
In particular, it is now much clearer to me how you defined the 'silent' negative controls for intron and intergenic regions.