When performing high-throughput RNA sequencing on human samples, how many of the 3 billion base pairs of DNA will be covered after alignment and quality control?
In other words, how much of the genome can theoretically be reproduced from the RNA-seq reads?
Even a rough estimate would do, but if it helps, the samples I'm interested in are from human brains, expressing ~16,000 genes, of which ~13,000 are protein-coding.
I couldn't find an answer by googling, and would appreciate any help.
It is going to depend on the quality of your libraries and the method used to make them. Since you are going to enrich/capture non-rRNA transcripts, what gets captured/sampled in your library is fixed. In theory, all such transcripts present in your sample have a chance of being captured in the library.
Thank you for your answer! This is actually not my data; I'm just using it to do some calculations. According to the article it comes from, they used Illumina Stranded Total RNA Prep with Ribo-Zero Plus for the library, on cortical samples. Does this help? Or is there maybe a way to calculate this?
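If the aligned reads are available, one way to put a number on this is breadth of coverage: the fraction of reference positions covered by at least one read. Below is a minimal sketch of that calculation, not a definitive pipeline. It assumes the covered regions have already been extracted from the alignment as half-open `(start, end)` intervals on a single contig; the function name `breadth_of_coverage` and the toy intervals are hypothetical and only there for illustration.

```python
def breadth_of_coverage(intervals, reference_length):
    """Return the fraction of a reference covered by at least one interval.

    intervals: iterable of half-open (start, end) pairs on ONE contig
               (assumption: already extracted from the alignment).
    reference_length: total number of bases in that contig.
    """
    covered = 0
    cur_start = cur_end = None
    # Sort by start so overlapping or adjacent intervals can be merged in one pass.
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap or adjacency: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / reference_length


# Toy example (made-up numbers, not real data):
# two overlapping reads plus one separate read on a 1,000-base contig.
toy_intervals = [(0, 100), (50, 150), (300, 400)]
print(breadth_of_coverage(toy_intervals, 1000))
```

For a whole genome you would run this per chromosome, sum the covered bases, and divide by the total genome length. Overlapping reads are merged first so that deeply sequenced transcripts are not counted more than once.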