It seems that the definition of "ribosomal genes" in the context of RNA-seq or scRNA-seq QC can vary a lot. I've seen some people define them by simply grepping for ^Rp[sl] (for mouse), which technically matches ribosomal protein genes. The GENCODE GTF has an rRNA biotype (as well as rRNA_pseudogene and Mt_rRNA), which marks ribosomal RNA transcripts (it includes genes such as Rn5s and n-R5s*).
For the purpose of QC, is one definition (or a combination of the two) more correct? For example, the default 10x Genomics workflow doesn't include rRNA biotypes in the reference, so perhaps that's why people resort to ribosomal proteins in that case.
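To make the two definitions concrete, here is a minimal sketch that reports them as separate fractions rather than merging them. It assumes GENCODE-style attributes (gene_type, gene_name) in the GTF. The toy records, the counts, and the helper names (toy_record, gene_types_from_gtf, classify, category_fractions) are all made up for illustration and do not come from any real dataset or library.

```python
import re
from collections import defaultdict

# Definition 1: mouse ribosomal protein gene symbols, as grepped in the question.
RIBO_PROTEIN_RE = re.compile(r"^Rp[sl]")
# Definition 2: the GENCODE biotypes named in the question.
RRNA_BIOTYPES = {"rRNA", "rRNA_pseudogene", "Mt_rRNA"}


def gene_types_from_gtf(lines):
    """Map gene_name -> gene_type using the 'gene' records of a GTF."""
    gene_types = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "gene":
            continue
        # Attributes look like: gene_id "X"; gene_type "rRNA"; gene_name "Rn5s";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if "gene_name" in attrs:
            gene_types[attrs["gene_name"]] = attrs.get("gene_type")
    return gene_types


def classify(gene_name, gene_type):
    """Assign a gene to 'rRNA', 'ribosomal_protein', or 'other'."""
    if gene_type in RRNA_BIOTYPES:
        return "rRNA"
    if RIBO_PROTEIN_RE.match(gene_name):
        return "ribosomal_protein"
    return "other"


def category_fractions(counts, gene_types):
    """Fraction of total counts falling into each category."""
    totals = defaultdict(float)
    for gene, n in counts.items():
        totals[classify(gene, gene_types.get(gene))] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: n / grand_total for category, n in totals.items()}


def toy_record(name, biotype):
    """Build a hypothetical GTF 'gene' line (placeholder IDs and coordinates)."""
    attrs = f'gene_id "{name}_id"; gene_type "{biotype}"; gene_name "{name}";'
    return "\t".join(["chrN", "TOY", "gene", "1", "100", ".", "+", ".", attrs])


# Made-up annotation and counts, only to exercise the functions.
toy_gtf = [
    toy_record("Rps2", "protein_coding"),
    toy_record("Rpl13", "protein_coding"),
    toy_record("Rn5s", "rRNA"),
    toy_record("mt-Rnr1", "Mt_rRNA"),
    toy_record("Actb", "protein_coding"),
]
toy_counts = {"Rps2": 20, "Rpl13": 10, "Rn5s": 5, "mt-Rnr1": 5, "Actb": 60}

gene_types = gene_types_from_gtf(toy_gtf)
fractions = category_fractions(toy_counts, gene_types)
for category, fraction in sorted(fractions.items()):
    print(f"{category}: {fraction:.2f}")
```

Keeping the two categories apart makes it easy to see which one drives a high percentage. One caveat with the name-based definition: ^Rp[sl] also matches ribosomal protein S6 kinase genes such as Rps6ka1, so the pattern may need tightening depending on how strict you want to be.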
This is somewhat of a tangent, but about a third of total transcripts being ribonucleoprotein seems a bit high, no?
These are metabolically active, rapidly dividing cells, so I would say it is at least not unexpected. The data is what it is; I did not make it up. I just checked some related bulk RNA-seq data, and there it is 5-10% of the total counts per sample. Not sure why scRNA-seq seems to favour these reads to that extent.
Edit: See also https://kb.10xgenomics.com/hc/en-us/articles/218169723-What-fraction-of-reads-map-to-ribosomal-proteins- Seems to be quite common to get these large percentages.
Good article. Didn't realize they addressed this point so well and that the fractions are consistently so high.