when I look at RNA-seq quantification data, I see miRNA genes:
protein_coding 19962
lncRNA 16901
processed_pseudogene 10167
unprocessed_pseudogene 2614
misc_RNA 2212
snRNA 1901
miRNA 1881 <---------
TEC 1057
So why would a TCGA cohort have a separate pipeline for miRNA quantification?
https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/miRNA_Pipeline/
This video says small RNAs (mi/si/piRNA) are too short to be captured by regular RNA-seq kits, which is why small RNA-seq exists as a separate protocol
True. That is why miRNAs are a small fraction of detected genes in the above example, while in small RNA-seq they are the majority. No assay is bias-free, so you will always see some spurious miRNA hits. It's simply not black and white.
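One way to check this on your own quantification table is to separate "how many genes of each biotype are listed" from "how many actually have reads, and what share of all reads they get". Below is a minimal sketch in plain Python. The function name `summarize_by_biotype`, the `(gene_id, biotype, read_count)` row layout, and the toy numbers are all assumptions made up for illustration, not anything taken from the GDC pipeline.

```python
from collections import defaultdict


def summarize_by_biotype(rows, min_count=1):
    """Summarize a quantification table per biotype.

    rows: iterable of (gene_id, biotype, read_count) tuples (hypothetical layout).
    min_count: reads needed before a gene is called "detected" (an assumption).
    """
    summary = defaultdict(lambda: {"annotated": 0, "detected": 0, "reads": 0})
    for _gene_id, biotype, count in rows:
        entry = summary[biotype]
        entry["annotated"] += 1      # gene is listed in the table
        entry["reads"] += count      # reads assigned to this biotype
        if count >= min_count:
            entry["detected"] += 1   # gene has at least min_count reads

    total_reads = sum(e["reads"] for e in summary.values())
    for entry in summary.values():
        entry["read_share"] = entry["reads"] / total_reads if total_reads else 0.0
    return dict(summary)


# Toy rows, invented for illustration only.
toy_rows = [
    ("geneA", "protein_coding", 1500),
    ("geneB", "protein_coding", 820),
    ("geneC", "lncRNA", 40),
    ("geneD", "miRNA", 2),
    ("geneE", "miRNA", 0),
    ("geneF", "miRNA", 0),
]

result = summarize_by_biotype(toy_rows)
for biotype, stats in sorted(result.items()):
    print(biotype, stats)
```

If the miRNA row shows many listed genes but few detected ones and a tiny read share, the count in the biotype list mostly reflects the annotation rather than what the library captured.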
that's the nice thing about controls though. if you are simply looking for variance in cases vs controls, then how the data was obtained doesn't matter so much. especially if you scale the data
I could not disagree more. If you want to make statements from data then the experiment must be performed accordingly. Quantifying noise and then pretending it was signal while transforming data to hide that is naive at best and fraud at worst.
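To make the point about scaling concrete, here is a small sketch, assuming "scale the data" means per-gene z-scoring (the thread does not say which scaling was meant). The counts are invented toy numbers: one gene is well measured and differs between cases and controls, the other sits at the detection floor with no group difference at all.

```python
from statistics import mean, pstdev


def zscore(values):
    """Per-gene z-score: subtract the mean, divide by the population SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Toy counts, invented for illustration only.
well_measured = {"controls": [1000, 1050, 980, 1020],
                 "cases": [2000, 2100, 1950, 2050]}
near_floor = {"controls": [0, 1, 0, 2],
              "cases": [1, 0, 2, 0]}


def describe(gene):
    """Collect raw and scaled summary statistics for one gene."""
    raw = gene["controls"] + gene["cases"]
    scaled = zscore(raw)
    return {
        "raw_mean": mean(raw),
        "group_diff": mean(gene["cases"]) - mean(gene["controls"]),
        "raw_sd": pstdev(raw),
        "scaled_sd": pstdev(scaled),
    }


stats_well = describe(well_measured)
stats_floor = describe(near_floor)
print("well measured:", stats_well)
print("near floor:   ", stats_floor)
```

After scaling, both genes have a spread of 1, so the gene with a handful of reads looks exactly as variable as the well-measured one. The scaling step removes the information (raw depth) you would need to tell them apart, which is why it cannot rescue an assay that did not capture the molecule in the first place.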
i see. so it's noise, not comparatively small levels of detection
would you make the same case for lncRNA detected by a generic RNA-seq protocol?