Hi
The RNA-seq data I work with are ribosomal RNA-depleted libraries, meaning they contain ncRNAs, snRNAs, etc. in addition to mRNAs. To filter the Ensembl GTF file before counting, which gene_biotype values should I remove?
High-abundance RNAs, including mt-RNA, rRNA, snRNA, snoRNA, tRNA, histone RNAs, ...?
pseudogene?
Thanks. Which length do you usually use as a cutoff, and based on which criterion? If I have 50 bp reads, should I use 50 bp as the cutoff?
How long are your reads?
Single-end; all reads are between 39 and 42 bp.
And how about histone RNAs?
If you can capture them with your sequencing, why not detect them? I realize the analysis can become more complex if you expand beyond the typical pool of mRNA. However, the goal of RNA-Seq is to characterize the approximate fold change of the RNA species present in your biological source (cells, tissue, etc.).
I think narrowing the classes of RNA you are considering a priori is bad science. If you throw out a class of RNA, you are effectively saying that the class has no biological role in what you are studying. There's no good reason that I can see for filtering biotypes by anything other than size.
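For what it's worth, a size-based filter could look roughly like the sketch below. It is only an illustration under stated assumptions: the GTF is assumed to follow the usual 9-column layout with a gene_id attribute on every record, gene length is taken as the merged exon length, and the function names (parse_attributes, gene_lengths, filter_by_length) and the min_length parameter are made up for this example. No cutoff value is given in this thread, so pick your own.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn 'gene_id "X"; gene_biotype "Y";' into a dict."""
    return dict(ATTR_RE.findall(field))


def gene_lengths(gtf_lines):
    """Return {gene_id: length of the union of that gene's exons}."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        gene_id = parse_attributes(cols[8]).get("gene_id")
        if gene_id is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exons[gene_id].append((int(cols[3]), int(cols[4])))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: merge
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


def filter_by_length(gtf_lines, min_length):
    """Keep header lines and records of genes with merged exon length >= min_length."""
    gtf_lines = list(gtf_lines)
    lengths = gene_lengths(gtf_lines)
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        gene_id = parse_attributes(cols[8]).get("gene_id")
        # Genes without any exon record get length 0 and are dropped.
        if lengths.get(gene_id, 0) >= min_length:
            kept.append(line)
    return kept
```

You would read the annotation into a list of lines (for example with open() and readlines() on your own GTF path), call filter_by_length(lines, min_length) with whatever cutoff you settle on, and write the returned lines to a new file before counting. Note that this filters on length only and leaves gene_biotype untouched.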