For DEG analysis using RNA-seq, we typically remove pseudogenes, microRNA genes, and other RNA genes such as lincRNA, scaRNA, snoRNA, etc. The reasoning is that single-end RNA-seq typically uses the polyA tail to fish out the RNA to sequence, and these RNA genes have no polyA tail, so they should not be there. This sounds fine to me.
My questions arise when paired-end RNA-sequencing is used:
If paired-end sequencing uses ~500 bp RNA segments, are the polyA tails always in these segments? If not, are the ~500 bp segments random fragments of polyA RNA, or of any RNA?
Should we remove the RNA genes in DEG analysis, as we do for single-end data?
I have genes like RPPH1, RMRP, RN45S, and MALAT1 high on my DEG list using paired-end alignment, but low on the DEG list using single-end alignment. These are RNA genes, but they do NOT belong to RNA gene classes such as snoRNA or lincRNA. Why is this so, and should I remove these RNA genes from the DEG analysis or not?
Thanks in advance!
Note that many lncRNAs are polyadenylated.
Thanks for the education. However, lncRNAs are filtered out in our analysis pipeline, perhaps because their functions are usually unknown. I am not sure this is a good idea.
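If a filter like this is kept, one option might be to make the excluded classes an explicit, adjustable list instead of something buried in the pipeline, so that lncRNAs can be retained when wanted. Below is a minimal Python sketch of that idea, not our actual pipeline. The function name `filter_counts`, the set `EXCLUDED_BIOTYPES`, the counts, and the gene-to-biotype table are all made up for illustration; the biotype labels only loosely follow Ensembl/GENCODE-style names, and a real analysis would take them from the annotation used for alignment.

```python
# Hypothetical sketch: all names, counts, and biotype labels are illustrative.
# lncRNA is deliberately NOT in the excluded set, so polyadenylated lncRNAs
# such as MALAT1 stay in the table.
EXCLUDED_BIOTYPES = {"pseudogene", "miRNA", "snoRNA", "scaRNA"}


def filter_counts(counts, biotypes, excluded=EXCLUDED_BIOTYPES):
    """Split a {gene: count} table into kept and dropped genes by biotype.

    Genes without a biotype annotation are kept, and also listed separately
    so they can be reviewed by hand.
    """
    kept, dropped, unannotated = {}, {}, []
    for gene, count in counts.items():
        biotype = biotypes.get(gene)
        if biotype is None:
            unannotated.append(gene)
            kept[gene] = count
        elif biotype in excluded:
            dropped[gene] = count
        else:
            kept[gene] = count
    return kept, dropped, unannotated


# Toy input: made-up counts; RPPH1 is left unannotated on purpose.
counts = {"GAPDH": 1200, "MALAT1": 5400, "SNORD3A": 80, "RPPH1": 300}
biotypes = {"GAPDH": "protein_coding", "MALAT1": "lncRNA", "SNORD3A": "snoRNA"}

kept, dropped, unannotated = filter_counts(counts, biotypes)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
print("no biotype annotation:", unannotated)
```

Returning the dropped and unannotated genes alongside the kept ones makes it easy to see exactly what the filter removed, which seems useful when a gene moves a lot between two analyses.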