I recently received some RNA-seq data (poly(A) library prep) and I am wondering whether I can trust the expression of genes that are not described as polyadenylated. I have come across some genes that are not polyadenylated and yet have expression, and some of them are even significantly differentially expressed between control and condition.
*FYI: I checked some long non-coding RNAs and their sequences to verify that they don't have a poly(A) tail, and I also checked their expression (they do have counts).
I know that ribodepletion is recommended if you are interested in lncRNAs. But having used a poly(A) library for RNA-seq, I was wondering how it is possible to get counts/expression for genes that are not polyadenylated and so should not be enriched (and shouldn't appear at all?). *I used STAR for the alignment (with --quantMode) and did not specify any special treatment for lncRNAs.
Can I trust the expression of those genes, and the differentially expressed genes that I get from DESeq2?
Any feedback, opinions or even papers that address this would be really appreciated.
Thanks very much in advance!
In addition to what @ATPoint says about polyA selection being a statistical enrichment rather than an absolute filter, I'm not sure you can tell whether a lncRNA is polyadenylated by looking at its sequence. Many lncRNAs are polyadenylated. Even those that generally aren't, such as NEAT1, have minor isoforms that are. Further, some gene families that are well known to use alternative termination pathways, such as replication-dependent histones, will use cleavage/polyadenylation for transcription events that escape normal termination. The sequence requirements to get at least some cleavage/polyadenylation are pretty minimal, and it's fairly likely that any transcribing polymerase will hit such a sequence sooner or later if it manages to escape, say, hairpin termination.
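To give a feel for how minimal those sequence requirements are, here is a rough Python sketch that scans a transcript sequence for the canonical polyadenylation signal hexamer AATAAA (AAUAAA in RNA) and its common variant ATTAAA. The function name and the example sequence are made up for illustration. A hit only marks a potential signal and says nothing about whether cleavage/polyadenylation actually happens there.

```python
import re

# Canonical poly(A) signal and its most common variant, written as DNA.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_hexamers(seq):
    """Return a sorted list of (0-based position, hexamer) for every hit.

    Hypothetical helper: it only reports where the motif occurs.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for hexamer in PAS_HEXAMERS:
        # A lookahead is used so that overlapping occurrences are also found.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start(), hexamer))
    return sorted(hits)


# Made-up example sequence, not a real transcript.
example = "GGCTAATAAAGCTTGCATTAAACC"
print(find_pas_hexamers(example))  # [(4, 'AATAAA'), (16, 'ATTAAA')]
```

Running something like this over the downstream region of almost any gene will usually turn up hits, which is the point being made above.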
It's enrichment, not perfect selection. Stretches of polyT can attract polyA binding. Beyond that, how did you check these genes are not polyA?
Thanks very much for your reply!
By "stretches of polyT", did you mean a particular step of the poly(A) enrichment? (I tried to search for it since I didn't understand what you meant, but I didn't find anything in particular.)
Re checking the genes: I looked at the sequences of some particular transcripts on Ensembl (cDNA section) and NCBI (section: NCBI Reference Sequences (RefSeq) -- RNA Sequence). Depending on whether they had a poly(A) tail (or at least several As at the end), I assumed they were polyadenylated or not.
Polyadenylation is a post-transcriptional modification; it is not encoded in the DNA and is therefore not annotated in these files. You cannot select like that. By "stretches" I mean regions in the transcript that are rich in T's.
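If you want to see whether a given transcript contains such stretches, a minimal Python sketch could look like the following. The function name, the default base, and the minimum run length of 8 are all arbitrary choices for illustration, not established thresholds.

```python
import re


def find_homopolymer_runs(seq, base="T", min_len=8):
    """Return (0-based start, length) for each run of `base` of at least min_len.

    Hypothetical helper; min_len=8 is an arbitrary illustrative cutoff.
    """
    # Builds a pattern such as "T{8,}" (8 or more consecutive T's).
    pattern = re.compile(f"{re.escape(base.upper())}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


# Made-up example sequence: one run of 10 T's and one short run of 3.
example = "ACG" + "T" * 10 + "GCATTT"
print(find_homopolymer_runs(example))  # [(3, 10)]
```

The `base` argument can be changed to look for runs of any other nucleotide in the same way.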
I am sorry for my ignorance. I didn't check the sequences of known polyadenylated genes (they don't have the poly(A) tail in those files either); otherwise I would have seen that my approach was not valid.
Do you know if there is a way to check which genes/transcripts are polyadenylated and which are not? Maybe a database (up to date and reliable) that I could use? Thanks again for your help.