I think the short answer is no: it's not possible to distinguish poly-A(+) and poly-A(-) transcripts from normal total RNA-seq. Nor can you rely on things like lncRNAs being non-polyA.
You could try a range of different lines of evidence to converge on a set of things you think are probably non-polyA.
You could start with the matched poly-A(+) and poly-A(-) data sets from ENCODE: do transcript-specific quantification over something like GENCODE or RNAcentral, and look for things with differential abundance between the + and - datasets.
You could then cross-reference with polyA-seq data, which specifically identifies polyA cleavage sites genome-wide, and look for transcripts that don't have any signal yet are highly expressed in the cell type in question.
Finally, you could filter to only transcripts that don't have a poly-A signal.
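To make the first step concrete, here is a minimal sketch of ranking transcripts by how enriched they are in the poly-A(-) library relative to the poly-A(+) one. It assumes you already have per-transcript TPM values from whatever quantifier you used, loaded into two plain dicts. The function names, the pseudocount and the thresholds are all made up for illustration, and a simple log ratio like this is not a proper differential abundance test (it ignores replicates and variance), so treat the output as a candidate list only.

```python
import math


def log2_minus_over_plus(plus_tpm, minus_tpm, pseudocount=0.1):
    """Per-transcript log2(polyA- / polyA+) abundance ratio.

    Transcripts missing from one dict are treated as having TPM 0 there.
    The pseudocount (an arbitrary choice) avoids division by zero.
    """
    ids = set(plus_tpm) | set(minus_tpm)
    return {
        t: math.log2((minus_tpm.get(t, 0.0) + pseudocount)
                     / (plus_tpm.get(t, 0.0) + pseudocount))
        for t in ids
    }


def polya_minus_candidates(plus_tpm, minus_tpm, min_minus_tpm=1.0,
                           min_log2_ratio=2.0, pseudocount=0.1):
    """Return transcript IDs enriched in the polyA- library, best first.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    ratios = log2_minus_over_plus(plus_tpm, minus_tpm, pseudocount)
    hits = [
        t for t, r in ratios.items()
        if r >= min_log2_ratio and minus_tpm.get(t, 0.0) >= min_minus_tpm
    ]
    return sorted(hits, key=lambda t: ratios[t], reverse=True)


# Toy values with hypothetical transcript IDs, not real data.
plus = {"tx_histone": 0.5, "tx_mrna": 50.0, "tx_low": 0.0}
minus = {"tx_histone": 40.0, "tx_mrna": 5.0, "tx_low": 0.2}
candidates = polya_minus_candidates(plus, minus)
```

The expression floor on the poly-A(-) side is there because very lowly expressed transcripts give noisy ratios. The candidates would then go through the polyA-seq and poly-A signal filters described above.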
Why not look for that pattern in the Ensembl transcriptome? Or do you need to find it from sequence data?
That's a good suggestion. We already looked at ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh37.73.ncrna.fa.gz in 2013. There were many mRNAs with the pattern, but in the end my colleague checked and they were all errors :-( (I think he checked manually or in the wet lab.)
The GENCODE one is the most up to date. You also have the option of looking in the MANE set, which has one representative transcript for each gene.
Thanks, but it looks like those are poly-A+ data, aren't they?
Oh, there is gencode.v38.lncRNA_transcripts.fa.gz in the directory.
In addition, RNAcentral has a whole bunch of other non-coding RNAs. Look in the by-database directories, or parse out the human ones from the big file.
You can't assume that lncRNAs are non-polyA. Plenty of non-coding RNAs have polyA tails.
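If you go the "parse out the human ones" route, something along these lines might do as a starting point. It is only a sketch: it assumes the first token of each FASTA header is an ID that ends in an underscore plus the NCBI taxonomy ID (9606 for human), which you should check against the actual file you download. The function names and the demo records are invented for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def human_records(handle, taxid_suffix="_9606"):
    """Keep records whose ID (first header token) ends with the taxid suffix.

    ASSUMPTION: IDs look like <accession>_<taxid>; verify on the real file.
    """
    for header, seq in read_fasta(handle):
        fields = header.split()
        if fields and fields[0].endswith(taxid_suffix):
            yield header, seq


# Made-up records standing in for a real download.
demo = io.StringIO(
    ">URS_EXAMPLE_A_9606 made-up human entry\n"
    "ACGU\nACGU\n"
    ">URS_EXAMPLE_B_10090 made-up mouse entry\n"
    "GGCC\n"
)
human = list(human_records(demo))
```

For a gzipped file you can pass `gzip.open(path, "rt")` as the handle instead of the `StringIO` demo, so the big file is streamed rather than loaded into memory.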
For poly(A)+ transcripts you'd find an A-rich hexamer (the polyadenylation signal, typically AAUAAA) roughly 10-30 nt upstream of the cleavage site where the poly(A) tail is added.
But I don't know whether poly(A)- mRNAs lack this hexamer.
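A quick way to check this on annotated transcripts is to scan the last few dozen bases of each sequence for the hexamer. The sketch below looks for the two most common variants (AAUAAA and AUUAAA, written as DNA). The 40 nt window and the function name are my own choices, and it assumes the annotated 3' end sits close to the real cleavage site, which is not guaranteed. Absence of the hexamer is weak evidence on its own, since some polyadenylated transcripts use other variants.

```python
def has_polya_signal(seq, window=40, hexamers=("AATAAA", "ATTAAA")):
    """Return True if a polyadenylation-signal hexamer occurs near the 3' end.

    seq      -- transcript sequence, DNA or RNA alphabet, 5'->3'
    window   -- number of 3'-terminal bases to scan (illustrative default)
    hexamers -- signal variants to look for, in DNA alphabet
    """
    # Normalise case and alphabet, then keep only the 3'-terminal window.
    tail = seq.upper().replace("U", "T")[-window:]
    return any(h in tail for h in hexamers)


# Toy sequences, not real transcripts.
with_signal = "GCGC" * 10 + "AATAAA" + "GC" * 10    # hexamer 20 nt from the end
without_signal = "GCGC" * 20                        # no hexamer anywhere
far_upstream = "AAUAAA" + "GC" * 30                 # hexamer outside the window

flags = [has_polya_signal(s) for s in (with_signal, without_signal, far_upstream)]
```

Running this over the GENCODE or RNAcentral FASTA and tabulating the result against the poly-A(+)/poly-A(-) abundance ratio would at least show whether the poly(A)- candidates tend to lack the hexamer.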