Question

Searching for mRNAs ending with a specific 3' pattern in non-poly-A RNA-seq data

Asked 3.8 years ago by Pierre Lindenbaum

Hi all,

Asking for a colleague:

I'm looking for human non-poly-A mRNAs that end with a specific pattern (say, CCGCAT).

Is it possible to find these in RNA-seq data (e.g. https://www.ncbi.nlm.nih.gov//sra?term=SRR059132)? Or elsewhere?

My idea would be to map the RNA-seq data, assemble transcripts with StringTie, and then convert the GTF to FASTA.

Is there a better / faster way ?
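The pipeline proposed above (map, assemble with StringTie, convert the GTF to FASTA, then inspect the 3' ends) can be sketched in plain Python. This is a minimal illustration, not a replacement for an established converter such as gffread: it assumes the genome is already loaded into a dict of chromosome name to sequence, that the GTF has `exon` records carrying a `transcript_id` attribute (as StringTie output does), and it ignores edge cases such as transcripts spanning several chromosomes. Keep in mind that 3' ends assembled from short reads are often imprecise, so a motif match at the very end is a lead rather than a result.

```python
import re
from collections import defaultdict

# Translation table for reverse-complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def transcripts_from_gtf(gtf_lines, genome):
    """Stitch exon records into spliced transcript sequences.

    gtf_lines: iterable of GTF lines (e.g. a StringTie assembly).
    genome: dict chromosome name -> sequence string (an assumption of this sketch).
    Returns dict transcript_id -> sequence in 5'->3' transcript orientation.
    """
    exons = defaultdict(list)
    strands = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF coordinates are 1-based and inclusive.
        exons[tid].append((fields[0], int(fields[3]), int(fields[4])))
        strands[tid] = fields[6]
    sequences = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda part: part[1])  # genomic order
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end in parts)
        # Minus-strand transcripts are read on the reverse complement.
        sequences[tid] = revcomp(seq) if strands[tid] == "-" else seq
    return sequences


def ends_with_motif(sequences, motif="CCGCAT"):
    """Sorted IDs of transcripts whose 3' end matches motif (case-insensitive)."""
    motif = motif.upper()
    return sorted(tid for tid, seq in sequences.items() if seq.upper().endswith(motif))


if __name__ == "__main__":
    # Toy genome and GTF, invented for illustration only.
    genome = {"chr1": "AAAACCCCGGGGTTTTCCGCATAAAA"}
    gtf = [
        'chr1\tStringTie\texon\t1\t4\t.\t+\t.\ttranscript_id "t1";',
        'chr1\tStringTie\texon\t17\t22\t.\t+\t.\ttranscript_id "t1";',
        'chr1\tStringTie\texon\t5\t12\t.\t-\t.\ttranscript_id "t2";',
    ]
    seqs = transcripts_from_gtf(gtf, genome)
    print(seqs)  # {'t1': 'AAAACCGCAT', 't2': 'CCCCGGGG'}
    print(ends_with_motif(seqs))  # ['t1']
```

Loading a whole human genome into a dict is memory-hungry; for real data an indexed FASTA reader or gffread itself is the more practical route.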

UPDATE: Is it possible to tell from RNA-seq data whether a given transcript is poly-A+ or poly-A-?

Tags: rnaseq, poly-a

Pierre Lindenbaum, 3.7 years ago

Why not look for that pattern in the Ensembl transcriptome? Or do you need to find it in sequencing data?

Reply by GenoMax, 3.8 years ago
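Scanning a transcript FASTA, such as the Ensembl or GENCODE files mentioned in this thread, for sequences that end in the motif takes only a few lines. A minimal sketch, assuming a standard FASTA file that may or may not be gzip-compressed; as the thread shows, hits can still turn out to be annotation errors, so this only produces candidates.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif_ends(path, motif="CCGCAT"):
    """IDs (first header token) of records whose sequence ends with motif."""
    motif = motif.upper()
    return [h.split()[0] for h, s in read_fasta(path) if s.upper().endswith(motif)]
```

For example, `find_motif_ends("gencode.v38.lncRNA_transcripts.fa.gz")`. GENCODE headers are pipe-delimited without spaces, so the whole header comes back as the "ID"; splitting on `|` instead would give just the transcript ID.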

That's a good suggestion. We already looked at ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh37.73.ncrna.fa.gz in 2013. There were many mRNAs with the pattern, but in the end my colleague checked them and they were all errors :-( (I think he checked them manually, in the wet lab.)

Reply by Pierre Lindenbaum, 3.8 years ago

The GENCODE one is the most up to date. You could also look at the MANE set, which has one representative transcript for each protein-coding gene.

Reply by GenoMax, 3.8 years ago

Thanks, but those look like poly-A+ data, don't they?

Reply by Pierre Lindenbaum, 3.8 years ago

Oh, there is a gencode.v38.lncRNA_transcripts.fa.gz file in the directory.

Reply by Pierre Lindenbaum, 3.8 years ago

In addition, RNAcentral has many other non-coding RNAs. Look in the by-database directories, or parse out the human ones from the big file.

Reply by GenoMax, 3.8 years ago
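Parsing the human entries out of the big RNAcentral file could look like the streaming filter below. It assumes (worth checking against the RNAcentral documentation) that species-specific identifiers have the form URS..._<taxid>, so human records end in `_9606`, the NCBI taxonomy ID for Homo sapiens. The file names in the usage example are hypothetical.

```python
import gzip


def filter_taxon(lines, taxid="9606"):
    """Yield the FASTA lines of records whose ID ends with _<taxid>.

    Assumes RNAcentral species-specific IDs such as URS..._9606; the ID is
    taken as the first whitespace-separated token of the header.
    """
    suffix = "_" + taxid
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0].endswith(suffix)
        if keep:
            yield line


def write_species_subset(src_path, dst_path, taxid="9606"):
    """Stream a gzipped FASTA into a smaller FASTA holding one species."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(filter_taxon(src, taxid))
```

Usage: `write_species_subset("rnacentral_species_specific_ids.fasta.gz", "rnacentral_human.fa")`. Working line by line keeps memory flat even on the full multi-gigabyte file.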

You can't assume that lncRNAs are non-polyA. Plenty of non-coding RNAs have polyA tails.

Reply by i.sudbery, 3.7 years ago

For poly(A)+ transcripts, you'd find an A-rich hexamer (the polyadenylation signal) about 10 bp upstream of the poly(A) tail; see here.

But I don't know whether poly(A)- mRNAs lack this hexamer.

Reply by michael.ante, 3.8 years ago
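The hexamer check described above can be sketched as a scan of the last few dozen nucleotides of a transcript. This sketch assumes the transcript's annotated 3' end is the cleavage site and looks for AATAAA (the canonical signal) and ATTAAA (its most common variant). The 40 nt window is an arbitrary choice, since the signal typically sits roughly 10 to 30 nt upstream of the cleavage site. As the comment notes, a missing hexamer does not prove a transcript is poly(A)-, so treat the result as weak evidence.

```python
# Canonical polyadenylation signal and its most common variant (DNA alphabet).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(seq, window=40, hexamers=PAS_HEXAMERS):
    """Return (hexamer, distance) hits within the last `window` nt of seq.

    distance = number of nt between the end of the hexamer and the 3' end
    of seq, which this sketch assumes is the cleavage site. RNA input (U)
    is converted to DNA (T) first.
    """
    seq = seq.upper().replace("U", "T")
    start = max(0, len(seq) - window)
    hits = []
    for i in range(start, len(seq) - 5):
        kmer = seq[i:i + 6]
        if kmer in hexamers:
            hits.append((kmer, len(seq) - (i + 6)))
    return hits
```

For example, `find_pas("GGGG" + "AATAAA" + "C" * 15)` reports the canonical hexamer 15 nt upstream of the end; an empty list means no signal was found in the window.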

score 2 · Answer 1 · 2021-09-12

I think the short answer is no: it's not possible to distinguish poly-A(+) from poly-A(-) transcripts using standard total RNA-seq. Nor can you rely on things like lncRNAs being non-polyA.

You could combine several lines of evidence to converge on a set of transcripts that you think are probably non-polyA.

You could start with the matched poly-A(+) and poly-A(-) datasets from ENCODE: do transcript-specific quantification against something like GENCODE or RNAcentral, and look for transcripts with differential abundance between the + and - datasets.
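The ENCODE comparison could begin with a simple per-transcript fold change between the poly-A(-) and poly-A(+) libraries. This sketch assumes you already have TPM values per transcript for each library (e.g. from a transcript-level quantifier); the thresholds and pseudocount are arbitrary, and a plain fold change between single libraries is no substitute for a proper differential-abundance test with replicates.

```python
import math


def polya_minus_enriched(tpm_plus, tpm_minus, min_log2fc=1.0, pseudocount=1.0, min_tpm=1.0):
    """Transcripts relatively enriched in the poly-A(-) library.

    tpm_plus, tpm_minus: dict transcript_id -> TPM from matched libraries.
    Returns (transcript_id, log2 fold change) pairs, strongest first.
    """
    hits = []
    for tid in set(tpm_plus) | set(tpm_minus):
        plus = tpm_plus.get(tid, 0.0)
        minus = tpm_minus.get(tid, 0.0)
        if minus < min_tpm:
            continue  # too weakly expressed in poly-A(-) to say anything
        log2fc = math.log2((minus + pseudocount) / (plus + pseudocount))
        if log2fc >= min_log2fc:
            hits.append((tid, log2fc))
    # Sort by fold change (descending), then by ID for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

The pseudocount keeps transcripts absent from the poly-A(+) library from producing a division by zero, at the cost of damping fold changes for lowly expressed transcripts.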

You could then cross-reference with polyA-seq data, which specifically identifies polyA cleavage sites genome-wide, and look for transcripts that have no signal yet are highly expressed in the cell type in question.
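The polyA-seq cross-reference amounts to asking, for each well-expressed transcript, whether any mapped cleavage site lies near its 3' end. A sketch, assuming cleavage sites and transcript 3' ends are given as (chrom, strand, position) on the same genome build and coordinate convention; note that a minus-strand transcript's 3' end is its lowest genomic coordinate. The 50 nt window and the TPM cutoff are placeholders.

```python
from bisect import bisect_left
from collections import defaultdict


def index_sites(sites):
    """Group poly(A) cleavage sites by (chrom, strand) into sorted position lists."""
    index = defaultdict(list)
    for chrom, strand, pos in sites:
        index[(chrom, strand)].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def has_nearby_site(index, chrom, strand, pos, window=50):
    """True if a site on the same chrom and strand lies within `window` nt of pos."""
    positions = index.get((chrom, strand), [])
    i = bisect_left(positions, pos - window)  # first site >= pos - window
    return i < len(positions) and positions[i] <= pos + window


def unsupported_expressed(transcripts, index, tpm, min_tpm=10.0, window=50):
    """Expressed transcripts whose 3' end has no nearby poly(A) site.

    transcripts: dict id -> (chrom, strand, three_prime_end).
    tpm: dict id -> expression in the cell type of interest.
    """
    return sorted(
        tid
        for tid, (chrom, strand, end) in transcripts.items()
        if tpm.get(tid, 0.0) >= min_tpm
        and not has_nearby_site(index, chrom, strand, end, window)
    )
```

Sorting each site list once and using binary search keeps every lookup logarithmic, which matters when checking tens of thousands of transcripts against millions of sites.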

Finally, you could keep only the transcripts that lack a poly-A signal.

score 2 · Answer 2 · 2021-09-15

OK, in the end I wrote a tool to find evidence of poly-A tails in RNA-seq data. It gives me an indication of the poly-A-minus/plus state of a transcript.

ok, in the end I wrote a tool to find evidences of poly-A in RNA-Seq data. https://t.co/hzG7BTDG5f https://t.co/xj9d7toXUx pic.twitter.com/Bh8GcfFlox
— Pierre Lindenbaum (@yokofakun) September 15, 2021
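The tool linked above is not described here, so the following is not its implementation. One approach to finding poly-A evidence in RNA-seq alignments is to look for reads whose soft-clipped ends are A-rich on the right (or T-rich on the left, for minus-strand transcripts), because non-templated poly-A tails do not align to the genome. This sketch works on the SEQ and CIGAR fields of a SAM record, which store the read in reference-forward orientation; with pysam you would pass `read.query_sequence` and `read.cigarstring`. The minimum clip length and A fraction are arbitrary, and genomic A-runs or adapter remnants can produce false positives.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths from a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can flank soft clips; ignore them when reading the ends.
    core = [op for op in ops if op[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def polya_evidence(seq, cigar, min_len=8, min_frac=0.9):
    """Classify a read's soft clip as putative poly(A) tail evidence.

    Returns "+" if the right clip is A-rich (tail of a plus-strand transcript),
    "-" if the left clip is T-rich (tail of a minus-strand transcript), else None.
    SEQ is assumed to be in reference-forward orientation, as in SAM/BAM.
    """
    left, right = soft_clips(cigar)
    seq = seq.upper()
    if right >= min_len and seq[-right:].count("A") / right >= min_frac:
        return "+"
    if left >= min_len and seq[:left].count("T") / left >= min_frac:
        return "-"
    return None
```

Counting "+" and "-" calls whose alignments end near a transcript's 3' end would then give a rough, per-transcript hint of poly-A(+) status.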