Estimation of RNA-seq protocol from bam files
4.8 years ago
JJ ▴ 710

Dear all,

As I am working with public data, I would like to confirm the stated information and estimate whether the RNA-seq data come from a total RNA protocol or from one with a polyA enrichment step. How would you recommend estimating this? Is there even a tool available for it? I was thinking of using the proportion of exonic and intronic reads (Qualimap output) as a measure, but that probably varies quite a bit between datasets. Any other suggestions? Thanks for your input.

Best,

RNA-Seq • 1.0k views

The approach sounds reasonable. I would, though, try to get some published data generated with one or the other method to use as "positive controls", and then see whether this gives you enough confidence to call your sample polyA-enriched or rRNA-depleted.
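A minimal sketch of how the comparison against positive controls could look, assuming you already have one intronic read fraction per sample (for example from Qualimap reports). The function name `nearest_control_group` and all numbers are hypothetical placeholders, not measured values, and a nearest-mean rule is just one simple way to make the call.

```python
from statistics import mean


def nearest_control_group(sample_fraction, controls):
    """Return the label of the control group whose mean intronic
    fraction is closest to the sample's intronic fraction."""
    distances = {
        label: abs(sample_fraction - mean(values))
        for label, values in controls.items()
    }
    return min(distances, key=distances.get)


# Hypothetical intronic read fractions for control datasets with a
# known protocol; replace them with values from your own controls.
controls = {
    "polyA": [0.08, 0.10, 0.12],
    "total": [0.35, 0.40, 0.45],
}

print(nearest_control_group(0.33, controls))
```

If the two control groups overlap, or the sample falls roughly halfway between them, the call should probably be treated as undetermined rather than forced into one group.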


If the data is public, you could try to look the information up in the associated publication, or write to the submitter and ask.


Thanks for your input. I extracted the information from the associated publications, but it is not always well described. I will try to contact the submitters; however, in general I am looking for a way to confirm the extracted information.

4.8 years ago
yhoogstrate ▴ 150

If you make a discordant alignment and browse to CDR1 (circRNA) you will find quite a number of back-splice junctions in the ribo-minus/random primed data and not in the polyA+. I must admit that I only know this works in human data and I am not sure if that's what you're aiming for.
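As a rough sketch of what "finding back-splice junctions" means in practice: a split read supports a back-splice when its two alignment segments sit on the same chromosome and strand but in the opposite order to the read itself. The code below assumes the segments have already been extracted from the chimeric alignments and are given in the read's original 5' to 3' order; the tuple layout, function names and coordinates are all hypothetical. Tandem duplications give the same signature, so this is a screen rather than proof.

```python
def is_back_splice(first, second):
    """Check whether two segments of one split read are non-colinear.

    Each segment is (chrom, ref_start, ref_end, strand), listed in the
    read's original 5'->3' order; strand is the alignment strand.
    """
    chrom1, start1, _end1, strand1 = first
    chrom2, start2, _end2, strand2 = second
    if chrom1 != chrom2 or strand1 != strand2:
        return False  # a different kind of chimera, not a back-splice
    if strand1 == "+":
        # Colinear forward reads move to higher reference coordinates.
        return start2 < start1
    # Colinear reverse-strand reads move to lower reference coordinates.
    return start2 > start1


def count_back_splices(split_reads):
    """Count split reads whose segment order suggests a back-splice."""
    return sum(is_back_splice(a, b) for a, b in split_reads)


# Hypothetical split reads (coordinates are made up for illustration).
split_reads = [
    (("chrX", 1000, 1050, "+"), ("chrX", 2000, 2050, "+")),  # linear
    (("chrX", 2000, 2050, "+"), ("chrX", 1000, 1050, "+")),  # back-splice
    (("chrX", 2000, 2050, "-"), ("chrX", 1000, 1050, "-")),  # linear
    (("chrX", 1000, 1050, "-"), ("chrX", 2000, 2050, "-")),  # back-splice
    (("chrX", 1000, 1050, "+"), ("chr1", 2000, 2050, "+")),  # other chimera
]

print(count_back_splices(split_reads))
```

Restricting the input to split reads that overlap the locus of interest keeps the count comparable between samples, ideally normalised by the number of linear reads in the same region.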

Intronic content can be used as well, though the intronic/exonic ratio typically differs per gene. If I remember correctly (I am not at my workstation right now), there is a paper from 2014(?) in which these per-gene differences are visualised rather well and which provides statistics on intron/exon/intergenic mapping percentages.

I once had samples with DNA contamination in the RNA-seq (yes...) and that would give a bit of background to the introns and may falsely mark a dataset as 'total RNA' in the intron/exon ratio test.
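One way to make the intron/exon ratio less sensitive to such a background is to subtract the intergenic read density before taking the ratio. This is only a sketch under a strong assumption, namely that intergenic coverage reflects genomic DNA background and nothing else (unannotated transcription would violate it). The function name and all counts and lengths below are hypothetical.

```python
def corrected_intron_exon_ratio(counts, lengths):
    """Intron/exon read-density ratio after background subtraction.

    counts:  reads assigned to 'exon', 'intron' and 'intergenic' regions
    lengths: total number of bases in each of those region classes
    """
    density = {
        key: counts[key] / lengths[key]
        for key in ("exon", "intron", "intergenic")
    }
    # Assumption: intergenic density approximates the DNA background.
    background = density["intergenic"]
    exon = max(density["exon"] - background, 0.0)
    intron = max(density["intron"] - background, 0.0)
    if exon == 0.0:
        raise ValueError("no exonic signal above background")
    return intron / exon


# Hypothetical read counts and region lengths, for illustration only.
counts = {"exon": 10000, "intron": 3000, "intergenic": 2000}
lengths = {"exon": 1000, "intron": 1000, "intergenic": 1000}

ratio = corrected_intron_exon_ratio(counts, lengths)
print(ratio)
```

With these made-up numbers the uncorrected density ratio would be 0.3, while the corrected one is lower, which illustrates how a uniform background can push a polyA+ sample towards a 'total RNA' call.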

I am curious what other tricks people will suggest :)


Thanks! This is a good suggestion. I suppose I could also, more generally, use long non-coding RNAs that are known not to be polyadenylated. Does anyone have an idea how to obtain a list of such long non-coding RNAs?
