Question

Rna-Seq: Novel Transcripts Found. What Next?

12

11.5 years ago

jobinv ★ 1.1k

If I were to use Cufflinks in de novo mode to find transcripts or genes in my data that did not align to known transcripts from UCSC or Ensembl, I wouldn't know what to do downstream of this. How would one go about confirming that these are indeed novel? What sort of validation steps would one take (computational or non-computational), what in-depth information can I go looking for, and what databases would have useful information for me? Thanks!

rna-seq • 13k views

ADD COMMENT • link updated 11.5 years ago by Devon Ryan 104k • written 11.5 years ago by jobinv ★ 1.1k

4

No offense, but is this a homework or take-home test question? It reminds me of something I would write for a test.

If not, try giving some details regarding what you've tried and what sorts of things you're actually interested in. You might also mention what species you're using, since some of them have better annotations than others. Your question is extremely broad, so there will be no single best answer.

ADD REPLY • link 11.5 years ago by Devon Ryan 104k

1

Oh, sorry. The issue is precisely that it is a hypothetical question at this point. I am working with human cancer in a mouse xenograft model, and I'm wondering whether it's even worth attempting to look for novel transcripts, or if it would just be a waste of money to start a pipeline where I wouldn't know what to do with the results that I find. My apologies if this makes it too broad a question, it wasn't my intention.

If it is not an appropriate question for this forum, should I perhaps delete it?

ADD REPLY • link 11.5 years ago by jobinv ★ 1.1k

1

No worries and thanks for the clarification!

In your case, I wouldn't personally bother following up on novel transcripts that aren't differentially expressed (and even then I'd be very hesitant). The real question to me would be one of biological meaning and significance. It's far from implausible that there are novel transcripts in cancer that are biologically/clinically meaningful. However, in the context of a xenograft, it's difficult to discern these transcripts from those appearing due to a weird xenograft-specific effect.

Perhaps others will have a different opinion.

BTW, should you decide to follow up on this, I'll post an answer below (it's tough to format things in the comment section).

ADD REPLY • link 11.5 years ago by Devon Ryan 104k

Ram · Answer 1 · 2013-06-04

How to decide if a transcript predicted by Cufflinks is 'novel'

A 'novel' transcript could simply be defined as one that has not been observed before. If the transcript structure you observe is not currently represented in RefSeq, Ensembl, or UCSC there is a good chance that it might be novel. You can view transcripts from each of these sources in IGV or the UCSC browser, etc.

If the putative novel transcript is an alternative isoform of a known gene, examine its structure. Is there a particular feature of the transcript that is distinctive (e.g. a novel exon, exon skipping event, intron retention)? You can examine the complete corpus of mRNAs and ESTs from GenBank for your species. You can view these in the UCSC browser, or download them in FASTA format here: est.fa.gz and mrna.fa.gz (again using human as an example). If your putative novel transcript is not in a known gene region at all, does it share any similarity to known transcripts?
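As a rough illustration of what "examining the structure" can mean computationally, here is a minimal Python sketch that compares the intron chain of a candidate transcript against the known transcripts of the same gene. The function names, the category labels, and the coordinate convention (0-based, half-open exon intervals on one strand) are assumptions made for this example; a dedicated comparison tool such as cuffcompare handles many more cases.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open; assumed convention


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the introns (gaps between consecutive exons) of one transcript."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def classify(candidate: List[Interval], known: List[List[Interval]]) -> str:
    """Hypothetical labels describing how a candidate relates to known isoforms."""
    cand_chain = introns(candidate)
    if not cand_chain:
        # Single-exon transcripts have no junctions to compare.
        return "single_exon"
    known_chains = [introns(k) for k in known]
    if cand_chain in known_chains:
        return "known_intron_chain"
    known_junctions = {j for chain in known_chains for j in chain}
    if any(j not in known_junctions for j in cand_chain):
        # At least one splice junction has never been annotated,
        # e.g. an exon-skipping event or a novel exon.
        return "novel_junction"
    # Every junction is annotated, but never together in one transcript.
    return "novel_combination"


known_isoforms = [[(100, 200), (300, 400), (500, 600)]]
exon_skipped = [(100, 200), (500, 600)]
print(classify(exon_skipped, known_isoforms))
```

A candidate labelled with a novel junction is the kind worth checking against the mRNA and EST tracks, since a single supporting EST spanning that junction is already useful evidence.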

Validation of novel transcripts

This will commonly involve some combination of RT-PCR, qPCR, cloning and Sanger sequencing. Does the predicted transcript sequence contain an ORF? Try feeding it into ORF finder, for example. If not, does it have features of any known type of RNA gene? You could try folding it; many RNA-folding tools already exist.
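For a quick first look before using a dedicated ORF tool, the ORF question can be sketched in a few lines of Python. This is a simplified scan, not a replacement for ORF finder: it assumes a plain DNA string, the standard stop codons, ATG as the only start codon, and it only reports ORFs that have both a start and an in-frame stop. The function names are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str, both_strands: bool = True):
    """Return (length_nt, strand, frame, start) of the longest ATG...stop ORF.

    The length includes the stop codon. `start` is an offset on the scanned
    strand. Returns (0, "+", 0, 0) if no complete ORF is found.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))

    best = (0, "+", 0, 0)
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length > best[0]:
                        best = (length, strand, frame, start)
                    start = None  # look for the next ORF in this frame
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))  # (15, '+', 2, 2)
```

A long ORF on its own does not prove the transcript is coding, and a short one does not rule it out, so treat the result as one line of evidence alongside homology searches.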

Functional validation will depend on what you find above...

score 18 · Answer 2 · 2013-06-04

Firstly, see my comment above regarding my personal opinion of how useful this would be for your situation. But, should you disagree (and when you do so and get a Nature paper because of it, rest assured that I will eat a sufficient amount of crow):

For a non-automated step, I would BLAST any hits to see if someone has already picked up something similar (some projects have found a HUGE number of random transcripts). Having said that, even if it's been seen before, that doesn't mean anyone has followed up on it. So, even if something isn't novel (strictly speaking), that doesn't mean it's not interesting for follow-up.
Look to see how conserved this region is in related species. If a region is unconserved, the odds are good that it's just noise (i.e., it may be transcribed, but it probably does nothing). I would strongly encourage you to place a good bit of emphasis on this in ranking candidates for follow-up. If a region is conserved, people will MUCH more readily believe that what you found is meaningful.
Does the transcript look like it might encode a protein (look for an open reading frame, etc.)? If so, does it have homology to anything?
Does any of the ENCODE data suggest that this might be a gene (PolII binding, histone modifications, etc.)? If the ENCODE data suggests that there might be transcription there but the region isn't conserved, I would follow my recommendations above for non-conserved regions.
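The checks above can be combined into a simple triage of candidates. The Python sketch below is only one way to do it and rests on several assumptions: the `Candidate` fields, the 0 to 1 conservation scale, and the `min_conservation` cutoff of 0.2 are all hypothetical and would need to be replaced by whatever your own annotation step produces. It keeps only differentially expressed candidates and puts conservation first when ranking, in line with the emphasis given to it above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    mean_conservation: float        # hypothetical: mean per-base score in [0, 1]
    differentially_expressed: bool  # from your own DE analysis
    has_orf: bool                   # e.g. from an ORF scan
    chromatin_support: bool         # e.g. PolII binding or histone marks nearby


def rank_candidates(cands: List[Candidate],
                    min_conservation: float = 0.2) -> List[Candidate]:
    """Drop non-DE or poorly conserved candidates, then rank the rest.

    Conservation is the primary sort key; chromatin support and ORF
    presence only break ties. The cutoff is an arbitrary placeholder.
    """
    kept = [c for c in cands
            if c.differentially_expressed
            and c.mean_conservation >= min_conservation]
    return sorted(
        kept,
        key=lambda c: (c.mean_conservation, c.chromatin_support, c.has_orf),
        reverse=True,
    )


candidates = [
    Candidate("cand_A", 0.85, True, True, True),
    Candidate("cand_B", 0.05, True, False, True),   # transcribed but unconserved
    Candidate("cand_C", 0.60, False, True, False),  # conserved but not DE
    Candidate("cand_D", 0.40, True, False, False),
]
for c in rank_candidates(candidates):
    print(c.name, c.mean_conservation)
```

The ordering is deliberately crude; the point is to make the ranking criteria explicit so they can be argued about, not to produce a definitive score.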

Those are some initial non-wet bench things to do to get you started. Among the wet-bench follow-ups:

Northern blot/qPCR/whatever to look at tissue distribution (in your case, I guess also to look in non-xenograft samples).
RACE or some other method to try to assess the full length of the transcript.
Generate an antibody against it to see if it actually makes a protein (there are other ways of doing this, of course).

There are a number of other things one could do, mostly dependent upon whether the transcript is coding or not.