How to decide if a transcript predicted by Cufflinks is 'novel'
A 'novel' transcript could simply be defined as one that has not been observed before. If the transcript structure you observe is not currently represented in RefSeq, Ensembl, or UCSC, there is a good chance that it is novel. You can view transcripts from each of these sources in IGV, the UCSC browser, or a similar viewer.
If the putative novel transcript is an alternative isoform of a known gene, examine its structure. Is there a particular feature of the transcript that is distinctive (e.g. a novel exon, an exon skipping event, intron retention)? You can examine the complete corpus of mRNAs and ESTs from GenBank for your species. You can view these in the UCSC browser, or download them in FASTA format here: est.fa.gz and mrna.fa.gz (again using human as an example). If your putative novel transcript is not in a known gene region at all, does it share any similarity to known transcripts?
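As a rough illustration of the structural comparison described above, here is a minimal Python sketch that compares the introns of a candidate transcript against those of known transcripts at the same locus. It assumes you have already parsed exon coordinates (1-based, inclusive, as in GTF) from your Cufflinks output and from a reference annotation; the function names and the toy coordinates are hypothetical, and this is not how Cufflinks itself classifies transcripts.

```python
from typing import List, Tuple

Exon = Tuple[int, int]    # (start, end), 1-based inclusive, as in GTF
Intron = Tuple[int, int]  # (first intronic base, last intronic base)


def intron_chain(exons: List[Exon]) -> Tuple[Intron, ...]:
    """Return the introns implied by a transcript's exons, in genomic order."""
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1] + 1, ordered[i + 1][0] - 1)
        for i in range(len(ordered) - 1)
    )


def novel_introns(candidate: List[Exon],
                  known_transcripts: List[List[Exon]]) -> List[Intron]:
    """Introns of the candidate that appear in no known transcript."""
    known = set()
    for exons in known_transcripts:
        known.update(intron_chain(exons))
    return [intron for intron in intron_chain(candidate) if intron not in known]


def matches_known_chain(candidate: List[Exon],
                        known_transcripts: List[List[Exon]]) -> bool:
    """True if some known transcript has exactly the same intron chain.

    Single-exon transcripts have no introns, so they cannot be judged this
    way and always return False here.
    """
    chain = intron_chain(candidate)
    return bool(chain) and any(intron_chain(e) == chain for e in known_transcripts)


# Toy example: the candidate skips the middle exon of the known isoform.
known = [[(100, 200), (300, 400), (500, 600)]]
candidate = [(100, 200), (500, 600)]
print(novel_introns(candidate, known))
print(matches_known_chain(candidate, known))
```

A candidate with no novel introns and a matching chain is probably already annotated; a novel intron points you to the specific junction worth inspecting in IGV and checking against the mRNA/EST evidence.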
Validation of novel transcripts
This will commonly involve some combination of RT-PCR, qPCR, cloning and Sanger sequencing. Does the predicted transcript sequence contain an ORF? Try feeding it into ORF finder, for example. If not, does it have features of any known type of RNA gene? You could try folding it; many RNA-folding tools already exist.
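If you want a quick first pass before using a dedicated tool, the ORF check can be sketched in a few lines of Python. This is only a simplified stand-in, not ORF finder's algorithm: it assumes the standard stop codons, requires an ATG start, and scans all three frames on both strands. The function names and the test sequence are made up for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (non-ACGT characters are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF on either strand, stop codon included.

    Returns an empty string if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for further ORFs in this frame
    return best


# Toy sequence with one short ORF in the third forward frame.
orf = longest_orf("CCATGAAATTTGGGTAACC")
print(orf, len(orf))
```

A long ORF relative to transcript length hints at coding potential, while a transcript with only very short ORFs may be a better candidate for the non-coding RNA checks mentioned above.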
Functional validation will depend on what you find above...
No offense, but is this a homework or take-home test question? It reminds me of something I would write for a test.
If not, try giving some details regarding what you've tried and what sorts of things you're actually interested in. You might also mention what species you're using, since some of them have better annotations than others. Your question is extremely broad, so there will be no single best answer.
Oh, sorry. The issue is precisely that it is a hypothetical question at this point. I am working with human cancer in a mouse xenograft model, and I'm wondering whether it's even worth attempting to look for novel transcripts, or if it would just be a waste of money to start a pipeline where I wouldn't know what to do with the results that I find. My apologies if this makes it too broad a question, it wasn't my intention.
If it is not an appropriate question for this forum, should I perhaps delete it?
No worries and thanks for the clarification!
In your case, I wouldn't personally bother following up on novel transcripts that aren't differentially expressed (and even then I'd be very hesitant). The real question to me would be one of biological meaning and significance. It's far from implausible that there are novel transcripts in cancer that are biologically/clinically meaningful. However, in the context of a xenograft, it's difficult to distinguish these transcripts from those appearing due to a weird xenograft-specific effect.
Perhaps others will have a different opinion.
BTW, should you decide to follow up on this, I'll post an answer below (it's tough to format things in the comment section).