Hi all. These are my first steps in bioinformatics, so please forgive me if my questions are trivial.
I'm trying to study variation among transcripts of the same gene across different developmental stages of dmel (Drosophila melanogaster) flies. In particular, I want to test the hypothesis that 3' UTRs are shortened at later developmental stages. An obvious intermediate step of this study is obtaining transcript sequences from the assembled RNA-Seq data. This is where I run into problems.
After mapping with TopHat and assembling with Cufflinks, my final result is a transcripts.gtf file. I can run gffread as follows:
gffread -g dmel.bowtie.index.fa -w transcripts.fa cufflinks_out/transcripts.gtf
to get the sequences of the assembled transcripts, but these sequences are extracted from the reference genome, so they are stripped of any potential variations present in my reads.
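To make the limitation concrete, here is a minimal Python sketch of what reference-based extraction amounts to: for each transcript, the exon coordinates from the GTF are used to cut pieces out of the reference genome, which are then spliced together (and reverse-complemented on the minus strand). It assumes exon lines carrying a transcript_id attribute and a genome held in a plain dict; the function names parse_exons and extract_transcripts are hypothetical, and this is an illustration of the idea, not gffread's actual code. Note that the reads never enter the function, so the output depends only on the reference and the coordinates.

```python
# Minimal sketch of reference-based transcript extraction (gffread -w style).
# Assumption: GTF exon lines with a transcript_id attribute; genome as a dict.
# Illustration only, not gffread's actual implementation.
import re
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_exons(gtf_lines):
    """Group exon intervals (1-based, inclusive) by transcript_id."""
    exons = defaultdict(list)
    strands, chroms = {}, {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        strands[tid] = fields[6]
        chroms[tid] = fields[0]
    return exons, strands, chroms


def extract_transcripts(genome, gtf_lines):
    """Splice exon sequences taken from the reference genome only."""
    exons, strands, chroms = parse_exons(gtf_lines)
    transcripts = {}
    for tid, intervals in exons.items():
        chrom_seq = genome[chroms[tid]]
        # Concatenate exons in genomic order; convert 1-based inclusive to slices.
        spliced = "".join(chrom_seq[start - 1:end] for start, end in sorted(intervals))
        if strands[tid] == "-":
            spliced = reverse_complement(spliced)
        transcripts[tid] = spliced
    return transcripts


# Toy example: a 16 bp "chromosome" and two hypothetical transcripts.
genome = {"chr2L": "AAAACCCCGGGGTTTT"}
gtf = [
    'chr2L\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr2L\tCufflinks\texon\t9\t12\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr2L\tCufflinks\texon\t5\t6\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
    'chr2L\tCufflinks\texon\t13\t14\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
print(extract_transcripts(genome, gtf))  # {'t1': 'AAAAGGGG', 't2': 'AAGG'}
```

Whatever base a read carried at, say, position 2 of chr2L, t1 would still start with the reference "AAAA", which is exactly the loss of variation described above.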
So the question is: how can I get the "raw" transcript sequences, i.e. the ones Cufflinks assembled from the sequencing data? It sounds like a de novo assembly problem, but there IS a reference genome and annotation available, so maybe it's possible to take advantage of them?