I have Cufflinks output from TopHat alignments and I want to get the sequences of the transcripts. So far I've been extracting the sequences from the reference genome, but I'm working in chicken, where the reference genome was built from the wild type, and I'm sequencing a very specialized breed, so I would really like to get the transcript sequences from my own RNA-seq data instead. I've searched this site and other places and found some solutions, such as generating VCF files with samtools, but they all seem geared towards getting a single sequence rather than thousands, and I think looping over transcripts with these methods would be extremely slow. Is there a quicker way to get the full set of transcript sequences predicted by Cufflinks from the RNA-seq data?
Read this description:
https://transdecoder.github.io/
And read these papers:
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323
MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3789545/
These authors were concerned with splicing:
http://www.cs.colostate.edu/~asa/pdfs/spliceGrapherXT.pdf#page=1&zoom=auto,-73,798
http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0156132
I don't see how TransDecoder is useful here. It seems to require that I already have a FASTA file of the transcript sequences; if I give it a GTF instead, it extracts the sequences from the genome, which is exactly what I don't want to do.
RSEM is not suitable because I'm interested in novel transcripts, and RSEM aligns to a known transcript set rather than the whole genome (unless I'm misunderstanding).
I am not sure MITIE is good for my purpose either. It says it will report a small set of optimal transcripts from a set of RNA-seq libraries, but I'm interested in finding novel transcripts, especially long non-coding RNAs, with a focus on tissue-specific transcripts. So I think MITIE would miss many of those.
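To make the GTF-based extraction mentioned above concrete, here is a minimal Python sketch that pulls every transcript out in a single pass over the GTF rather than looping one transcript at a time. It is only an illustration: the function names are made up, the genome is assumed to be already loaded as a dict of chromosome name to sequence string, and the idea of passing in a sample-specific consensus genome (for example, the reference with my own called variants applied) instead of the plain reference is my assumption, not something any of the tools above is confirmed to do.

```python
# Hypothetical helper names; not part of Cufflinks, TopHat, or samtools.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_gtf_exons(gtf_lines):
    """Group exon records by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])},
    with 1-based inclusive coordinates as in the GTF format.
    """
    transcripts = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Attributes look like: gene_id "X"; transcript_id "Y";
        tid = None
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("transcript_id"):
                tid = attr.split(" ", 1)[1].strip().strip('"')
        if tid is None:
            continue
        entry = transcripts.setdefault(tid, (chrom, strand, []))
        entry[2].append((start, end))
    return transcripts


def transcript_sequences(genome, transcripts):
    """Splice exons together for every transcript.

    `genome` is assumed to be {chrom: sequence}. Transcripts on the
    minus strand are reverse-complemented; any other strand value
    (including ".") is treated as plus.
    """
    sequences = {}
    for tid, (chrom, strand, exons) in transcripts.items():
        chrom_seq = genome[chrom]
        spliced = "".join(chrom_seq[s - 1:e] for s, e in sorted(exons))
        if strand == "-":
            spliced = reverse_complement(spliced)
        sequences[tid] = spliced
    return sequences
```

Usage would be something like `transcript_sequences(genome, parse_gtf_exons(open("transcripts.gtf")))`, where `transcripts.gtf` is the Cufflinks output. Because the genome stays in memory and each exon is a plain string slice, thousands of transcripts should not need a per-transcript external call.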