Hello,
I have created de novo assemblies from RNA-seq reads using Velvet/Oases for different subjects at several time points. For every subject I have a merged file containing ~100,000 de novo transcripts, produced by merging transcripts assembled with different k-mer sizes. My ultimate goal is to perform differential expression analysis on this data set. The next step is to create a reference transcriptome that contains all the transcripts from all subjects and time points, with no ambiguity, so that I can map the de novo transcripts to that reference transcriptome and quantify expression.
My question is about which program can merge the transcripts from all subjects and time points into a single transcriptome that keeps exactly one copy of each transcript without dropping any of the de novo transcripts that were found. Any suggestions? Is CD-HIT a good option?
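To make the goal concrete, here is a minimal Python sketch of the simplest version of this merge: pooling the per-subject FASTA records and keeping one copy of every sequence that is identical, or identical after reverse-complementing. The function names (parse_fasta, collapse_exact_duplicates) and the toy records are hypothetical and only for illustration. This is not what CD-HIT does; CD-HIT clusters by a sequence identity threshold, so near-identical transcripts and fragments, which this sketch leaves untouched, are exactly the part I need a proper tool for.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Translation table for complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_exact_duplicates(
    records: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep the first record seen for each distinct sequence.

    Two sequences count as the same transcript only if they are
    identical or exact reverse complements of each other.
    """
    seen: Dict[str, Tuple[str, str]] = {}
    for header, seq in records:
        seq = seq.upper()
        # Strand-independent key: the smaller of the two orientations.
        key = min(seq, reverse_complement(seq))
        seen.setdefault(key, (header, seq))
    return list(seen.values())


# Toy input standing in for two per-subject merged assemblies.
subject_a = [">A_t1", "ACGTTG", ">A_t2", "GGGCCC"]
subject_b = [">B_t1", "CAACGT", ">B_t2", "TTTAAA"]

pooled = list(parse_fasta(subject_a)) + list(parse_fasta(subject_b))
merged = collapse_exact_duplicates(pooled)
for header, seq in merged:
    print(f">{header}\n{seq}")
```

In the toy data, B_t1 is the reverse complement of A_t1, so only A_t1 is kept and the merged set has three records. With real files, the lists would be replaced by open file handles.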
Thank you in advance for your help. I really appreciate it.
Are you working with a species that does not have a reference genome sequence?
I'm working with the equine genome. There is a reference genome, but we know it has problems with assembly and annotation. That is why we performed a de novo assembly of the RNA-seq reads (using Velvet/Oases) in addition to the reference-based one (using Bowtie/TopHat for mapping, followed by Cufflinks).