Hi, I am using an RNA-seq library to test my assembler. What I am wondering is: how can we say that a transcript is present?
- In RNA-seq libraries, can the reads come from UTRs?
- If so, will the UTRs always be fully covered, or can their coverage be fragmented?
- Can I usually say that "transcript A is expressed because its full coding region is assembled"?
Thanks.
Hi Charles,
Would you recommend Trinity for 454 sequences then? If yes, how can we define the start and end of the transcript?
Thanks.
You should ask the developers about 454 sequences. My guess is that you would at least need to change some parameters.
There is also an FAQ page.
I'm not sure I understand your second question. Based on my experience with Illumina data, one problem I have with Trinity is that it seems to inappropriately stitch together unrelated sequences. Also, the RNA-Seq data that I see typically don't have complete or even coverage across known transcripts (when aligned to a reference instead of doing de novo assembly), which is why I think using coverage of a well-defined but partial sequence is better for differential expression purposes.
In general, the depth of coverage of reads aligned to the assembly and the uniformity of coverage across that assembly are quality control metrics for assessing the assembly. Unless the sequencing technology directly produces reads that span the whole transcript and you can be absolutely certain that the RNA didn't get fragmented prior to assembly, I can't think of a specific reason why analysis strategies would be fundamentally different (and, in that scenario, there wouldn't be a need for de novo assembly in the first place).
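To make the coverage metrics a bit more concrete, here is a rough Python sketch of how depth and uniformity could be summarized for one assembled contig. It assumes you already have per-base read depths as a plain list of integers (for example, parsed from the output of a tool such as `samtools depth`). The function name `coverage_summary` and the `min_depth` cutoff are made up for illustration and are not part of Trinity or any other package.

```python
from statistics import mean, pstdev


def coverage_summary(depths, min_depth=1):
    """Summarize per-base read depth for one contig (hypothetical helper).

    depths    : list of per-base read depths (assumed input format)
    min_depth : depth at or above which a base counts as covered (assumed cutoff)
    """
    if not depths:
        raise ValueError("depths must not be empty")

    avg = mean(depths)
    # Breadth: fraction of positions covered by at least min_depth reads.
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    # Coefficient of variation: lower values mean more even coverage.
    cv = pstdev(depths) / avg if avg > 0 else float("inf")

    return {"mean_depth": avg, "breadth": breadth, "cv": cv}


if __name__ == "__main__":
    even = [10, 10, 10, 10]    # uniform coverage
    patchy = [0, 0, 20, 20]    # same mean depth, but half the contig is uncovered
    print(coverage_summary(even))
    print(coverage_summary(patchy))
```

Both toy contigs have the same mean depth, but the second has lower breadth and a higher coefficient of variation, which is the kind of uneven coverage I would treat with caution. What counts as "good enough" is a judgment call that depends on the library and the question being asked.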