19 months ago
Prangan
Hello!
I am assembling novel transcripts with StringTie for various human RNA-Seq datasets, but I am not finding any novel transcripts that are common across the datasets. Am I missing something, or is the chance of finding common novel transcripts really that low? Any suggestions are welcome!
The human transcriptome is well-annotated and curated by large consortia like GENCODE and Ensembl. I don't see why you would expect notable findings in general, unless previous findings or the literature give you reason to expect aberrant splicing or some sort of transcript remodeling in your specific setup. I would not do this sort of analysis blindly. The chance is probably indeed low, given how much data is already out there. I am not an assembly person, so take this with a grain of salt.
I do get that the protein-coding transcriptome is well curated and annotated, but I have reason to believe that the same cannot be said for the noncoding transcriptome. You would be surprised at the number of inconsistencies that exist in noncoding transcripts across consortia and the literature. Looking into these inconsistencies, as well as the missing ncRNA transcripts (people are still finding and annotating new ones), requires assembly and analysis. I am divided, not primarily on the approach, but more on the technical and significance factors that come into play during assembly. I hope this clarifies my approach for any further suggestions. Thank you!
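One technical factor worth ruling out when comparing assemblies: StringTie assigns its `STRG.*` identifiers independently in each run, so matching transcripts across datasets by ID will not find overlaps even when the same structure was assembled. A rough sketch of comparing by intron chain instead is below. It assumes each dataset is a StringTie-style GTF with `exon` rows carrying a `transcript_id` attribute; the function names `intron_chains` and `shared_chains` are made up for illustration. Exact intron-chain matching is only one possible definition of "common", and in practice `stringtie --merge` or gffcompare would be the usual tools for this.

```python
from collections import defaultdict


def intron_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, introns) for multi-exon transcripts.

    Illustrative helper: reads only 'exon' rows and ignores everything else.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        if 'transcript_id "' not in fields[8]:
            continue
        tid = fields[8].split('transcript_id "')[1].split('"')[0]
        chrom, strand = fields[0], fields[6]
        exons[(chrom, strand, tid)].append((int(fields[3]), int(fields[4])))

    chains = {}
    for (chrom, strand, tid), ex in exons.items():
        ex.sort()
        # An intron spans from one base after an exon end to one base
        # before the next exon start (GTF coordinates are 1-based, inclusive).
        introns = tuple(
            (ex[i][1] + 1, ex[i + 1][0] - 1) for i in range(len(ex) - 1)
        )
        if introns:  # single-exon transcripts have no junctions to compare
            chains[tid] = (chrom, strand, introns)
    return chains


def shared_chains(datasets):
    """Return intron chains present in every dataset (each an iterable of GTF lines)."""
    sets = [set(intron_chains(lines).values()) for lines in datasets]
    return set.intersection(*sets) if sets else set()
```

Called as `shared_chains([open("a.gtf"), open("b.gtf")])` (hypothetical file names), this returns the structures present in every assembly regardless of their IDs or of small differences in transcript start and end positions. Known transcripts would still need to be filtered out against the reference annotation before calling anything novel.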