Long-read sequencing (also known as third-generation sequencing) is the foundation of full-length transcriptome analysis, because it obtains full-length mRNA sequences and their structural information directly, without the need to assemble short fragments. It can offer valuable insights into a number of disease-related questions. Because its reads are significantly longer than those of short-read second-generation sequencing, long-read sequencing has clear advantages in the analysis of complex structures. These advantages include the direct acquisition of full-length transcript sequences, the discovery of new genes and new transcripts, and the identification of fusion genes. Given these advantages, full-length transcriptome analysis across many samples is needed to identify complex transcript information and alternative splicing variants.
Mueller et al. present the first reference genome assembly of a non-model pochard species. By combining the strengths of RNA-seq and Iso-Seq, their annotation approach produces a merged transcriptome with functional annotation and expression profiles, offering insights into gene expression. Owing to alternative splicing, a single gene can give rise to multiple variants (isoforms), which can in turn be translated into proteins with different functions. In full-length transcript isoform sequencing (Pacific Biosciences [PacBio] Iso-Seq) of messenger RNA, 80.57% (3.84%) of reads were retained as full-length non-chimeric (FLNC) reads after error correction, and Minimap2 mapped 97.39% (1.34%) of the long reads to the reference genome.
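To make the mapping-rate figure above concrete, the following is a minimal sketch of how such a percentage could be computed from aligner output in SAM format. It is not the authors' pipeline: the function name `mapping_rate` and the toy records are hypothetical, and counting only primary records (skipping secondary and supplementary alignments) is an assumption about how reads are tallied.

```python
# Minimal sketch (hypothetical helper, not the study's actual pipeline):
# compute the fraction of reads that mapped, given SAM-format text lines.

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def mapping_rate(sam_lines):
    """Return mapped reads / total reads, counting one record per read."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        # Assumption: count each read once by ignoring extra alignments.
        if flag & SECONDARY or flag & SUPPLEMENTARY:
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


# Toy input: two mapped reads, one unmapped read, one supplementary record.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t16\tchr1\t500\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
    "read1\t2048\tchr2\t900\t60\t5M5S\t*\t0\t0\tACGTACGTAC\t*",
]

print(f"mapping rate: {mapping_rate(toy_sam):.2%}")
```

In practice a dedicated tool or library would normally be used on BAM files; the plain-text parser here only illustrates the arithmetic behind a reported mapping percentage.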
Comparing the long-read and short-read reference transcriptomes provides insight into the choice of sequencing technology. Although short-read data were sufficient for annotating protein-coding genes in this study, long-read data recovered more transcripts per gene and possibly more protein-coding genes that could not otherwise be annotated. The researchers expect that, as base-calling accuracy in long-read sequencing improves, high-coverage long-read sequencing will be used to reconstruct the transcriptome. Short-read transcriptome sequencing may eventually become a consumable product. In conclusion, the genome and transcriptome annotations of the tufted duck (Aythya fuligula) provide a basis for further research, such as studies of disease response, and the high quality of this dataset for a non-model species enables highly accurate resolution. When researching zoonotic pathogen reservoirs, it is crucial to consider genetic variation and similarity among closely related species.[2]
Tumors display widespread transcriptome changes, but the full picture of transcript-level splicing in cancer is unclear. A long-read sequencing platform such as the PacBio system can be used to locate and annotate full-length transcripts and to infer tumor-specific splicing events. Applying long-read sequencing to breast cancer samples identified thousands of previously unannotated transcripts; approximately 30% of the novel transcripts affected protein-coding exons and were predicted to alter protein localization and function. To support the transcription and translation of the novel transcripts, Veiga et al. extensively cross-validated them against omics datasets. They identified 3059 breast tumor-specific splicing events, 35 of which were significantly associated with patient survival. Of these, 21 were absent from GENCODE and 10 were enriched in specific breast cancer transcripts. Taken together, the findings demonstrate the complexity, cancer specificity, and clinical relevance of previously unidentified breast cancer transcripts and splicing events. These transcripts and events can only be annotated through long-read RNA studies, and they also provide a wealth of potential therapeutic targets for immuno-oncology.[3]
The studies above give us a better understanding of multi-sample transcriptome research. Multi-sample third-generation sequencing yields clearer transcripts for complex isoforms, complex transcripts, and disease- and cancer-specific transcripts. As a trusted global sequencing provider, Novogene is a pioneer in applying cutting-edge technology to deliver the latest genomics services and solutions. To date, Novogene has built world-leading long-read sequencing laboratory capabilities, including the PacBio Sequel II, Sequel IIe, and latest Revio systems, as well as the Oxford Nanopore PromethION platform, to respond to sequencing needs around the world.
Application summary
Visit our website: Using long-read sequencing technology to explore the complex structure of transcripts
References
- Mueller, Ralf C., et al. "A high-quality genome and comparison of short-versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)." GigaScience 10.12 (2021): giab081.
- Veiga, Diogo FT, et al. "A comprehensive long-read isoform analysis platform and sequencing resource for breast cancer." Science Advances 8.3 (2022): eabg6711.