Question

Estimating transcript integrity after cDNA sequencing for long read data (Nanopore)

11 weeks ago

Rox ★ 1.4k

Hello everyone!

I am trying to find a way to estimate how long (i.e. how complete) the transcripts we get from a Nanopore cDNA experiment are.

For Illumina data, we have been using the tin.py module of RSeQC: https://academic.oup.com/bioinformatics/article/28/16/2184/325191
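For reference, tin.py scores each transcript from the Shannon entropy of its read coverage. With c_i the coverage at position i out of k positions, P_i = c_i / Σc, H = -Σ P_i ln P_i, and TIN = 100 · exp(H) / k. A perfectly flat profile scores 100, and any unevenness along the transcript lowers the score. The sketch below is a simplified re-implementation of that formula for illustration only. The function name tin is hypothetical, and the real tool samples positions along the exons and has options that are not modeled here.

```python
import math
from typing import Sequence


def tin(coverage: Sequence[float]) -> float:
    """Simplified Transcript Integrity Number for one transcript.

    coverage: read depth at each of the k sampled positions, 5' to 3'.
    TIN = 100 * exp(H) / k, where H is the Shannon entropy (natural log)
    of the normalized coverage P_i = c_i / sum(c). Zero-coverage positions
    contribute nothing to H (the limit of p * ln p as p -> 0 is 0).
    """
    k = len(coverage)
    total = sum(coverage)
    if k == 0 or total == 0:
        return 0.0  # no coverage at all: report 0
    h = 0.0
    for c in coverage:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return 100.0 * math.exp(h) / k


if __name__ == "__main__":
    flat = [50] * 100                         # even coverage along the transcript
    three_prime_half = [0] * 50 + [50] * 50   # only the 3' half is covered
    ramp = list(range(1, 101))                # coverage rising toward the 3' end
    for name, cov in [("flat", flat), ("3' half", three_prime_half), ("ramp", ramp)]:
        print(f"{name:>8}: TIN = {tin(cov):.1f}")
```

Running it prints 100.0 for the flat profile and 50.0 for the profile covering only the 3' half, with the ramp in between. If many Nanopore cDNA reads stop short of the 5' end (an assumption to check against your data, not a diagnosis), coverage becomes uneven in this way and TIN drops even though the individual reads are long.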

Naively, we tried running this tool on our test samples, which were sequenced on both Illumina and Nanopore.

The results we got are unexpected and puzzling. For a given subset of transcripts of interest (rather than the whole transcriptome), we generally get lower TIN values for Nanopore sequencing than for Illumina sequencing.

I can't really wrap my head around it. I am questioning whether this tool is appropriate for long reads at all, hence my post here to gather some opinions!

Any idea why we see this difference? Are there other tools that could do what we are looking for?

Best,

Roxane

Tags: cDNA, nanopore, RSeQC · 728 views

Written 11 weeks ago by Rox ★ 1.4k; updated 10 weeks ago by GenoMax 147k

What kind of aligner did you use? Also, how good is the reference gene model?

Reply by andres.firrincieli 3.8k, 11 weeks ago

I used the dorado aligner, and the reference is hg38.

Reply by Rox ★ 1.4k, 11 weeks ago

dorado uses minimap2 internally, so you should already have close to the best possible alignments. You could run minimap2 manually with

- -x splice / -x splice:hq (long-read / PacBio CCS spliced alignment)

and confirm.

Was this direct RNA sequencing or cDNA sequencing? What do you mean by "integrity"? Do you suspect that the sequences you got from Nanopore do not reflect reality?

Reply by GenoMax 147k, 10 weeks ago

Answer 1 · 2024-09-05

A former student of mine worked on a tool called TIN check, which estimates how uniform the coverage along each transcript is. The idea for the tool comes from RSeQC.

https://github.com/aswathyseb/tincheck

The tool is not yet published and may never be published as a stand-alone tool, but I have used the resulting values to filter RNA-seq data in experiments where, for some reason, specific transcripts had wildly non-uniform coverage.

We report both the observed and the expected TIN values, which is quite helpful when coverage is low.
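The answer does not say how tincheck computes its expected TIN. One hypothetical way to build such a baseline is to assume coverage lands uniformly at random along the transcript and average TIN over simulated profiles at the same mean depth. The sketch below does that to show why a baseline matters at low coverage: sampling noise alone pushes TIN well below 100 even when nothing is wrong with the transcript. The names tin and expected_tin, and the one-base-at-a-time placement model (which ignores read length and library biases), are assumptions for illustration, not tincheck's method.

```python
import math
import random
from typing import Sequence


def tin(coverage: Sequence[float]) -> float:
    """TIN = 100 * exp(H) / k over the given per-position coverage."""
    total = sum(coverage)
    if not coverage or total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in coverage if c > 0)
    return 100.0 * math.exp(h) / len(coverage)


def expected_tin(length: int, mean_depth: float,
                 trials: int = 100, seed: int = 0) -> float:
    """Hypothetical baseline: mean TIN when coverage is placed uniformly at random.

    Assumption: round(mean_depth * length) single-base "hits" are dropped
    independently and uniformly on the transcript's positions; read length
    and library biases are ignored.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n_hits = round(mean_depth * length)
    scores = []
    for _ in range(trials):
        cov = [0] * length
        for _ in range(n_hits):
            cov[rng.randrange(length)] += 1
        scores.append(tin(cov))
    return sum(scores) / trials


if __name__ == "__main__":
    for depth in (0.5, 5, 50):
        print(f"mean depth {depth:>4}: expected TIN ~ {expected_tin(100, depth):.1f}")
```

Under this model, expected_tin(100, 0.5) comes out far below expected_tin(100, 50), so a low observed TIN on a poorly covered transcript may simply reflect depth. Comparing observed against expected TIN helps separate that effect from genuinely non-uniform coverage.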

The tool was designed for short reads, but it should also work for long reads.