Question

What is a better strategy for finding novel transcripts?


5.4 years ago

jsw940 ▴ 10

Hi, I've started learning RNA-seq only recently. I use nanopore technology (cDNA sequencing) to find novel transcripts, but I don't know exactly which analysis tools are suited to this. I have read several papers on the topic, but I couldn't fully understand them.

For example, I wanted to visualize my sequencing data with IGV. But IGV only takes mapped sequencing data, which excludes novel transcripts (as far as I know...). And when I use Gffcompare, I cannot extend my analysis with the transcripts categorized as "u".

These gaps in my knowledge leave me confused. So, is there an established bioinformatics pipeline for finding novel transcripts before doing actual verification such as RT-qPCR, cloning, and so on?

And thank you to all of you! Every question and answer has been very helpful for my studies.

RNA-Seq sequencing • 2.1k views

ADD COMMENT • link updated 5.4 years ago by Amitm ★ 2.3k • written 5.4 years ago by jsw940 ▴ 10


Do you have a reference genome available? If so, reads from novel transcripts should still be mapped.

ADD REPLY • link 5.4 years ago by GenoMax 151k


Yes, I used minimap2 with GRCh38.p13.genome. Honestly, I have difficulty choosing a reference genome and annotation. How can I choose a genome version compatible with my analysis plan? Until now, I have followed the genome versions used in papers.

ADD REPLY • link 5.4 years ago by jsw940 ▴ 10

score 5 · Accepted Answer · 2020-01-20

Hi, I assume this is an organism for which a reference genome and annotation are available. My experience is with human and mouse data from the Illumina platform.

When a reference genome is available, the first approach would be transcriptome assembly after alignment: align with STAR, then assemble with StringTie. One of the result files is a GTF containing the known as well as the novel transcripts that were assembled (read the manual for the appropriate parameters to use).

Once you are at this stage, load your STAR-aligned BAM file into IGV along with the StringTie GTF. Choose any gene locus and check the transcripts assembled by StringTie. The GTF is loaded beneath the 'Gene' track in IGV. The 'Gene' track (blue in the image) shows the known isoforms, and your StringTie GTF (pink and green in the image; 2 samples) should show the known and any novel transcripts (Tx) assembled. IGV link - https://ibb.co/GC2qTFc

This strategy works well if you suspect a couple of loci. If you want a transcriptome-wide comparison in which 'novel' transcripts are also considered, then JunctionSeq is another approach. It is not assembly-based; instead it uses (junction) counts and evaluates all novel uses of exons in known transcripts. As far as I am aware, completely novel transcripts are beyond this tool's scope, but the advantage is statistical rigour for transcriptome-wide comparison, especially if you have a set of 'test' vs. 'control' samples.
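As a rough illustration of the alignment-then-assembly route, the Python sketch below only builds the three command lines (STAR, StringTie, gffcompare) and prints them; it does not run anything. The file names, the `build_pipeline` helper, and the thread count are hypothetical placeholders. It assumes uncompressed FASTQ input and an existing STAR index, and the parameters shown are a starting point to check against each tool's manual, not a recommended configuration.

```python
import shlex


def build_pipeline(genome_index, reads, annotation_gtf, prefix, threads=4):
    """Return STAR -> StringTie -> gffcompare commands as argument lists.

    All paths are placeholders chosen for illustration.
    """
    # STAR writes its sorted BAM as <prefix>Aligned.sortedByCoord.out.bam
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    assembled_gtf = f"{prefix}stringtie.gtf"

    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_index,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # adds the XS strand tag that StringTie uses for unstranded data
        "--outSAMstrandField", "intronMotif",
        "--outFileNamePrefix", prefix,
    ]
    # -G guides assembly with the known annotation; novel transcripts
    # are still reported alongside the known ones
    stringtie = [
        "stringtie", bam,
        "-G", annotation_gtf,
        "-o", assembled_gtf,
        "-p", str(threads),
    ]
    # gffcompare assigns a class code to each assembled transcript
    gffcompare = [
        "gffcompare",
        "-r", annotation_gtf,
        "-o", f"{prefix}cmp",
        assembled_gtf,
    ]
    return [star, stringtie, gffcompare]


if __name__ == "__main__":
    commands = build_pipeline(
        "star_index",
        ["sample_R1.fastq", "sample_R2.fastq"],
        "gencode.gtf",
        "sample_",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

For nanopore cDNA reads, as in the question, the alignment step would presumably be minimap2 in spliced mode (`-ax splice`) rather than STAR, and StringTie offers a long-read mode (`-L`); I have not tested that combination myself.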

Finally, you could try de novo transcript assembly if a reference annotation is not available, or if you suspect something really wild is going on. Time and downstream interpretation are the limiting factors here.
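On the Gffcompare point in the question: transcripts with class code "u" are still mapped to the genome, so their coordinates can be pulled out and inspected in IGV. A minimal sketch, assuming the annotated GTF written by gffcompare carries a `class_code` attribute on its `transcript` lines; the `novel_transcripts` helper and the example records are made up for illustration.

```python
import re

CLASS_CODE = re.compile(r'class_code "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def novel_transcripts(gtf_lines, codes=("u",)):
    """Collect (id, chrom, start, end, strand) for transcripts whose
    class code is in `codes`. Exon lines and comments are skipped."""
    hits = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        code = CLASS_CODE.search(fields[8])
        tid = TRANSCRIPT_ID.search(fields[8])
        if code and tid and code.group(1) in codes:
            hits.append(
                (tid.group(1), fields[0], int(fields[3]), int(fields[4]), fields[6])
            )
    return hits


# Made-up records in the assumed layout, for demonstration only
EXAMPLE_GTF = [
    "\t".join(["chr1", "StringTie", "transcript", "1000", "5000", ".", "+", ".",
               'transcript_id "STRG.1.1"; gene_id "STRG.1"; class_code "u";']),
    "\t".join(["chr1", "StringTie", "exon", "1000", "1500", ".", "+", ".",
               'transcript_id "STRG.1.1"; gene_id "STRG.1"; exon_number "1";']),
    "\t".join(["chr2", "StringTie", "transcript", "200", "900", ".", "-", ".",
               'transcript_id "STRG.2.1"; gene_id "STRG.2"; class_code "=";']),
]

if __name__ == "__main__":
    for tid, chrom, start, end, strand in novel_transcripts(EXAMPLE_GTF):
        # chrom:start-end can be pasted into the IGV locus box
        print(f"{tid}\t{chrom}:{start}-{end}\t{strand}")
```

Passing other class codes through `codes` (for example `("u", "j")`) would widen the selection to other categories of interest.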