Question

Tissue-dependent usage of exons

2.2 years ago

mariannapauletto ▴ 90

Dear Biostars community,

I'm planning an RNA-sequencing experiment to detect tissue-dependent usage of exons (i.e. alternative splicing) in an insect species that has a reference genome. I've isolated RNA from 4 different pools of each target tissue.

Now, I have to decide which is the most suitable sequencing strategy.

Looking at the bioinformatics software available, it seems that even short reads at low coverage can be processed to detect alternative splicing events. In theory, though, longer reads and higher coverage should be better, especially with software that reconstructs transcript isoforms de novo.

So, based on the software available and its performance, do you think that 100PE (50M reads per sample, 200M per tissue given that I have 4 samples per tissue) is enough? Or is it better to sequence longer reads?

Could someone with experience in the analysis of alternative splicing events offer some suggestions, please?

Thank you in advance to anyone who can help.

Best

Marianna

alternative splicing exons usage • 1.5k views

ADD COMMENT • link 2.2 years ago by mariannapauletto ▴ 90

score 0 · Answer 1 · 2023-03-13

2.2 years ago

i.sudbery 21k

The read depth you need is going to depend on the size and complexity of the transcriptome, but I would have thought that 100PE with 50M pairs per sample would generally be considered usable for human, so I expect it will be usable for a range of eukaryotic genomes.

ADD COMMENT • link 2.2 years ago by i.sudbery 21k

Thank you!

The genome is small, 225 Mb. So I think 50M reads should be good enough.
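As a rough sanity check on that arithmetic, here is a minimal sketch. It assumes "50M" means 50M read pairs of 2 x 100 bp, and it divides by the whole 225 Mb genome, which is only a crude stand-in: RNA-seq reads pile up on expressed transcripts in proportion to abundance, so the real depth on any given exon will differ a lot from this average. The function name is made up for illustration.

```python
def nominal_fold_coverage(read_pairs, read_length_bp, target_size_bp):
    """Total sequenced bases divided by the target size (a crude average)."""
    sequenced_bases = read_pairs * 2 * read_length_bp  # two mates per pair
    return sequenced_bases / target_size_bp


GENOME_BP = 225_000_000        # genome size quoted in the thread
PAIRS_PER_SAMPLE = 50_000_000  # assumption: "50M" counts pairs, not single reads
SAMPLES_PER_TISSUE = 4

per_sample = nominal_fold_coverage(PAIRS_PER_SAMPLE, 100, GENOME_BP)
per_tissue = per_sample * SAMPLES_PER_TISSUE

print(f"per sample: {per_sample:.1f}x genome-equivalent")
print(f"per tissue: {per_tissue:.1f}x genome-equivalent")
```

If "50M" actually means 50M single reads (25M pairs), halve both numbers.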

ADD REPLY • link 2.2 years ago by mariannapauletto ▴ 90

score 0 · Answer 2 · 2023-03-13

2.2 years ago

dthorbur ★ 3.0k

I prefer long reads as a more definitive method, since they can span full transcripts. Nanopore cDNA sequencing permits multiplexing (up to 24 barcodes per flowcell, I believe), but it suffers from a lot of the same biases as Illumina PE when poly-A capture is used during library construction. Alternatively, if you have the money, PacBio IsoSeq is great, as is nanopore direct-RNA, though with direct-RNA you'd have to wash flowcells mid-run to multiplex since no barcoding is currently available.

Transcriptome-wide de novo isoform assembly tends to be a bit messy at the best of times, but you can use multiple tools and take a consensus among them. I believe GffCompare does this. I'm currently building a pipeline using nanopore cDNA data that uses IsoQuant and ESPRESSO to make a consensus call.

That said, the short-read design you outlined would likely be sufficient for a typical eukaryotic transcriptome if you stick with short reads.
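To make the consensus idea concrete, here is a toy sketch, not the actual pipeline and not how GffCompare, IsoQuant or ESPRESSO work internally. It assumes each tool's calls have already been parsed into (chromosome, strand, intron chain) keys, where the intron chain is a tuple of (start, end) pairs, and it keeps only isoforms whose intron chain is reported by both tools. All names and coordinates are invented.

```python
def consensus_isoforms(calls_a, calls_b):
    """Keep isoforms whose (chrom, strand, intron chain) key is in both call sets."""
    return set(calls_a) & set(calls_b)


# Hypothetical calls from two tools for one locus.
tool_a_calls = {
    ("chr1", "+", ((200, 300), (400, 500))),  # three-exon isoform
    ("chr1", "+", ((200, 500),)),             # middle exon skipped
    ("chr1", "+", ((200, 300), (400, 520))),  # alternative 3' splice site
}
tool_b_calls = {
    ("chr1", "+", ((200, 300), (400, 500))),
    ("chr1", "+", ((200, 500),)),
    ("chr1", "+", ((250, 300), (400, 500))),  # only tool B reports this one
}

consensus = consensus_isoforms(tool_a_calls, tool_b_calls)
for chrom, strand, introns in sorted(consensus):
    print(chrom, strand, introns)
```

Matching on intron chains rather than full exon coordinates sidesteps disagreements about transcript start and end positions, which tend to be fuzzy in long-read data. Single-exon transcripts have an empty chain and would need separate handling.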

ADD COMMENT • link 2.2 years ago by dthorbur ★ 3.0k

So basically, you're saying that Illumina and cDNA Nanopore have the same biases?

So the real improvement would be PacBio IsoSeq or nanopore direct-RNA. As usual, it's related to money!

Marianna

ADD REPLY • link 2.2 years ago by mariannapauletto ▴ 90

Not quite. I am saying Nanopore cDNA and "standard" Illumina RNA library construction (i.e., poly-A capture and ribo-depletion) suffer from a lot of the same biases, but I am also suggesting that Nanopore has a lot of benefits over Illumina for novel isoform experiments.

I am unsure how PacBio libraries are created, and if poly-A enrichment is typically used.

ADD REPLY • link 2.2 years ago by dthorbur ★ 3.0k

For isoform discovery, long reads are absolutely better. But for quantification, you run into a trade-off with having enough reads to draw statistically powerful conclusions.

My own feeling is that Illumina should be fine for calculating differential exon usage (DEU) because, strictly speaking, DEU is independent of the rest of the transcript. Differential usage of individual exons is what the OP specified. The same goes for any event-based analysis.

For differential transcript usage, obviously having the full-length transcript in a single read is helpful, but the cost of getting sufficient read depth for sufficient replicates from something like IsoSeq would seem prohibitive.

BTW, which are the biases that Illumina and Nanopore cDNA share?
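A toy illustration of why exon-level usage needs only per-exon read counts and not full-length transcripts. This is not the statistical model used by any real DEU tool (those fit per-exon models across replicates); it just expresses each exon's count as a share of the gene total in each tissue and compares the shares. The counts and names are made up.

```python
import math


def exon_usage(exon_counts):
    """Fraction of a gene's reads that fall on each exon."""
    total = sum(exon_counts.values())
    return {exon: count / total for exon, count in exon_counts.items()}


# Made-up read counts for one three-exon gene in two tissues.
tissue_a = {"E1": 100, "E2": 50, "E3": 100}
tissue_b = {"E1": 200, "E2": 20, "E3": 180}

usage_a = exon_usage(tissue_a)
usage_b = exon_usage(tissue_b)

# log2 ratio of usage, tissue B relative to tissue A.
log2_change = {
    exon: math.log2(usage_b[exon] / usage_a[exon]) for exon in usage_a
}

for exon, change in log2_change.items():
    print(f"{exon}: A={usage_a[exon]:.2f} B={usage_b[exon]:.2f} log2(B/A)={change:+.2f}")
```

The gene is more highly expressed overall in tissue B, yet E2's share of the reads drops, which is the kind of signal a DEU analysis looks for. A real analysis would also need the replicates to judge whether the shift is larger than sample-to-sample noise.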

ADD REPLY • link 2.2 years ago by i.sudbery 21k

I guess I read "alternative splicing" in the original post and immediately thought of isoform assembly and ran with it. I agree that Illumina would be sufficient for analysing DEU and that the quantification from nanopore isn't nearly as good.

Ideally, for transcript usage analyses you would mix the two techniques to quantify expression in novel isoforms identified from long read data.

The biases I mention are those relating to PCR amplification, reverse transcription and the loss of non-poly-A RNAs.

ADD REPLY • link 2.2 years ago by dthorbur ★ 3.0k

Right. If I were doing splicing analysis on a non-model organism, I'd probably want to do both IsoSeq and Illumina, if I had enough money!

To the best of my knowledge, all cDNA-based techniques suffer from RT and PCR bias. I don't think PacBio is any different. The only exception, I guess, would be direct-RNA nanopore, but that comes with its own issues.

ADD REPLY • link 2.2 years ago by i.sudbery 21k

Definitely, IsoSeq plus Illumina seems the best combination for getting both alternative splicing events and their differential expression. IsoSeq prices are decreasing but are still quite high.

But for DEU only, judging by the literature and your comments, Illumina with good coverage should be enough.

ADD REPLY • link 2.2 years ago by mariannapauletto ▴ 90