Question

RNAseq coverage vs depth for transcript isoform expression?


7 months ago

marineandriot • 0

Hi everyone,

I am a graduate student hoping to find the best sequencing option for my project. I am looking at transcript isoform expression in my study species under different stressors. I currently have long-read PacBio data to identify the isoforms in question, and I am trying to sequence corresponding Illumina data to quantify isoform expression or usage.

I have two options for sequencing 5-6 replicates of 4 samples (20-24 total replicates) with nearly identical pricing, but I could use some advice choosing the best option.

1) paired-end 100bp with 50M (6 replicates) to 60M (5 replicates) reads per replicate, for a total of 1.2B reads

2) paired-end 150bp with 40M reads per replicate (5 replicates), for a total of 800M reads

What are the tradeoffs between sequencing depth and read length here? Which is the better option for transcript isoform expression?

Thank you so much for your time!

RNAseq • 452 views

updated 7 months ago by Gordon Smyth ★ 7.7k • written 7 months ago by marineandriot • 0

score 2 · Answer 1 · 2024-05-18

See the exploration of isoform estimation precision vs read length and sequencing depth in our paper: https://doi.org/10.1093/nar/gkad1167. We show that isoform overdispersion can be directly interpreted as reducing the effective sequencing depth.

Table 1 of our paper shows that increasing the paired-end read length from 100bp to 150bp has only a minor effect on estimation precision, equivalent to increasing the sequencing depth by 1.6%. So you are much better off with the first option of 100bp and more reads.

Having more replicates is even more important than read length or depth, so the first option of 100bp with 50 million reads and 6 replicates is by far the best.
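To make the comparison concrete, here is a rough back-of-the-envelope sketch in Python. It takes the 1.6% depth-equivalent figure quoted above at face value and treats the read-length gain as a simple multiplier on reads per replicate, which is a simplifying assumption for illustration and not the method used in the paper. The helper names (`depth_equivalent`, `summarize`) and the constant `N_SAMPLES` are hypothetical; the read counts and replicate numbers are those given in the question.

```python
# Back-of-the-envelope comparison of the sequencing options in the question.
# Assumption: the benefit of 150bp over 100bp paired-end reads can be treated
# as a flat 1.6% increase in effective depth (the figure quoted in the answer).

READ_LENGTH_GAIN = 0.016  # 150bp vs 100bp, expressed as extra depth
N_SAMPLES = 4             # number of samples stated in the question


def depth_equivalent(reads_per_replicate_m, read_length_bp):
    """Reads per replicate (millions), rescaled to 100bp-equivalent depth."""
    gain = READ_LENGTH_GAIN if read_length_bp == 150 else 0.0
    return reads_per_replicate_m * (1.0 + gain)


def summarize(opt):
    """Return (total reads in millions, 100bp-equivalent reads per replicate)."""
    total_m = N_SAMPLES * opt["reps"] * opt["reads_m"]
    return total_m, depth_equivalent(opt["reads_m"], opt["length"])


options = {
    "100bp, 6 reps x 50M": {"length": 100, "reps": 6, "reads_m": 50},
    "100bp, 5 reps x 60M": {"length": 100, "reps": 5, "reads_m": 60},
    "150bp, 5 reps x 40M": {"length": 150, "reps": 5, "reads_m": 40},
}

for name, opt in options.items():
    total_m, eq = summarize(opt)
    print(f"{name}: total {total_m}M reads, "
          f"{eq:.2f}M 100bp-equivalent reads per replicate")
```

Under this simplification, the 150bp option is worth roughly 40.6M 100bp reads per replicate, still well below the 50M to 60M per replicate offered by the 100bp option, and it does not include the sixth replicate.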