Question

Optimum number of reads for RNA-seq experiment


3.7 years ago

Priyanka ▴ 10

I am planning a human RNA-seq experiment to understand the transcriptome profile for an infectious disease. We also wish to explore alternative splicing in the same data. How many reads are enough for such studies? I have seen recommendations of around 60 million or more reads per sample for exploring alternative splicing. Is such sequencing depth essential? If rRNA depletion has been performed, are 20-30 million reads enough?

rna-seq • 1.5k views

ADD COMMENT • link 3.7 years ago by Priyanka ▴ 10

score 0 · Answer 1 · 2022-03-16


3.7 years ago

Istvan Albert 103k

Note that rRNA makes up closer to 80% of the RNA by abundance. Thus, without rRNA depletion, most of your data would match rRNA.
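To put rough numbers on that, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the 5% residual rRNA after depletion is only an assumed figure for illustration, not a measured one.

```python
def non_rrna_reads(total_reads: float, rrna_fraction: float) -> float:
    """Reads left over for non-rRNA transcripts, given the rRNA fraction of the library."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return total_reads * (1.0 - rrna_fraction)


# ~80% rRNA without depletion (figure from the answer above)
undepleted = non_rrna_reads(20e6, 0.80)
# 5% residual rRNA after depletion (assumed value, for illustration only)
depleted = non_rrna_reads(20e6, 0.05)

print(f"without depletion: ~{undepleted / 1e6:.0f} M informative reads")
print(f"with depletion:    ~{depleted / 1e6:.0f} M informative reads")
```

Under these assumptions, a 20 million read library keeps only about 4 million non-rRNA reads without depletion, versus about 19 million with it.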

In general, no one can tell for sure what the optimal RNA-Seq depth is; many factors can affect the interpretation of the data.

In my opinion, if your library preparation is good enough and the replication is reliable then 20 million reads are usually sufficient to establish a profile. The rule is that replication is more important than depth.

Three replicates at 20 million depth each are expected to be more informative than two replicates at 30 million depth each.
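Note that the two designs cost the same in total reads per condition. The sketch below uses a hypothetical `Design` helper (not something from this thread) to tabulate the total reads and the degrees of freedom, replicates minus one, available for estimating within-group variance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One condition of an experiment: replicate count and depth per replicate."""
    replicates: int
    reads_per_replicate: int

    @property
    def total_reads(self) -> int:
        # Sequencing budget spent on this condition
        return self.replicates * self.reads_per_replicate

    @property
    def variance_df(self) -> int:
        # Degrees of freedom for estimating within-group variance
        return self.replicates - 1


three_by_20 = Design(replicates=3, reads_per_replicate=20_000_000)
two_by_30 = Design(replicates=2, reads_per_replicate=30_000_000)

for name, design in (("3 x 20M", three_by_20), ("2 x 30M", two_by_30)):
    print(f"{name}: total={design.total_reads:,} reads, variance df={design.variance_df}")
```

Both designs use 60 million reads, but the three-replicate design leaves two degrees of freedom for the variance estimate instead of one.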

ADD COMMENT • link 3.7 years ago by Istvan Albert 103k


Thank you for your reply. I understand that read counts vary from experiment to experiment and that "optimum" is really a relative term.

But would you say 20 million reads are enough to study alternative splicing in the human genome (assuming ~30-50% of those 20 million may belong to the host's microbiota)?

I found a study (a cell line study) that used 60 million reads mapping to the human genome to study alternative splicing. But I have patient samples for an infectious disease, so a major chunk of the reads will also belong to the pathogen and microbiota.

I am trying to find the balance between the additional benefit of more reads and the sampling cost.
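The trade-off can be put into numbers with a small sketch. The function names are hypothetical, and the non-host fraction is whatever one assumes or measures (here the ~30-50% mentioned above); 60 million host reads is the figure from the cell line study.

```python
import math


def host_reads(total_reads: float, non_host_fraction: float) -> float:
    """Reads expected to map to the host, given the fraction lost to pathogen/microbiota."""
    if not 0.0 <= non_host_fraction < 1.0:
        raise ValueError("non_host_fraction must be in [0, 1)")
    return total_reads * (1.0 - non_host_fraction)


def total_reads_needed(target_host_reads: float, non_host_fraction: float) -> int:
    """Total depth required so that the host-mapping portion reaches the target."""
    if not 0.0 <= non_host_fraction < 1.0:
        raise ValueError("non_host_fraction must be in [0, 1)")
    return math.ceil(target_host_reads / (1.0 - non_host_fraction))


for fraction in (0.3, 0.5):
    kept = host_reads(20e6, fraction)
    needed = total_reads_needed(60e6, fraction)
    print(f"non-host {fraction:.0%}: 20 M total -> ~{kept / 1e6:.0f} M host reads; "
          f"60 M host reads need ~{needed / 1e6:.0f} M total")
```

With half of the reads assumed to be non-host, 20 million total reads leave about 10 million for the host, and matching 60 million host reads would take about 120 million total.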

ADD REPLY • link 3.7 years ago by Priyanka ▴ 10


"But I have patient samples for an infectious disease, so a major chunk of the reads will also belong to the pathogen and microbiota."

Why would that be the case? How was the sample collected and what kind (blood/tissue) was it? What method was used to make the libraries? The only way I can imagine this is if you were using blood that contained a pathogen (what kind of pathogen is it, eukaryotic or prokaryotic?).

ADD REPLY • link 3.7 years ago by GenoMax 154k


The data come from nasal swabs, and we do expect microbiota to be present along with the viral pathogen.

ADD REPLY • link 3.7 years ago by Priyanka ▴ 10


Of course, it matters immensely what you are studying and how the samples are extracted.

The essential factor is always the abundance of a transcript relative to the other transcripts that may be present (and to the sum of them all). In general, the host genome is much larger and therefore normally dominates the sample unless that is managed in some way.

If you are interested in the response of the host, that is a completely different story than studying the transcriptomic changes of the pathogen.

The best way to get ahead of any issue is to run a pilot study before embarking on a large multisample, multi-replicate study. Running the pilot study will give you the right kind of insight.

Another avenue would be to search the literature and the Sequence Read Archive (SRA) for studies similar to yours, download their data, then see for yourself how well it works.
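One way to use pilot or downloaded data is to subsample it and see where detection levels off. The following is only a sketch under stated assumptions: it takes a hypothetical list with one feature label per read (for example, the splice junction a read supports, obtained from whatever aligner you use), and the function name, the `min_reads` threshold, and the toy data are all made up for illustration.

```python
import random
from collections import Counter


def saturation_curve(read_features, fractions, min_reads=5, seed=0):
    """Return (n_reads, n_features_detected) pairs for nested subsamples of the reads.

    A feature counts as detected when at least `min_reads` subsampled reads support it.
    Subsamples are prefixes of one shuffled list, so each one contains the previous.
    """
    for frac in fractions:
        if not 0.0 < frac <= 1.0:
            raise ValueError("fractions must be in (0, 1]")
    rng = random.Random(seed)
    shuffled = list(read_features)
    rng.shuffle(shuffled)
    curve = []
    for frac in sorted(fractions):
        n_reads = int(len(shuffled) * frac)
        counts = Counter(shuffled[:n_reads])
        detected = sum(1 for count in counts.values() if count >= min_reads)
        curve.append((n_reads, detected))
    return curve


# Toy input (hypothetical junction labels with skewed abundances)
toy_reads = ["J1"] * 500 + ["J2"] * 50 + ["J3"] * 5
curve = saturation_curve(toy_reads, [0.1, 0.25, 0.5, 1.0])
for n_reads, detected in curve:
    print(f"{n_reads:>4} reads -> {detected} junctions detected")
```

If the number of detected features is still climbing at the full depth, more reads would likely help; if it has flattened well before that, the extra depth is probably buying little.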

ADD REPLY • link 3.7 years ago by Istvan Albert 103k


Right. Thanks for the suggestion. I'll search SRA for similar data and see how my analysis pipeline works on them, to get a better idea of the optimal number of reads per sample for my study design.

ADD REPLY • link 3.7 years ago by Priyanka ▴ 10