I am planning human RNA-seq to profile the transcriptome in an infectious disease, and we also want to explore alternative splicing in the same data. How many reads are enough for such a study? I have seen recommendations of around 60 million or more reads per sample for exploring alternative splicing. Is that sequencing depth essential? If rRNA depletion has been performed, are 20-30 million reads enough?
Thank you for your reply. I understand that read counts vary from experiment to experiment and that "optimum" is a relative term.
But would you say 20 million reads are enough to study alternative splicing in the human genome, assuming ~30-50% of those 20 million may belong to the host's microbiota?
I found a cell line study that used 60 million reads mapping to the human genome to study alternative splicing. But I have patient samples from an infectious disease, so a major chunk of the reads will belong to the pathogen and microbiota.
I am trying to find the balance between the additional benefit of more reads and the cost per sample.
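To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic, assuming the non-host share (microbiota plus pathogen) is a fixed fraction of all reads. The function names are made up for illustration, and the 60 million target is simply the figure from the cell line study mentioned above, not a recommendation.

```python
def host_reads(total_reads, non_host_fraction):
    """Reads expected to come from the host once the non-host share is set aside."""
    if not 0 <= non_host_fraction < 1:
        raise ValueError("non_host_fraction must be in [0, 1)")
    return total_reads * (1 - non_host_fraction)


def total_reads_needed(target_host_reads, non_host_fraction):
    """Total reads to sequence so that the host share reaches the target."""
    if not 0 <= non_host_fraction < 1:
        raise ValueError("non_host_fraction must be in [0, 1)")
    return target_host_reads / (1 - non_host_fraction)


# Fractions taken from the question: 30-50% of reads assumed non-host.
for fraction in (0.3, 0.5):
    usable = host_reads(20_000_000, fraction)
    needed = total_reads_needed(60_000_000, fraction)
    print(f"non-host {fraction:.0%}: {usable / 1e6:.0f}M host reads out of 20M; "
          f"{needed / 1e6:.0f}M total reads for 60M host reads")
```

Under these assumptions, 20 million total reads leave roughly 10-14 million host reads, which is the number to compare against any depth recommendation for splicing analysis.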
Why would that be the case? How was the sample collected and what kind (blood/tissue) was it? What method was used to make the libraries? The only way I can imagine this is if you were using blood that contained a pathogen (what kind of pathogen is it, eukaryotic or prokaryotic?).
It's nasal swab data, and we do expect microbiota to be present along with the viral pathogen.
Of course, it matters immensely what you are studying and how the samples are extracted.
The essential factor is always the abundance of a transcript relative to all the other transcripts present (the sum of them all). In general, the host genome is much larger and so usually dominates the sample unless this is managed in some way.
If you are interested in the response of the host, that is a completely different story than studying the transcriptomic changes of the pathogen.
The best way to get ahead of any issue is to run a pilot study before embarking on a large multisample, multi-replicate study. Running the pilot study will give you the right kind of insight.
Another avenue would be to search the literature and the Sequence Read Archive (SRA) for studies similar to yours, download their data, and see for yourself how well it works.
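One way to "see for yourself" with a deeper pilot or public dataset is to thin its splice-junction read counts down to the depth you can afford and check how many junctions stay detectable. The following is only a sketch, assuming you already have per-junction read counts from your aligner; the junction names, the counts, and the 5-read detection threshold are hypothetical and chosen purely for illustration.

```python
import random


def thin_counts(junction_counts, fraction, seed=0):
    """Keep each read independently with probability `fraction` (binomial thinning)."""
    rng = random.Random(seed)
    thinned = {}
    for junction, count in junction_counts.items():
        thinned[junction] = sum(1 for _ in range(count) if rng.random() < fraction)
    return thinned


def detected(junction_counts, min_reads=5):
    """Number of junctions supported by at least `min_reads` reads (assumed threshold)."""
    return sum(1 for count in junction_counts.values() if count >= min_reads)


# Hypothetical junction counts from a deeply sequenced sample.
deep = {
    "geneA:e1-e2": 400,
    "geneA:e1-e3": 12,
    "geneB:e4-e6": 6,
    "geneC:e2-e3": 60,
}

# Simulate sequencing at the full depth, half, and a quarter of it.
for fraction in (1.0, 0.5, 0.25):
    kept = thin_counts(deep, fraction)
    print(f"{fraction:.0%} of depth: {detected(kept)} of {len(deep)} junctions detected")
```

If the number of detected junctions has already flattened out at the depth you are considering, extra reads buy little; if it is still climbing, low-abundance isoforms are likely being missed.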
Right. Thanks for the suggestion. I'll search SRA for similar data and see how my analysis pipeline works on it, to get a better idea of the optimum number of reads per sample for my study design.