I am working on the human genome. It is generally recommended to choose sequencing depth based on the purpose of the experiment. In my project, I need to estimate isoform expression (Ψ values, for “Percent Spliced In” or “Percent Spliced Isoform”) and to identify splicing events across multiple samples, so I need to know what RNA-Seq depth is required for isoform quantification. What is the minimum number of reads required for splicing analysis from RNA-Seq data?
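For readers unfamiliar with Ψ, here is a minimal sketch of how it is commonly computed for a single exon-skipping event from junction read counts. The function name `psi` and the plain read-count inputs are assumptions made for illustration; dedicated splicing tools additionally normalize for the number of junctions and effective lengths, which this sketch ignores.

```python
from typing import Optional


def psi(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Percent Spliced In for one event, from raw junction read counts.

    Simplified illustration: no length or junction-count normalization.
    Returns None when no read covers the event, since Ψ is undefined then.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Hypothetical counts: 30 reads support inclusion, 10 support skipping.
print(psi(30, 10))
```

With only a handful of reads covering an event, the estimate becomes very coarse, which is why depth matters for Ψ.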
As you might have figured out, this also depends on the kind of isoforms you wish to analyse. If they are very rare splice variants, you will need to sequence deeper. Conversely, a moderate depth will let you pick up fairly 'common' isoforms, and the deeper you sequence, the more rare isoforms you will potentially pick up.
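To make the depth argument a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes junction-spanning read counts follow a Poisson distribution, and every parameter (the fraction of reads coming from the gene, the isoform's share of that gene, the fraction of its reads that span the informative junction, and the threshold of 5 reads) is a hypothetical value chosen for illustration, not a recommendation.

```python
import math


def expected_junction_reads(total_reads, gene_fraction,
                            isoform_fraction, junction_fraction):
    """Expected number of reads spanning one isoform-specific junction.

    All three fractions are illustrative assumptions, not measured values.
    """
    return total_reads * gene_fraction * isoform_fraction * junction_fraction


def prob_at_least_k(expected_reads, k):
    """P(X >= k) for X ~ Poisson(expected_reads)."""
    p_fewer = sum(
        math.exp(-expected_reads) * expected_reads ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: a gene taking 1 in 100,000 reads, an isoform making
# up 5% of that gene, and 10% of its reads spanning the informative junction.
for depth in (20_000_000, 40_000_000, 80_000_000):
    lam = expected_junction_reads(depth, 1e-5, 0.05, 0.10)
    print(depth, round(lam, 2), round(prob_at_least_k(lam, 5), 3))
```

Under this toy model the chance of seeing enough supporting reads rises with depth, and it rises slowest for rare isoforms of lowly expressed genes.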
I assume you are talking about Illumina (short) reads here. If so, it might be good to also have a look at the long-read technologies (PacBio, ONT): you will only have strong evidence for truly correct full-length isoforms if they are obtained via long reads, as these give you a better view of the full transcripts than assembling them from short reads (which potentially yields many false positives).
I am talking about short reads from RNA-Seq data. I understand there are different ways to perform the splicing analysis: reference-based and de novo. I managed to get paired-end RNA-Seq data with 44.3 million reads (QC filtered) per sample. Can I use it for discovery of 'common' isoforms?
Including long-read technologies (PacBio, ONT) will surely increase the power to pick up correct isoforms.
Yes, if we are talking about short reads, then paired-end ones are indeed the most useful. Going from that number, you might even get more than only the common ones.
Personally, I would start with the reference-based approaches (given that a genome is available for this species) and only in a second phase go for the de novo route, as this one will give you a much noisier view of things.
Thanks lieven for the quick response.