I'm looking to perform RNA-seq on pairs of cell lines to look at differential gene expression, and I'm wondering what I can get away with to lower costs. 3 questions:
1) How many reads would be a good starting point? Would 20M be a reasonable place to start?
2) I understand I should have at least 2 biological replicates per cell line. Is it OK to combine the tubes into 1, or should they be run separately?
3) Ribo-depletion costs $200 more than poly-A enrichment. Is it better to go with ribo-depletion since it also captures non-coding RNA, or is it common to try poly-A enrichment first and then look at non-coding RNA if nothing is found with poly-A?
Thanks in advance!
Regarding sequencing depth, keep two things in mind.
1. If the sequencing quality is poor, you will have to trim the reads, and depending on read quality you might trim away a substantial amount of data, which lowers your effective depth.
2. For DE analysis, what we are interested in are unique alignments to the reference genome. As a rule we don't use multimapped reads (there are ways to include multimappers, for example with featureCounts, but reviewers might not be happy in the end). The unique mapping rate should usually be around 90% if everything is fine.
So if you sequence 20 million reads, in a good scenario 85-90% of the data is usable for DE analysis. Always aim for a bit more than your target to compensate for these two factors.
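To make the arithmetic concrete, here is a rough sketch of the depth budget. The ~90% unique mapping rate comes from the estimate above; the 5% trimming loss is only an assumed placeholder (the real loss depends entirely on your run quality), and the function names are made up for illustration.

```python
import math

def usable_reads(total_reads, trim_loss=0.05, unique_rate=0.90):
    """Estimate reads left for DE analysis after trimming and unique mapping.

    trim_loss   -- fraction of reads lost to trimming (assumed value, not measured)
    unique_rate -- fraction of surviving reads that map uniquely (~0.90 if all is fine)
    """
    return total_reads * (1 - trim_loss) * unique_rate

def reads_to_order(target_usable, trim_loss=0.05, unique_rate=0.90):
    """Raw reads to request so that roughly target_usable reads survive both steps."""
    return math.ceil(target_usable / ((1 - trim_loss) * unique_rate))

if __name__ == "__main__":
    sequenced = 20_000_000
    kept = usable_reads(sequenced)
    print(f"Usable from {sequenced:,} raw reads: about {kept:,.0f} ({kept / sequenced:.1%})")
    print(f"Raw reads needed for {sequenced:,} usable: {reads_to_order(sequenced):,}")
```

With these assumed rates, 20M raw reads leave roughly 17M usable ones, so you would need to request a bit over 23M to end up with 20M usable. Plug in your own trimming and mapping numbers from a pilot run or a previous library if you have them.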