Dear All,
This may be a trivial question, but what would be the best way to know whether resequencing a transcriptomic library at a higher depth will generate extra results? Let's assume I have a library that was sequenced at a depth of 5 million reads. The fraction of alignments with non-unique start sites (putative PCR duplicates) is around 40%. Now I want to know whether resequencing the same library at a depth of 20 million reads will add new results beyond the ones I already generated from the 5-million-read run; in other words, whether it is worth paying extra money if the deeper run doesn't add any new information. As a first test, I could run the same library at a depth of 10 million reads and perform the following comparative analyses on the results of the two runs:
1) Compare the number of expressed genes (>10 RPKM) between the 5-million-read and 10-million-read samples. If I find a substantial increase in the number of expressed genes, then sequencing the library at a higher depth makes sense. Similarly, I could compare the number of differentially expressed genes between conditions 1 and 2 at 5 million versus 10 million reads.
2) The same analysis as above, but for splice junctions. A substantial increase in the number of reads aligning across exon-exon junctions would suggest that deeper sequencing is worthwhile.
3) Combine the two runs and check whether the PCR-duplicate rate stays roughly the same (~40%) rather than shooting up dramatically; if it stays flat, the extra depth is likely adding new reads.
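In case it helps frame the comparison, here is a minimal Python sketch of the subsampling ("saturation") idea behind analyses 1 and 3, run on the existing 5-million-read data instead of paying for a deeper run. All data and names below are made up, and the >10 RPKM cutoff is approximated by a raw read-count threshold, since RPKM would also require gene lengths and library size. Dedicated tools (e.g. the RSeQC modules for junction saturation and read duplication) do this properly on real BAM files.

```python
# Hypothetical sketch: estimate saturation by subsampling per-read gene
# assignments from an existing alignment. Names and data are invented.
import random
from collections import Counter

def genes_detected(read_gene_ids, min_reads=10):
    """Count genes with at least `min_reads` supporting reads
    (a crude stand-in for an expression cutoff such as >10 RPKM)."""
    counts = Counter(read_gene_ids)
    return sum(1 for c in counts.values() if c >= min_reads)

def saturation_curve(read_gene_ids, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Subsample the reads at several fractions and report how many
    genes pass the detection cutoff at each depth. A curve that is
    already flat near 1.0 suggests extra depth adds little."""
    rng = random.Random(seed)
    curve = []
    for f in fractions:
        n = int(len(read_gene_ids) * f)
        sample = rng.sample(read_gene_ids, n)
        curve.append((f, genes_detected(sample)))
    return curve

def duplicate_rate(alignment_starts):
    """Fraction of alignments sharing a start site with another one
    (the simple duplicate proxy used in the post)."""
    return 1 - len(set(alignment_starts)) / len(alignment_starts)

# Toy data standing in for per-read gene assignments from the 5M run:
# five highly expressed genes plus a long tail of single-read genes.
reads = ["gene_hi_%d" % (i % 5) for i in range(5000)] + \
        ["gene_lo_%d" % i for i in range(500)]
for frac, n_genes in saturation_curve(reads):
    print(frac, n_genes)
```

The same subsampling loop applies to analysis 2 by counting distinct exon-exon junctions instead of genes, and `duplicate_rate` can be compared between the single run and the combined runs for analysis 3.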
Feel free to comment or add your suggestions. Also, if there are good reviews on this topic, please post them here.
Just want to add that we are interested in splice junction discovery too.