How To Check For The Saturation Of The Library?
10.7 years ago

Dear All,

This may be a trivial question, but what is the best way to tell whether resequencing a transcriptomic library at a higher depth will generate additional results? Let's assume I have a library that was sequenced at a depth of 5 million reads. About 40% of the alignments have non-unique start sites (PCR duplicates). Now I want to know whether resequencing the same library at a depth of 20 million reads will add new results to the ones I already generated from the 5-million-read run. I would like to know whether it is worth paying the extra money if the deeper run may not add any new information. I can perform the following comparative analyses after running the same library at a depth of 10 million reads. The analyses would compare the following results from the two runs:

1) Compare the number of expressed genes (>10 RPKM) in the sample at 5 million reads and at 10 million reads. If I find a substantial increase in the number of expressed genes, then running the library at a higher depth will make sense. Similarly, I can also look at the number of differentially expressed genes between conditions 1 and 2 at 5 million reads and at 10 million reads.

2) A similar analysis as above, but for splice junctions. If I find a substantial increase in the number of reads aligning to exon-exon junctions, that may be useful.

3) I can combine the two runs and check the rate of PCR duplicates. If it stays about the same (~40%) and doesn't shoot up dramatically, then I may be adding new reads.
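
A cheaper first pass at 1) is to go the other way: downsample the existing 5 million reads and see whether the number of detected genes is still climbing at full depth. Below is a minimal sketch of that idea, not a definitive implementation. It assumes per-gene read counts are already available as a dictionary; the gene names, the toy counts and the `min_reads` cutoff are all made up for illustration. A raw read-count cutoff is used instead of RPKM because RPKM is already normalized by depth.

```python
import random


def thin_counts(counts, fraction, rng):
    """Binomially thin per-gene counts: keep each read with probability `fraction`."""
    return {
        gene: sum(1 for _ in range(c) if rng.random() < fraction)
        for gene, c in counts.items()
    }


def saturation_curve(counts, fractions, min_reads=5, seed=0):
    """Return (fraction, genes detected) pairs for each subsampling fraction.

    A gene counts as detected when it keeps at least `min_reads` reads
    after thinning (a hypothetical cutoff, chosen only for illustration).
    """
    rng = random.Random(seed)
    curve = []
    for fraction in fractions:
        thinned = thin_counts(counts, fraction, rng)
        detected = sum(1 for c in thinned.values() if c >= min_reads)
        curve.append((fraction, detected))
    return curve


# Toy per-gene counts (hypothetical, not real data).
toy_counts = {"geneA": 2000, "geneB": 300, "geneC": 40, "geneD": 6, "geneE": 1}

for fraction, detected in saturation_curve(toy_counts, [0.25, 0.5, 0.75, 1.0]):
    print(f"{fraction:.2f} of reads -> {detected} genes detected")
```

If the curve is flat between, say, 75% and 100% of the reads, deeper sequencing is unlikely to reveal many more genes at that cutoff. If it is still rising, it may. The same thinning can be applied to per-junction counts for 2).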

Feel free to comment or add your suggestions. Also, if you know of any good reviews on this topic, please post them here.

rna-seq library • 3.2k views

Just want to add that we are interested in splice junction discovery too.

10.7 years ago

I think PCR duplicates are hard to deal with in RNA-Seq data, but I would say you generally want 10 million reads. After that, I think replicates are more important than coverage.

You can see some more detailed statistics in this article:

http://www.ncbi.nlm.nih.gov/pubmed/24319002
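
As a rough back-of-the-envelope complement, the ~40% duplicate rate from the question can be plugged into the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of reads, C the number of distinct reads observed and X the number of distinct molecules in the library. The sketch below solves for X and extrapolates to 20 million reads. It assumes that every read with a non-unique start site is a true PCR duplicate, which is usually not the case in RNA-Seq (highly expressed genes produce identical fragments naturally), so the result will tend to understate what a deeper run can add. The function names are mine.

```python
import math


def expected_unique(library_size, total_reads):
    """Expected distinct reads when sampling `total_reads` from `library_size` molecules."""
    return library_size * (1.0 - math.exp(-total_reads / library_size))


def estimate_library_size(total_reads, unique_reads):
    """Solve unique_reads = X * (1 - exp(-total_reads / X)) for X by bisection."""
    if not 0 < unique_reads < total_reads:
        raise ValueError("need 0 < unique_reads < total_reads")

    def excess(x):
        return expected_unique(x, total_reads) - unique_reads

    # expected_unique grows with X, so bracket the root and bisect.
    low = high = float(unique_reads)
    while excess(high) < 0:
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if excess(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Numbers from the question: 5 million reads, ~40% duplicates.
reads_now = 5_000_000
unique_now = 0.6 * reads_now

library_size = estimate_library_size(reads_now, unique_now)
unique_at_20m = expected_unique(library_size, 20_000_000)

print(f"estimated distinct molecules: {library_size:,.0f}")
print(f"expected distinct reads at 20M: {unique_at_20m:,.0f}")
print(f"expected duplicate rate at 20M: {1 - unique_at_20m / 20_000_000:.0%}")
```

Under these assumptions the predicted duplicate rate rises steeply with depth, but given the caveat above it is better treated as a pessimistic bound than as an estimate.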


Thanks for the paper.


Sure, no problem.

Splice junction discovery will be a bit of another story. Unlike gene expression (which I think is OK for single-end), you'll want paired-end (and/or longer read) data and higher coverage (perhaps starting with 20 million reads? not as sure in that case).


I have heard that paired-end is better for splice junction discovery. Is it only because paired-end reads can be mapped more confidently than single-end reads? After all, single-end reads can be soft-clipped by the aligner too.


In practice, I know that MATS could provide splicing events for the same sample when processed with a paired-end library but it shouldn't call any events for that same sample with a single-end library.

I think it is an issue with being able to confidently identify the mapping for fragments of a 100 bp read. It might have been a different story if I had access to 300 bp reads. So, I think the short answer is "yes".
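
To put rough numbers on the read-length point: a read that crosses a junction is split into two pieces, and the shorter piece has to be long enough to place confidently. The sketch below counts the read start positions that leave at least `min_overhang` bases on each side of a junction. It assumes uniformly distributed start positions and ignores everything else an aligner does; the 25 bp overhang is an arbitrary illustrative value, not a setting of any particular tool.

```python
def spanning_offsets(read_length, min_overhang=1):
    """Start positions at which a read crosses a junction with at least
    `min_overhang` bases on each side."""
    return max(0, read_length - 2 * min_overhang + 1)


def well_anchored_fraction(read_length, min_overhang):
    """Among reads crossing the junction at all, the fraction whose shorter
    side is at least `min_overhang` bases long."""
    crossing = spanning_offsets(read_length, 1)
    if crossing == 0:
        return 0.0
    return spanning_offsets(read_length, min_overhang) / crossing


for length in (50, 100, 300):
    fraction = well_anchored_fraction(length, min_overhang=25)
    print(f"{length} bp reads: {fraction:.0%} of junction-crossing reads "
          f"have >= 25 bp on both sides")
```

This only captures the effect of read length. Any extra benefit from the mate in paired-end data is not modeled here.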


Thanks. If you know of, or come across, a reference paper that discusses the inefficiency of single-end reads for detecting splice junctions, please let me know.
