If I'm about to send something off for sequencing, and my main question is about splicing, am I better off with 75 bp paired-end reads or 150 bp single-end reads?
I understand that paired-end is better than single-end, and longer is better than shorter, but if I'm weighing the two against each other, which should I prioritize for splicing detection: longer reads or paired-end?
Hi,
Assuming the sequencing template library is ~300 bp (median), 75x2 PE data effectively gives you information spanning 75 + 75 + an insert of ~150 bp (the unsequenced stretch between the two mates) = ~300 bp of the genome/transcriptome.
Admittedly, in practice the average insert size you get is usually well below 150 bp, because template library size selection still lets through many templates much shorter than 300 bp.
Having said that, as far as I understand, 75x2 PE should give more "information" than 150 bp SE.
Also, if the organism in question has repetitive sequence content in its transcriptome, I think it helps to have the larger "info" content of data like 75x2 PE or 100x2 PE.
I am not sure whether a longer 150 bp SE read alone would be more effective at handling repetitive content at the mapping step.
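The span arithmetic in this answer can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, adapters and trimming are ignored, and a read is assumed to stop at the end of its template. It shows that both designs sequence the same number of bases from a ~300 bp template, but the pair brackets the whole template, and that the advantage shrinks as templates get shorter.

```python
def paired_end_coverage(fragment_len, read_len):
    """Return (sequenced_bases, unsequenced_insert, span) for one PE fragment.

    Hypothetical helper: assumes each mate reads inward from one end of the
    template and cannot extend past the other end.
    """
    r1 = min(read_len, fragment_len)
    r2 = min(read_len, fragment_len)
    # Mates overlap when the template is shorter than the two reads combined.
    overlap = max(0, r1 + r2 - fragment_len)
    sequenced = r1 + r2 - overlap
    unsequenced_insert = max(0, fragment_len - r1 - r2)
    return sequenced, unsequenced_insert, fragment_len


def single_end_coverage(fragment_len, read_len):
    """Return (sequenced_bases, unsequenced_insert, span) for one SE read."""
    sequenced = min(read_len, fragment_len)
    # A single read says nothing about where the far end of the template lies.
    return sequenced, 0, sequenced


if __name__ == "__main__":
    for frag in (300, 200, 100):
        print(frag, "PE 75x2:", paired_end_coverage(frag, 75),
              "SE 150:", single_end_coverage(frag, 150))
```

With a 100 bp template, for example, both designs end up covering the same 100 bp, which is the caveat about short templates made above.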
Definitely paired-end. There is no advantage of SE over PE, and PE tells you exactly where both ends of the fragment lie in the genome/transcriptome. Also, some tools such as salmon can now correct for GC bias using PE information, though this is mainly relevant for DEG analysis. I am not sure whether one can or should use it for splicing analysis; that is not my field.
This answers my question perfectly. Thank you!