Are these 5' RNA-Seq datasets? I would do them both ways.
Also, try ranking each TSS locus with a parameter (e.g. how much enrichment you find and in which replicate), then generate high-, medium- and low-confidence lists of TSS and cross-compare them. You can also use additional scores, such as the presence of conserved TATA boxes, CpG islands and GC content.
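A minimal Python sketch of that tiered ranking, assuming you already have a per-replicate enrichment value for each candidate locus plus simple TATA/CpG/GC annotations. The class, function names, weights and thresholds are all hypothetical placeholders to illustrate the idea, not calibrated values.

```python
from dataclasses import dataclass, field

@dataclass
class TSSCandidate:
    """One candidate TSS locus (hypothetical fields)."""
    locus_id: str
    enrichment_by_rep: dict = field(default_factory=dict)  # e.g. {"rep1": 4.2}
    has_tata: bool = False       # conserved TATA box upstream?
    in_cpg_island: bool = False  # overlaps an annotated CpG island?
    gc_fraction: float = 0.0     # GC content of the flanking window

def score(c: TSSCandidate, min_enrichment: float = 2.0) -> float:
    """Combine replicate support, peak enrichment and sequence features.

    The weights below are arbitrary placeholders.
    """
    values = list(c.enrichment_by_rep.values())
    passing = sum(v >= min_enrichment for v in values)
    rep_support = passing / len(values) if values else 0.0
    best = max(values, default=0.0)
    s = 2.0 * rep_support + min(best, 10.0) / 10.0
    s += 0.5 * c.has_tata + 0.5 * c.in_cpg_island + 0.5 * (c.gc_fraction >= 0.6)
    return s

def tier(c: TSSCandidate, high: float = 3.0, medium: float = 1.5) -> str:
    """Map a score onto a high / medium / low confidence tier."""
    s = score(c)
    if s >= high:
        return "high"
    return "medium" if s >= medium else "low"

def group_by_tier(cands) -> dict:
    """Collect locus ids into one set per tier."""
    groups = {"high": set(), "medium": set(), "low": set()}
    for c in cands:
        groups[tier(c)].add(c.locus_id)
    return groups

def cross_compare(groups_a: dict, groups_b: dict) -> dict:
    """Loci that two independent analyses place in the same tier."""
    return {t: groups_a[t] & groups_b[t] for t in groups_a}
```

Running `group_by_tier` separately on each analysis (e.g. each way of processing the data) and passing both results to `cross_compare` gives the loci that agree; the high-tier intersection is the most trustworthy set.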
Introduction to Transcriptional Initiation at Metazoan Promoters
To understand the analysis of 5' RNA data, it is worth taking a moment to highlight that there are multiple 'types' of promoters in living organisms. First of all, there are different RNA polymerases, including RNA polymerase I (rRNA), II (mRNA, lncRNA, miRNA), III (tRNA), IV (plant-specific), viral polymerases, etc., and each polymerase has its own mechanism of transcriptional initiation, which may vary between distantly related organisms. Also be aware that different RNA polymerases may generate RNAs with different covalent modifications, so those RNAs may or may not be present in your 5' RNA sequencing, depending on how the experiment was performed. By and large, most researchers are interested in RNA polymerase II transcripts (mRNA), and as a result most 5' RNA methods focus on the identification of RNAs containing a 7-methylguanosine cap protecting their 5' end.
With respect to RNA polymerase II initiation sites, there are two generally recognized 'types' of TSS. Sharp (or focused) TSS initiate transcription from a single nucleotide (or within +/- 2 nt) and resemble the promoters found in molecular biology textbooks. They often contain well-defined core promoter elements such as the TATA box and usually initiate transcription from a purine preceded by a pyrimidine (PyPu, e.g. CA, with the A being the initiating nucleotide).
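The PyPu rule is easy to check once you have candidate initiation sites. Here is a minimal Python sketch, assuming the genome (e.g. one chromosome) is available as a plain string and the TSS is given as the 0-based coordinate of the initiating nucleotide; `is_pypu` is a hypothetical helper name.

```python
# Complement table; soft-masked (lowercase) bases are upper-cased before use.
COMP = str.maketrans("ACGTN", "TGCAN")

def is_pypu(genome: str, pos: int, strand: str = "+") -> bool:
    """True if the TSS at 0-based `pos` is a purine preceded by a pyrimidine.

    On the minus strand the upstream base sits at pos + 1, so the
    dinucleotide is read as the reverse complement of genome[pos:pos + 2].
    """
    if strand == "+":
        if pos < 1 or pos >= len(genome):
            return False
        dinuc = genome[pos - 1:pos + 1].upper()
    else:
        if pos < 0 or pos + 1 >= len(genome):
            return False
        dinuc = genome[pos:pos + 2].upper().translate(COMP)[::-1]
    upstream, initiator = dinuc[0], dinuc[1]
    return upstream in "CT" and initiator in "AG"
```

For example, `is_pypu("GGCAGG", 3)` checks the `CA` ending at index 3 and returns `True`. The fraction of reads in a cluster that start on a PyPu site can then serve as one more score when ranking candidate TSS.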
The other, more common type is the broad (or dispersed) TSS. These promoters initiate transcription from several different sites within a large region (often 50-100 nt). They usually lack core promoter elements (no TATA box), but each individual initiation site DOES normally still initiate on a purine preceded by a pyrimidine (PyPu).
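One rough way to separate sharp from broad TSS is to measure how wide the bulk of the read 5' ends in a cluster spreads. The Python sketch below assumes you already have the 5' end coordinates of the reads in one TSS cluster; the 80% core fraction and the 5 nt cutoff (roughly +/- 2 nt) are hypothetical choices, and real data will contain intermediate shapes.

```python
def core_width(positions: list, frac: float = 0.8) -> int:
    """Width in nt spanned by the central `frac` of read 5' ends in one cluster."""
    if not positions:
        raise ValueError("cluster has no reads")
    pos = sorted(positions)
    n = len(pos)
    # Reads dropped from each tail; capped so the window never inverts.
    trim = min(round(n * (1.0 - frac) / 2), (n - 1) // 2)
    return pos[n - 1 - trim] - pos[trim] + 1

def classify_shape(positions: list, sharp_max_width: int = 5) -> str:
    """'sharp' if the core spans <= sharp_max_width nt, otherwise 'broad'."""
    return "sharp" if core_width(positions) <= sharp_max_width else "broad"
```

Trimming the tails before measuring the width keeps a few stray reads from turning an otherwise focused TSS into a "broad" one.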
False TSS - be careful of artifacts
A quick note about artifacts in 5' RNA-Seq data: most 5' RNA-Seq methodologies work by enriching for 5' cap-protected RNA, which means that most of the sequence data describes 5' RNA ends, but a fraction of it may be noise from random RNA-Seq fragments (much like background reads in ChIP-Seq). In particular, highly expressed RNAs may yield "5' RNA-Seq" reads along the whole body of the gene, giving the appearance of alternative TSS that are likely false positives. Because of this, I would highly recommend using traditional RNA-Seq as a "background" when analyzing 5' RNA-Seq data. This approach (described below) may remove some real TSS from the results, but it is also likely to remove a large number of false positives and clean up your analysis.
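The idea of using RNA-Seq as a background can be sketched as a simple fold-enrichment filter. This Python sketch assumes you have already counted 5' RNA-Seq reads and regular RNA-Seq reads in the same window around each candidate TSS; the function names, the CPM normalization, the pseudocount and the 4-fold cutoff are illustrative assumptions, not the specific procedure of any particular tool.

```python
def cpm(count: int, library_size: int) -> float:
    """Counts per million mapped reads."""
    return count * 1e6 / library_size

def passes_background(tss_count: int, tss_lib: int, rna_count: int, rna_lib: int,
                      min_fold: float = 4.0, pseudocount: float = 1.0) -> bool:
    """True if the 5' RNA-Seq signal exceeds the RNA-Seq background by min_fold.

    The pseudocount avoids dividing by zero in windows with no RNA-Seq reads.
    """
    tss = cpm(tss_count, tss_lib) + pseudocount
    bg = cpm(rna_count, rna_lib) + pseudocount
    return tss / bg >= min_fold

def filter_candidates(counts: dict, tss_lib: int, rna_lib: int, **kw) -> list:
    """counts maps locus id -> (5' RNA-Seq reads, RNA-Seq reads) in the same window."""
    return [locus for locus, (t, r) in counts.items()
            if passes_background(t, tss_lib, r, rna_lib, **kw)]
```

A candidate in the body of a highly expressed gene tends to have plenty of RNA-Seq reads in its window, so its fold ratio stays low and it is dropped, which is exactly the class of false positive described above.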
Trans-splicing (where the 5' end of one transcript is added to the front of another) and recapping (where a transcript is cleaved and a new cap is placed on the truncated product) are two phenomena you may want to think carefully about when analyzing 5' RNA-Seq data. Trans-splicing will create false negatives, and recapping will create false positives. In certain organisms, such as C. elegans, trans-splicing is very common, making 5' GRO-Seq a much better assay for identifying TSS than 5' RNA-Seq (because it measures the 5' RNA ends before they have a chance to be trans-spliced). In other organisms (e.g. mouse, human, fly) it appears to be rare. The degree to which transcripts are 'recapped' is a matter of debate, because recapped ends can be hard to distinguish from true alternative TSS or from noise in the 5' RNA-Seq assay.
Thank you for the information, it's very helpful. My data isn't 5'; it's random-primed, as the intention was primarily expression analysis, and this was a supplementary idea for improving on the public reference for our dataset. So maybe I'll have less success with it, but I'll give it a go.
Good Luck!!