Hello,
I am trying to map the TSSs of genes across the whole genome. I looked for what people have already proposed, but the most recent posts on this subject are quite old, and I was wondering whether the current view is different. Examples of what I have seen so far are:
- mapping to GencodeV19/Ensembl release 74 (as in http://www.cell.com/action/showMethods?pii=S0092-8674%2814%2901178-7)
- mapping to RefSeq coding gene isoforms with a unique TSS (23,430 isoforms in total) (as in https://www.nature.com/articles/nature14136)
- using the UCSC data (as in https://www.biostars.org/p/94292/)
As far as I understand it, all of these take the very first base of each transcript as its TSS. Is that right?
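One detail worth checking with the "very first base" approach: in GTF annotations (GENCODE/Ensembl) the TSS is the transcript's 5' end, which is the `start` coordinate on the + strand but the `end` coordinate on the - strand. Below is a minimal Python sketch, assuming a GENCODE-style GTF with explicit `transcript` rows; the function names (`parse_attr`, `transcript_tss`, `to_bed`) and the two example lines are made up for illustration.

```python
def parse_attr(attrs, key):
    """Return the value of `key` from a GTF attribute column, or None."""
    for item in attrs.split(";"):
        item = item.strip()
        if item.startswith(key + " "):
            return item[len(key) + 1:].strip('"')
    return None


def transcript_tss(gtf_lines):
    """Yield (chrom, tss, strand, transcript_id) for each 'transcript' row.

    GTF coordinates are 1-based and inclusive, so `tss` is 1-based too.
    """
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only whole-transcript rows, not exons/CDS
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # The TSS is the transcript's 5' end: `start` on +, `end` on -.
        tss = start if strand == "+" else end
        yield chrom, tss, strand, parse_attr(fields[8], "transcript_id")


def to_bed(chrom, tss, strand, name):
    """Format one 1-based TSS as a BED6 line (0-based, half-open)."""
    return f"{chrom}\t{tss - 1}\t{tss}\t{name}\t0\t{strand}"


if __name__ == "__main__":
    # Two made-up GTF lines: one transcript per strand.
    example = [
        'chr1\tdemo\ttranscript\t1000\t2000\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";',
        'chr1\tdemo\ttranscript\t5000\t6000\t.\t-\t.\tgene_id "G2"; transcript_id "TX2";',
    ]
    for rec in transcript_tss(example):
        print(to_bed(*rec))
```

On the made-up lines, TX1 gets its TSS at 1000 and TX2 at 6000 (its `end`), and the minus-strand TSS is written as `chr1 5999 6000` because BED intervals are 0-based and half-open.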
But what about CAGE data from FANTOM5? Doesn't it provide TSSs as well, through a more data-driven method? There is a lot of complex data in FANTOM5 and I have never used it. Are there any pitfalls I should avoid? How do I get a clean file with measured TSSs per transcript (is that even possible)?
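One possible way to combine CAGE peaks with an annotation is to give each annotated TSS the strongest same-strand peak within a fixed window. This sketch assumes the peaks come as a BED6 file whose column 5 holds a signal you want to rank by (check what your particular FANTOM5 file actually stores there), reduces each peak to its centre base, and uses a hypothetical 500 bp window; `Peak`, `peaks_from_bed` and `best_peak_per_transcript` are illustrative names, not part of any FANTOM5 tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    pos: int      # 1-based centre base of the peak
    strand: str
    score: float  # assumed to come from BED column 5


def peaks_from_bed(bed_lines):
    """Parse BED6 lines into Peak records (skips comments and track lines)."""
    for line in bed_lines:
        if line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 6:
            continue
        start, end = int(f[1]), int(f[2])
        # BED is 0-based half-open; convert the centre base to 1-based.
        centre = (start + end - 1) // 2 + 1
        yield Peak(f[0], centre, f[5], float(f[4]))


def best_peak_per_transcript(tss_records, peaks, window=500):
    """Map transcript_id -> strongest same-strand peak within `window` bp.

    `tss_records` holds (chrom, tss_1based, strand, transcript_id) tuples.
    Transcripts with no peak in the window are left out of the result.
    """
    by_strand = defaultdict(list)
    for p in peaks:
        by_strand[(p.chrom, p.strand)].append(p)
    best = {}
    for chrom, tss, strand, tid in tss_records:
        nearby = [p for p in by_strand[(chrom, strand)]
                  if abs(p.pos - tss) <= window]
        if nearby:
            best[tid] = max(nearby, key=lambda p: p.score)
    return best
```

The linear scan is fine for a toy example; genome-wide, `bedtools closest -s -d` on sorted BED files performs the same strand-aware nearest-feature search much faster.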
What would be the best method in your opinion, given that my experiment was done in a specific cell type? Should I go as far as getting TSSs for that cell type, and if so, how would I do this?
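For a cell-type-specific choice, one option is to keep, for each gene, the peak with the highest expression in the sample closest to your cell type. This sketch assumes a tab-separated table with hypothetical columns `peak_id`, `gene` and one TPM column per sample (FANTOM5 does distribute per-sample expression for its peaks, but the exact layout and column names differ); the 1 TPM cutoff and the example values are arbitrary and made up.

```python
import csv
import io


def dominant_tss_per_gene(tsv_text, sample, min_tpm=1.0):
    """Return {gene: (peak_id, tpm)} for the highest-TPM peak of each gene.

    Assumes hypothetical columns `peak_id`, `gene` and one TPM column per
    sample; peaks with an empty `gene` field or TPM below `min_tpm` in the
    chosen sample are ignored.
    """
    best = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        gene = row["gene"]
        if not gene:
            continue  # peak not annotated to any gene
        tpm = float(row[sample])
        if tpm < min_tpm:
            continue  # treat as inactive in this cell type
        if gene not in best or tpm > best[gene][1]:
            best[gene] = (row["peak_id"], tpm)
    return best


if __name__ == "__main__":
    # Made-up expression values for two cell lines.
    table = (
        "peak_id\tgene\tK562\tHepG2\n"
        "p1\tGENE_A\t12.0\t0.5\n"
        "p2\tGENE_A\t3.0\t40.0\n"
        "p3\tGENE_B\t0.2\t0.1\n"
    )
    print(dominant_tss_per_gene(table, "K562"))   # {'GENE_A': ('p1', 12.0)}
    print(dominant_tss_per_gene(table, "HepG2"))  # {'GENE_A': ('p2', 40.0)}
```

In this toy table GENE_A is assigned a different dominant TSS in each cell line, and GENE_B is dropped because it falls below the cutoff in both.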
Sorry for so many questions! I am new to this, but I thought the answers might be useful to others too, given all the new CRISPRa/i screens coming up.
Many thanks,
Aurelie
You might also be interested in EPD, which is a curated promoter database: http://epd.vital-it.ch/index.php
Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531148/