What is exactly meant by "GENCODE TSS"?
20 months ago

I have seen many papers mention "GENCODE TSS".

However, upon looking at the GENCODE GTF file downloaded from the GENCODE website (e.g. gencode.vXX.annotation.gtf.gz), I didn't see any obvious "TSS" entry.

So how does one go about defining a "GENCODE TSS"? What does this term even mean?

My theory: within the GENCODE GTF file, I noticed that each (protein-coding) gene has multiple "transcript" entries. Am I right in saying that the start coordinate (for + strand genes) or the end coordinate (for - strand genes) of each transcript would be a TSS of that gene?

So, for example, if gene A (+ strand) has 3 transcripts, am I right in saying that the START coordinates of these transcripts represent the 3 TSSs of gene A?
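To make the theory concrete, here is a minimal Python sketch of that rule: take every "transcript" row of a GTF and report its start (+ strand) or end (- strand) as the TSS. The function names and the toy rows are made up for illustration; the only things assumed about the file are the standard nine tab-separated GTF columns and quoted `gene_id` / `transcript_id` attributes. It is one way to read the annotation, not an official GENCODE definition.

```python
import re

def parse_attributes(field):
    """Parse quoted GTF attributes (key "value";) into a dict.

    Unquoted attributes (e.g. `level 2`) are ignored in this sketch.
    """
    return dict(re.findall(r'(\S+) "([^"]*)"', field))

def transcript_tss(gtf_lines):
    """Yield (gene_id, transcript_id, chrom, tss, strand) per transcript row.

    TSS = start coordinate on the + strand, end coordinate on the - strand.
    Coordinates stay 1-based, as in the GTF.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript-level rows define a TSS here
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        tss = start if strand == "+" else end
        yield attrs["gene_id"], attrs["transcript_id"], chrom, tss, strand

# Toy rows with hypothetical IDs, not real GENCODE records.
EXAMPLE_GTF = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "geneA";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
    'chr1\tHAVANA\ttranscript\t250\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "txA2";\n',
    'chr1\tHAVANA\ttranscript\t4000\t5000\t.\t-\t.\tgene_id "geneB"; transcript_id "txB1";\n',
]

for record in transcript_tss(EXAMPLE_GTF):
    print(record)
```

For a real file, the same function could be fed lines from `gzip.open(path, "rt")`, where `path` points to the downloaded annotation.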

However, how do you handle the case where, for an alternate transcript of gene A, the first annotated exon is NOT the first transcribed exon (due to splicing)?

In this case, wouldn't it be wrong to define the start of that exon as the TSS? (The real TSS would belong to the spliced-out exon instead.)

What do you guys think of this?


"How do you differentiate the case where for an alternate transcript of gene A, the first exon is NOT the first transcribed exon (due to splicing)."

Uh, aren't you mixing up the initiation of transcription and the initiation of translation here?


I don't think I'm confusing the two. Here is what I meant:

I originally said that the "start" coordinate of every "transcript" entry under a gene would be a TSS for that gene (so a gene would have 3 TSSs if it has 3 alternate transcripts).

But is this really always the case? As I said above, what if a particular alternate transcript of a gene exists as a result of first-exon skipping? As I understand it, GENCODE will not annotate the skipped exon as part of this transcript, so it would be wrong to define the start of this transcript as an independent TSS of that gene.

(E.g. transcript B can still be transcribed from the same promoter as transcript A and thus have the same TSS. But since the first exon of transcript B is skipped, in the GENCODE GTF file it will look as if transcript B starts downstream of transcript A, and I'll define 2 TSSs for this gene, although biologically they may share the same TSS after all.)
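The counting issue above can be sketched in a few lines of Python. Under the "one TSS per annotated 5' end" reading, transcripts with identical 5' coordinates collapse into a single TSS, while any transcript annotated as starting elsewhere counts as a separate one. Whether two different coordinates really reflect two promoters is exactly what the coordinates alone cannot show. The function name and the records are hypothetical.

```python
from collections import defaultdict

def distinct_tss_per_gene(records):
    """Group transcripts by gene and by annotated TSS coordinate.

    records: iterable of (gene_id, transcript_id, tss)
    returns: {gene_id: {tss: [transcript_id, ...]}}
    """
    by_gene = defaultdict(lambda: defaultdict(list))
    for gene_id, transcript_id, tss in records:
        by_gene[gene_id][tss].append(transcript_id)
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {gene: dict(sites) for gene, sites in by_gene.items()}

# Hypothetical gene with three transcripts: two share a 5' end,
# one is annotated as starting further downstream.
records = [
    ("geneA", "txA1", 100),
    ("geneA", "txA2", 100),
    ("geneA", "txA3", 250),
]

tss_map = distinct_tss_per_gene(records)
for gene, sites in tss_map.items():
    print(gene, "has", len(sites), "annotated TSS position(s):", sites)
```

Here `geneA` ends up with 2 annotated TSS positions from 3 transcripts. The annotation cannot tell you on its own whether `txA3` is driven by a separate promoter.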

Does this make sense?


Check out the ChIPseeker package.

It has a function that returns windows around each TSS:

getPromoters(TxDb=txdb, upstream=3000, downstream=3000)

As you can see, it works from a TxDb annotation object (here, one built from the UCSC genome annotation). There are public data sets that have been deposited and used to find genomic elements such as TSSs. I guess the same applies to GENCODE.
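For anyone working outside R, the idea of a strand-aware window around a TSS can be sketched in Python as follows. This is not ChIPseeker's implementation and its exact boundary conventions may differ; `promoter_window` is a made-up helper, and the 3000 bp defaults simply mirror the call above.

```python
def promoter_window(tss, strand, upstream=3000, downstream=3000):
    """Return a 1-based inclusive (start, end) window around a TSS.

    "Upstream" means lower coordinates on the + strand and higher
    coordinates on the - strand. The start is clipped at position 1.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return max(1, start), end

# Symmetric window on the + strand.
print(promoter_window(10000, "+"))
# Asymmetric window on the - strand: upstream extends to higher coordinates.
print(promoter_window(10000, "-", upstream=3000, downstream=1000))
```

The TSS passed in would come from the transcript start (+ strand) or end (- strand) in the GTF.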
