Question

How to know promoter position in the chromosome?

6 months ago

sunyeping ▴ 110

Dear all,

Could you tell me how to find the promoter position of a gene in the chromosome? I recently did an ATAC-seq experiment, and the results show that the peaks associated with some genes are annotated as "promoter". The results give the chromosome position of each peak, but I am curious whether the promoters of these genes really overlap the peak positions. For example, a peak on chromosome 1 (reference genome: mm39; position: 149975557-149975957) is assigned to the gene "Ptgs2", with the annotation "Promoter (<=1kb)". So where can I get the precise coordinates of the promoter for "Ptgs2"? I tried UCSC, Ensembl, and NCBI's Genome Data Viewer but could not find promoter information for the gene.

Thank you!

ATAC-seq • 671 views

ADD COMMENT • link updated 6 months ago by rfran010 ★ 1.3k • written 6 months ago by sunyeping ▴ 110

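The TSS of every transcript is simply the first base of the transcript, and the promoter is upstream of it. Hence, 1 kb upstream of the TSS is the first bp of the transcript - 1000 bp (for a gene on the minus strand, upstream means higher coordinates, so it is the first bp + 1000 bp).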
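To make the arithmetic concrete, here is a minimal sketch of a strand-aware promoter window. The function name `promoter_window`, its parameters, and the toy TSS coordinate are illustrative assumptions, not part of any tool mentioned in this thread.

```python
def promoter_window(tss, strand, upstream=1000, downstream=0):
    """Return (start, end) of a promoter proxy around a TSS.

    Coordinates are 1-based and inclusive. `upstream` and `downstream`
    are window sizes in bp (assumed defaults, adjust to your goal).
    """
    if strand == "+":
        # Upstream of a plus-strand gene means lower coordinates.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Upstream of a minus-strand gene means higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not run off the start of the chromosome.
    return max(1, start), end


# Toy TSS at position 10,000 (hypothetical, not a real gene).
print(promoter_window(10_000, "+"))  # (9000, 10000)
print(promoter_window(10_000, "-"))  # (10000, 11000)
```

The same TSS gives a different window depending on strand, which is easy to get wrong when doing this by hand.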

ADD REPLY • link 6 months ago by ATpoint 86k

Does the promoter always begin at the -1000 bp of a transcript? Then which position does it end?

ADD REPLY • link 6 months ago by sunyeping ▴ 110

Hi, the transcription start site (TSS) is usually given the coordinate 0 and in practice usually matches the start of mRNA/transcript features in GFF files. The TSS is usually quite close to, and downstream of, the TATA box (see for instance https://www.science.org/doi/10.1126/science.adj0116).

The promoter is upstream of the TSS and has negative coordinates such as [-1000,0], but its length is usually not known, so it could be shorter (e.g. [-500,0]) or longer (e.g. [-5000,0]). Hope this helps.
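As a rough illustration of reading the TSS from transcript features, the sketch below parses GTF-style lines and takes the feature start for plus-strand transcripts and the feature end for minus-strand ones. The toy annotation, gene names, and the helper `transcript_tss` are made up for this example; a real Ensembl or GENCODE file should work the same way as long as it has `transcript` rows with quoted `transcript_id` attributes, but that is an assumption you should check against your own file (GFF3 uses `key=value` attributes and would need a different parser).

```python
import re

# Toy GTF with hypothetical genes; columns are tab-separated.
TOY_GTF = (
    'chrT\ttoy\tgene\t1000\t5000\t.\t+\t.\tgene_name "GeneA";\n'
    'chrT\ttoy\ttranscript\t1000\t5000\t.\t+\t.\tgene_name "GeneA"; transcript_id "A-201";\n'
    'chrT\ttoy\ttranscript\t1200\t5000\t.\t+\t.\tgene_name "GeneA"; transcript_id "A-202";\n'
    'chrT\ttoy\ttranscript\t8000\t12000\t.\t-\t.\tgene_name "GeneB"; transcript_id "B-201";\n'
)


def transcript_tss(gtf_text):
    """Map transcript_id -> (chrom, tss, strand, gene_name) from GTF text."""
    tss_by_transcript = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript rows carry the TSS we want
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Attributes look like: key "value"; key "value";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        transcript_id = attrs.get("transcript_id")
        if transcript_id is None:
            continue
        # The TSS is the 5' end: start on '+', end on '-'.
        tss = start if strand == "+" else end
        tss_by_transcript[transcript_id] = (chrom, tss, strand, attrs.get("gene_name"))
    return tss_by_transcript


for tx, (chrom, tss, strand, gene) in transcript_tss(TOY_GTF).items():
    print(f"{tx}\t{gene}\t{chrom}:{tss}\t{strand}")
```

Note that one gene can have several transcripts with different TSSs (A-201 and A-202 here), so a gene can have more than one candidate promoter position.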

ADD REPLY • link 6 months ago by b.contreras.moreira ▴ 350

The end is the start of the transcript, and the start is often not known, as already mentioned by @b.contreras.moreira. The -1000 or -500 are typical proxies and, depending on your goal, "good enough".

ADD REPLY • link 6 months ago by ATpoint 86k

For some more context, "promoter" is not something strongly encoded in the genome. By contrast, other annotations have stricter definitions, e.g. the TSS is defined by the 5' end of RNAs (although it can vary between reference annotations and between cell types), and CDS regions are strictly defined as the region from the start codon to the stop codon.

Promoter is something with a much more fluid definition. So for example, for one gene it may be on the TSS, while for another it may be 1 kb upstream.

So there is no precisely defined coordinate for the Ptgs2 promoter. Instead, ATAC-seq data is generally used to actually define the promoter. I think there are also some methods that can define the promoter more explicitly, since ATAC-seq gives non-promoter regions as well.

In reality the true definition of a promoter is where RNA-pol can bind and initiate transcription. The rules governing this process are incompletely known, so we cannot accurately predict the exact promoter (for example if you have two ATAC-seq peaks within 1kb of the TSS, are they both promoters?). However, there's evidence that promoters are strongly depleted of nucleosomes, allowing RNA-pol and TFs to bind and initiate transcription. This nucleosome-depleted region appears as an ATAC-seq peak.

We have observed promoters are generally close to the TSS, so most annotation programs will just assign the ATAC-peaks within a certain distance to the TSS as 'promoter'. This is imperfect though.

A technical note about this: we often mention the upstream region as being where promoters are found (mostly true, but they can be downstream). However, depending on the tool, the annotation may define promoters as within 1 kb regardless of up/downstream (for example, the ChIPseeker default is anything within 3 kb up or downstream). You can check the documentation and, if needed, explicitly configure the tool to assign only upstream ATAC-peaks as promoters. (Personally, I like HOMER's definition: -1 kb to +100 bp.)
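To show how the choice of window changes the call, here is a small sketch that classifies one peak under the three windows mentioned above. It uses the peak midpoint as the reference point, which is an assumption made for simplicity (tools differ in how they measure distance to the TSS), and the function names and toy coordinates are hypothetical.

```python
def signed_distance_to_tss(peak_start, peak_end, tss, strand):
    """Distance from the peak midpoint to the TSS, relative to the gene.

    Negative values are upstream of the TSS, positive values downstream.
    """
    midpoint = (peak_start + peak_end) // 2
    offset = midpoint - tss
    # On the minus strand, higher coordinates are upstream, so flip the sign.
    return offset if strand == "+" else -offset


def is_promoter_peak(peak_start, peak_end, tss, strand, upstream, downstream):
    """True if the peak midpoint lies within [-upstream, +downstream] of the TSS."""
    distance = signed_distance_to_tss(peak_start, peak_end, tss, strand)
    return -upstream <= distance <= downstream


# Toy peak whose midpoint is 350 bp downstream of a plus-strand TSS.
peak = (20_150, 20_550)
tss, strand = 20_000, "+"

windows = {
    "symmetric 3 kb": (3000, 3000),
    "-1 kb to +100 bp": (1000, 100),
    "upstream-only 1 kb": (1000, 0),
}
for label, (up, down) in windows.items():
    call = is_promoter_peak(*peak, tss, strand, upstream=up, downstream=down)
    print(f"{label}: {call}")
# symmetric 3 kb: True
# -1 kb to +100 bp: False
# upstream-only 1 kb: False
```

The same peak is a "promoter" under one definition and not under the others, which is why it is worth checking what your annotation tool actually used.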

If you want to define the promoter region more strongly, you can cross-reference your ATAC-seq data with ChIP-seq data if available (preferably for TFs known to bind promoters) and also with other published ATAC-seq data. However, if you see that the ATAC-seq 'promoter' peak shifts between cell types, this may well represent differential promoter usage, which could be of interest!

ADD REPLY • link 6 months ago by rfran010 ★ 1.3k