Question

Transcription Start Site

0

2.7 years ago

Fatemeh Nabizadeh ▴ 10

What are the best databases for looking up the transcription start sites (TSS) of specific genes in the human genome?

TSS • 1.8k views

ADD COMMENT • link updated 9 months ago by Ming Tommy Tang ★ 4.5k • written 2.7 years ago by Fatemeh Nabizadeh ▴ 10

2

You can find the TSS for every transcript of a given gene by querying BioMart.

ADD REPLY • link 2.7 years ago by Hamid Ghaedi 3.3k

0

Seems that DBTSS doesn't work!

ADD REPLY • link 2.7 years ago by Fatemeh Nabizadeh ▴ 10

0

You can use Bioconductor, as shown in this post using GenomicRanges: https://support.bioconductor.org/p/46508/

ADD REPLY • link 9 months ago by Ming Tommy Tang ★ 4.5k

score 1 · Answer 1 · 2022-04-25

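 # Columns of this UCSC table: $2 name, $3 chrom, $4 strand, $5 txStart, $6 txEnd, $7 cdsStart, $8 cdsEnd.
 # As written, the command reads $7/$8, so it reports the start of the coding region of coding transcripts;
 # use $5/$6 (and drop the $7 < $8 filter) to get the transcription start site itself.
 wget -q -O - "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeBasicV19.txt.gz" |
   gunzip -c |
   awk '(int($7) < int($8)) {
     if ($4 == "+") printf("%s\t%d\t%d\t%s\t%s\n", $3, $7, int($7)+1, $2, $4);
     else           printf("%s\t%d\t%d\t%s\t%s\n", $3, int($8)-3, $8, $2, $4);
   }'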


chr1    69090   69091   ENST00000335137.3   +
chr1    139306  139309  ENST00000423372.3   -
chr1    367658  367659  ENST00000426406.1   +
chr1    622031  622034  ENST00000332831.2   -
chr1    739134  739137  ENST00000599533.1   -
chr1    818042  818043  ENST00000594233.1   +
chr1    861321  861322  ENST00000342066.3   +
chr1    866442  866445  ENST00000598827.1   -
chr1    894617  894620  ENST00000327044.6   -
chr1    896073  896074  ENST00000338591.3   +
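
For comparison, here is a minimal Python sketch of the same idea that reads txStart and txEnd instead of columns 7 and 8. It assumes the table follows the UCSC genePred layout with a leading bin column (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...), with 0-based, half-open coordinates. The function name `tss_bed_from_genepred` and the two example rows are hypothetical and only there for illustration.

```python
def tss_bed_from_genepred(line):
    """Return (chrom, start, end, name, strand) for the 1-bp TSS of one table row."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...
    name, chrom, strand = fields[1], fields[2], fields[3]
    tx_start, tx_end = int(fields[4]), int(fields[5])
    if strand == "+":
        # txStart is 0-based, so [txStart, txStart + 1) covers the first transcribed base
        return (chrom, tx_start, tx_start + 1, name, strand)
    # On the minus strand transcription begins at the last base: [txEnd - 1, txEnd)
    return (chrom, tx_end - 1, tx_end, name, strand)


# Hypothetical rows for illustration only (not real annotation records)
rows = [
    "0\ttxA\tchr1\t+\t1000\t5000\t1200\t4800",
    "0\ttxB\tchr1\t-\t2000\t9000\t2500\t8700",
]

for row in rows:
    print("\t".join(str(value) for value in tss_bed_from_genepred(row)))
```

Both strands yield a 1-bp BED interval here, which keeps the output uniform if it is later fed to interval tools.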

score 0 · Answer 2 · 2022-04-26

0

2.7 years ago

ATpoint 86k

Basically any GTF file will do, whether from RefSeq, Ensembl, or GENCODE. The TSS is the start coordinate of the entries with feature type transcript. Be aware that for genes on the bottom (minus) strand it is the end coordinate instead, but most GTFs even have a TSS entry that you can use directly.
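
The strand rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it expects standard 9-column, tab-separated GTF lines with 1-based inclusive coordinates, the helper name `tss_from_gtf_line` is made up, and the two example lines are hypothetical rather than taken from a real annotation.

```python
import re


def tss_from_gtf_line(line):
    """Return (chrom, tss, strand, transcript_id) for a transcript line, else None."""
    if line.startswith("#"):
        return None  # header / comment line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "transcript":
        return None  # only entries with feature type "transcript" carry the TSS
    chrom, strand = fields[0], fields[6]
    start, end = int(fields[3]), int(fields[4])
    # GTF is 1-based inclusive: plus strand starts at `start`, minus strand at `end`
    tss = start if strand == "+" else end
    match = re.search(r'transcript_id "([^"]+)"', fields[8])
    return (chrom, tss, strand, match.group(1) if match else None)


# Hypothetical GTF lines for illustration only
example = [
    'chr1\tsrc\ttranscript\t1001\t5000\t.\t+\t.\tgene_id "gA"; transcript_id "txA";',
    'chr1\tsrc\ttranscript\t2001\t9000\t.\t-\t.\tgene_id "gB"; transcript_id "txB";',
    'chr1\tsrc\texon\t1001\t1500\t.\t+\t.\tgene_id "gA"; transcript_id "txA";',
]

for gtf_line in example:
    result = tss_from_gtf_line(gtf_line)
    if result is not None:
        print(result)
```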

score 0 · Answer 3 · 2024-03-18

Here is a simple Python way to query BioMart:

import pybiomart as pbm

# Connect to the September 2019 Ensembl archive and select the human gene dataset
dataset = pbm.Dataset(name='hsapiens_gene_ensembl', host="http://sep2019.archive.ensembl.org/")
# Fetch chromosome, TSS, strand, gene name, and transcript biotype
annot = dataset.query(attributes=['chromosome_name', 'transcription_start_site', 'strand',
                                  'external_gene_name', 'transcript_biotype'])

Below is what the annot result looks like:

Chromosome/scaffold name    Transcription start site (TSS)    Strand    Gene name    Transcript type
MT                          577                               1         MT-TF        Mt_tRNA
MT                          648                               1         MT-RNR1      Mt_rRNA
MT                          1602                              1         MT-TV        Mt_tRNA
MT                          1671                              1         MT-RNR2      Mt_rRNA
MT                          3230                              1         MT-TL1       Mt_tRNA
...                         ...                               ...       ...          ...
chr1                        228416627                         -1        TRIM17       protein_coding
chr1                        228416652                         -1        TRIM17       protein_coding
...                         ...                               ...       ...          ...
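
As the TRIM17 rows show, one gene can have several TSSs, one per transcript. A small sketch of collecting them per gene follows. It uses plain tuples copied from a few of the rows above rather than the DataFrame itself, and the helper name `tss_by_gene` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the output shown above: (chromosome, TSS, strand, gene, biotype)
rows = [
    ("MT", 577, 1, "MT-TF", "Mt_tRNA"),
    ("MT", 648, 1, "MT-RNR1", "Mt_rRNA"),
    ("chr1", 228416627, -1, "TRIM17", "protein_coding"),
    ("chr1", 228416652, -1, "TRIM17", "protein_coding"),
]


def tss_by_gene(records):
    """Group distinct (chromosome, TSS, strand) sites under each gene name."""
    grouped = defaultdict(set)
    for chrom, tss, strand, gene, _biotype in records:
        grouped[gene].add((chrom, tss, strand))
    # Sort each gene's sites so the result is deterministic
    return {gene: sorted(sites) for gene, sites in grouped.items()}


sites = tss_by_gene(rows)
print(sites["TRIM17"])
```

With the DataFrame itself, a pandas groupby on the "Gene name" column should do the same job.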