Whole genome coordinates of promoters/gene regulatory elements
8.9 years ago
biocyberman ▴ 870

I am trying to find a database that lets me download and compile a table like this:

Organism    Chromosome    Gene    GeneStart    GeneEnd    PromoterStart    PromoterEnd

Which database can I start with to download and extract this information? I want to do it for human, mouse and rat, with the latest genome version possible. I am comfortable doing any text manipulation, provided the information is there for me to download.

Update 1

Purpose of making this table:

I want to make this table to help me choose the upstream regions of genes whose methylation status may affect gene expression. These can be the TSS, polymerase binding sites, transcription factor binding sites, etc.: anything involved in the regulation of gene expression.

gene rat mouse human genome • 3.5k views

What is your definition of promoter?

Please see update 1

If you can define your promoter, then you could use the TSS from DBTSS to compile your table.

8.9 years ago

You can access all gene/promoter coordinates from R, using the TxDb objects:

> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::install("Homo.sapiens")
> library(Homo.sapiens)
> promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene))
GRanges object with 23056 ranges and 1 metadata column:
        seqnames                 ranges strand   |     gene_id
           <Rle>              <IRanges>  <Rle>   | <character>
      1    chr19 [ 58874015,  58876214]      -   |           1
     10     chr8 [ 18246755,  18248954]      +   |          10
    100    chr20 [ 43280177,  43282376]      -   |         100
   1000    chr18 [ 25757246,  25759445]      -   |        1000
  10000     chr1 [244006687, 244008886]      -   |       10000
    ...      ...                    ...    ... ...         ...
   9991     chr9 [115095745, 115097944]      -   |        9991
   9992    chr21 [ 35734323,  35736522]      +   |        9992
   9993    chr22 [ 19109768,  19111967]      -   |        9993
   9994     chr6 [ 90537619,  90539818]      +   |        9994
   9997    chr22 [ 50964706,  50966905]      -   |        9997

You can get the same information for mouse and rat just by installing the corresponding packages (e.g. Mus.musculus, Rattus.norvegicus). Use transcripts() instead of genes() to get one promoter per transcript rather than one per gene.

Note that with this method the promoter is defined simply as a 2200 bp window around the TSS of each gene (the defaults of promoters() are 2000 bp upstream and 200 bp downstream), without any specific validation of whether this is the actual promoter of the gene. Depending on the purpose, that may be good enough.
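To make the window definition concrete, here is a minimal shell/awk sketch of the arithmetic. It assumes 1-based closed coordinates and the promoters() defaults (upstream = 2000, downstream = 200). The function name promoter_window is hypothetical, and the two TSS positions are back-calculated from the ranges printed above rather than taken from any annotation file.

```shell
#!/bin/sh
# Strand-aware promoter window in 1-based, closed coordinates.
# Usage: promoter_window TSS STRAND [UPSTREAM] [DOWNSTREAM]
# Note: this sketch does not clip windows at chromosome ends.
promoter_window() {
    awk -v tss="$1" -v strand="$2" -v up="${3:-2000}" -v down="${4:-200}" 'BEGIN {
        if (strand == "-") {
            # On the minus strand, "upstream" means higher coordinates.
            start = tss - down + 1
            end   = tss + up
        } else {
            start = tss - up
            end   = tss + down - 1
        }
        printf "%d\t%d\n", start, end
    }'
}

# gene_id 1 (chr19, minus strand), TSS assumed at 58874214
promoter_window 58874214 -   # prints 58874015 58876214 (tab-separated)
# gene_id 10 (chr8, plus strand), TSS assumed at 18248755
promoter_window 18248755 +   # prints 18246755 18248954 (tab-separated)
```

Both results match the first two rows of the GRanges output, so each range is 2200 bp wide and straddles the TSS rather than lying entirely upstream of the gene.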

I would prefer experimentally validated coordinates. But if there is no such thing, I will use this.

8.4 years ago

For other species, since this seems to be the only post that discusses promoters, a useful tool might be 'bedtools flank'. From a bed/gff/gtf file you can produce a file with intervals flanking the start codon. I did this to generate a gtf with promoter intervals reaching 500 nt downstream and 1000 nt upstream of a start_codon. This does the job:

bedtools flank -s -i <StartCodonInfile> -g <contigName_contigLength_table> -l 1000 -r 500 > <promotersOutfile>
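The -g argument expects a two-column, tab-separated table of contig names and lengths. One way to get it, sketched here under the assumption that a FASTA index (.fai, as written by `samtools faidx`) is available, is to keep the first two columns of that index. The helper name make_genome_table and all file names below are hypothetical.

```shell
#!/bin/sh
# Build the "genome" table (contig name, contig length) that bedtools
# takes via -g. Columns 1 and 2 of a .fai index hold exactly these values.
make_genome_table() {
    cut -f1,2 "$1"
}

# Hypothetical file names, left commented out because they need real data:
# samtools faidx genome.fa
# make_genome_table genome.fa.fai > genome.sizes
# bedtools flank -s -i start_codons.gtf -g genome.sizes -l 1000 -r 500 > promoters.gtf
```

Keep in mind that bedtools flank reports the regions on each side of a feature as separate intervals and leaves out the feature itself. If a single contiguous interval that includes the start codon is wanted, `bedtools slop` with the same -s, -l and -r options may be the closer fit.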