Is there a list of all promoters' genomic positions with gene names? From the UCSC table browser, I can get the annotation. But instead of gene id, this gives transcript_id. Any suggestion, please?
Here's an R solution to get all transcript promoters with the corresponding gene name. This takes advantage of AnnotationHub, which is a repository and associated library that provides access to annotation data for many organisms. I'll be using S. cerevisiae as an example here.
library("AnnotationHub")
ah <- AnnotationHub()
anno <- query(ah, c("cerevisiae", "txdb", "sacCer3"))[[1]]
promoters <- trim(promoters(anno, upstream=500, downstream=500, columns=c("tx_name", "gene_id")))
> head(promoters)
GRanges object with 6 ranges and 2 metadata columns:
            seqnames      ranges strand |     tx_name         gene_id
               <Rle>   <IRanges>  <Rle> | <character> <CharacterList>
    YAL069W     chrI       1-834      + |     YAL069W         YAL069W
  YAL068W-A     chrI     38-1037      + |   YAL068W-A         YAL069W
  YAL067W-A     chrI   1980-2979      + |   YAL067W-A       YAL067W-A
    YAL066W     chrI  9591-10590      + |     YAL066W         YAL066W
  YAL064W-B     chrI 11546-12545      + |   YAL064W-B       YAL064W-B
    YAL064W     chrI 21066-22065      + |     YAL064W         YAL064W
  -------
  seqinfo: 17 sequences (1 circular) from sacCer3 genome
You can export this as a BED file using rtracklayer::export if you want, or convert it to a data.frame with as.data.frame and save it as a table with write.table.
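For illustration, here is a minimal sketch of both export routes. To keep it self-contained it builds a two-row toy GRanges that mimics the output above (the toy values, the file names promoters.bed and promoters.tsv, and the choice of ";" as separator are my own assumptions, not part of the original answer). Since gene_id is a CharacterList, it may be easier to collapse it to a plain character column before writing a table.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy stand-in for the `promoters` object above (values are illustrative only)
gr <- GRanges(
  seqnames = "chrI",
  ranges   = IRanges(start = c(1, 38), end = c(834, 1037)),
  strand   = "+",
  tx_name  = c("YAL069W", "YAL068W-A"),
  gene_id  = CharacterList("YAL069W", "YAL068W-A")
)

# Collapse the CharacterList into one string per range
# (multiple gene IDs, if any, are joined with ";")
gr$gene_id <- vapply(as.list(gr$gene_id), paste, character(1), collapse = ";")

# Route 1: BED file, using the gene ID as the BED name field
gr$name <- gr$gene_id
export(gr, "promoters.bed", format = "BED")

# Route 2: plain tab-separated table
df <- as.data.frame(gr)
write.table(df, "promoters.tsv", sep = "\t", quote = FALSE, row.names = FALSE)
```

With the real object you would skip the toy construction and start from the collapse step. Note that BED coordinates are 0-based at the start, while the table keeps the 1-based coordinates shown in R.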
Annotation files are usually held as TxDb objects in R. Any TxDb object will work with this: one from AnnotationHub like we used above, an explicit TxDb package such as the one for yeast, or one built from a GTF file using GenomicFeatures::makeTxDbFromGFF.