How to deal with multiple Transcription Start Site in a gene?
1
0
Entering edit mode
9.0 years ago

In order to measure the conservation (Phastcons score) of each gene, I am trying to look at -2000:2000 base pairs below and above from transcription start site (TSS), which will represent my promoter. But the problem is, there are more than one TSS for each gene. What should I do? which one should I choose?

TSS Promoter Phastcons-Score • 3.5k views
ADD COMMENT
1
Entering edit mode
9.0 years ago

If you want to avoid confusion you can simply use the gene coordinates from the Known Gene UCSC track:

> library(Homo.sapiens)
> library(phastCons100way.UCSC.hg19)
> mypromoters = promoters(genes(TxDb.Hsapiens.UCSC.hg19.knownGene), 2000, -2000)
> mypromoters$cons100way = scores(phastCons100way.UCSC.hg19, mypromoters) # takes a while
> mypromoters
GRanges object with 23056 ranges and 2 metadata columns:
        seqnames                 ranges strand   |     gene_id cons100way
           <Rle>              <IRanges>  <Rle>   | <character>  <numeric>
      1    chr19 [ 58874015,  58876214]      -   |           1 0.10872727
     10     chr8 [ 18246755,  18248954]      +   |          10 0.36422727
    100    chr20 [ 43280177,  43282376]      -   |         100 0.04400000
   1000    chr18 [ 25757246,  25759445]      -   |        1000 0.01645455
  10000     chr1 [244006687, 244008886]      -   |       10000 0.03836364
    ...      ...                    ...    ... ...         ...        ...
   9991     chr9 [115095745, 115097944]      -   |        9991 0.25700000
   9992    chr21 [ 35734323,  35736522]      +   |        9992 0.29409091
   9993    chr22 [ 19109768,  19111967]      -   |        9993 0.01236364
   9994     chr6 [ 90537619,  90539818]      +   |        9994 0.07536364
   9997    chr22 [ 50964706,  50966905]      -   |        9997 0.15281818
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
ADD COMMENT
0
Entering edit mode

Hi again The code below is not working for me, mypromoters$cons100way = scores(phastCons100way.UCSC.hg19, mypromoters) what should i do? another question is, instead of gene_id i put tx_name column. but can i convert these UCSC ids to ensmble gene ID. actually i convert it by getBM package of R, but it is not seems good.what do you think?

ADD REPLY

Login before adding your answer.

Traffic: 1677 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6