Mouse promoter regions
3.9 years ago
el24 ▴ 40

Hi all,

I have a quick question, but I'm new to this, so I'm not sure how to solve it. I need the promoter regions for mouse, but I haven't found a reliable file to download yet. It would be great if someone could help me with this. This is a website I found, but I am not sure what to download from it. What I want is a promoter file like this:

chr   start    end   gene    strand
.
.

Thanks!

promoter mouse chr • 2.5k views
3.9 years ago

Here's an R solution.

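library("AnnotationHub")
library("ensembldb")

# Ensembl release to use; take the first EnsDb hit for mouse at that release.
release <- 101
anno <- query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release))[[1]]

# Promoters: 1000 bp upstream to 100 bp downstream of each gene's TSS.
proms <- promoters(genes(anno), upstream=1000, downstream=100)
# Clip ranges that run past chromosome ends and keep only the gene_id column.
proms <- trim(proms)[, "gene_id"]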

> proms
GRanges object with 56305 ranges and 1 metadata column:
                     seqnames            ranges strand |            gene_id
                        <Rle>         <IRanges>  <Rle> |        <character>
  ENSMUSG00000102693        1   3072253-3073352      + | ENSMUSG00000102693
  ENSMUSG00000064842        1   3101016-3102115      + | ENSMUSG00000064842
  ENSMUSG00000051951        1   3671399-3672498      - | ENSMUSG00000051951
  ENSMUSG00000102851        1   3251757-3252856      + | ENSMUSG00000102851
  ENSMUSG00000103377        1   3368450-3369549      - | ENSMUSG00000103377
                 ...      ...               ...    ... .                ...
  ENSMUSG00000095366        Y 90755368-90756467      - | ENSMUSG00000095366
  ENSMUSG00000095134        Y 90752057-90753156      + | ENSMUSG00000095134
  ENSMUSG00000096768        Y 90783738-90784837      + | ENSMUSG00000096768
  ENSMUSG00000099871        Y 90836413-90837512      + | ENSMUSG00000099871
  ENSMUSG00000096850        Y 90839078-90840177      - | ENSMUSG00000096850
  -------
  seqinfo: 118 sequences from GRCm38 genome

If you want to export it as a BED file, you can use rtracklayer::export(proms, "mouse_promoters.bed", "bed").
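If you would rather have the plain chr / start / end / gene / strand table from the question, something along these lines might work. This is only a sketch: it builds a two-row stand-in for proms from the output above so that it runs without downloading anything, and it assumes that prefixing Ensembl-style names with "chr" is good enough for your purposes (that is naive for the mitochondrial chromosome and for unplaced scaffolds). The output path is a placeholder.

```r
library(GenomicRanges)

# Stand-in for `proms`: two of the ranges shown in the output above.
proms <- GRanges(
  seqnames = c("1", "Y"),
  ranges   = IRanges(start = c(3072253, 90839078), end = c(3073352, 90840177)),
  strand   = c("+", "-"),
  gene_id  = c("ENSMUSG00000102693", "ENSMUSG00000096850")
)

# Flatten to the requested columns.
# Assumption: adding "chr" to the Ensembl name is acceptable here.
df <- data.frame(
  chr    = paste0("chr", as.character(seqnames(proms))),
  start  = start(proms),
  end    = end(proms),
  gene   = proms$gene_id,
  strand = as.character(strand(proms)),
  stringsAsFactors = FALSE
)

# Placeholder output location; change it to wherever you want the file.
out_file <- file.path(tempdir(), "mouse_promoters.tsv")
write.table(df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

With the real proms object you would skip the stand-in and run only the data.frame() and write.table() steps.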


Thank you very much, this is very helpful! Quick question: when I try release <- 101, I get the following error:

  Error in .Hub_get1(x[i], force = force, verbose = verbose) : no records found for the given index

Therefore, I tried release <- 100, and I could get the promoters successfully. Those releases shouldn't be much different, right?

Another question: are upstream=1000 and downstream=100 your recommended values? Could you please tell me what each of them means? I guess here it means returning genes within 1000 bp upstream and 100 bp downstream, but I'm not sure. I'd appreciate it if you could tell me.

Thanks!


"Therefore, I tried release <- 100, and I could get the promoters successfully. Those releases shouldn't be much different, right?"

If you run into that problem, it is safer to check what the query is returning, since it might be matching more than one release. Run query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release)) and see exactly how many hits are returned. If there is more than one, use the ID of the correct hit (it usually looks like AH83247) to pull it directly.

ah <- AnnotationHub()
anno <- ah[["AH83247"]]

"Another question: are upstream=1000 and downstream=100 your recommended values? Could you please tell me what each of them means? I guess here it means returning genes within 1000 bp upstream and 100 bp downstream, but I'm not sure. I'd appreciate it if you could tell me."

The promoter regions are defined relative to the TSS of the gene (or transcript), so nothing is being filtered: every gene gets one range. Those parameters give the range from 1000 bases upstream of the TSS to 100 bases downstream of it. That's a fairly conservative range for mice. Your final range depends somewhat on what you are actually looking for in those regions, but you could expand it to something like 2500 bases upstream and 250 downstream, for example.
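To make the arithmetic concrete, here is a small base-R sketch of how such a window can be computed on each strand. The helper promoter_window() is a made-up name for illustration, not part of any package; it follows the convention documented for promoters() in GenomicRanges, where the downstream bases are counted starting at the TSS itself. The two TSS positions are back-calculated from the first and third rows of the output in the answer above, so treat them as assumptions.

```r
# Hypothetical helper: strand-aware promoter window around a TSS.
# "+" strand: upstream means smaller coordinates.
# "-" strand: upstream means larger coordinates.
promoter_window <- function(tss, strand, upstream = 1000, downstream = 100) {
  plus  <- strand == "+"
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end   <- ifelse(plus, tss + downstream - 1, tss + upstream)
  data.frame(strand = strand, start = start, end = end,
             width = end - start + 1, stringsAsFactors = FALSE)
}

# Assumed TSS positions for ENSMUSG00000102693 (+) and ENSMUSG00000051951 (-).
tss    <- c(3073253, 3671498)
strand <- c("+", "-")

default_win <- promoter_window(tss, strand)
wider_win   <- promoter_window(tss, strand, upstream = 2500, downstream = 250)

print(default_win)
print(wider_win)
```

With the defaults, this gives 3072253-3073352 for the plus-strand gene and 3671399-3672498 for the minus-strand gene, matching the ranges printed above; each window is 1100 bases wide, and the wider setting gives 2750 bases.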


Thank you for the clear explanation!

Here is what I get when running the code:

release <- 101
query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release))

snapshotDate(): 2019-10-29
AnnotationHub with 0 records
# snapshotDate(): 2019-10-29

Then, when I try:

ah <- AnnotationHub()
anno <- ah[["AH83247"]]

I get this error:

Error: Public

I'd appreciate it if you could let me know how I can solve this. Thanks!


Try updating the AnnotationHub package and see if that helps.


I have tried it, but it didn't help. Thanks for helping, anyway!
