I have a quick question, but I'm new to this and not sure how to solve it. I need to find the promoter regions for mouse data, but I haven't found a reliable file to download yet. It would be great if someone could help me with this. This is a website I found, but I'm not sure what to download. What I want is a promoter file like this:
Thank you very much, this is very helpful!
Quick question: when I try release <- 101, I get the following error:
Error in .Hub_get1(x[i], force = force, verbose = verbose) : no records found for the given index
So I tried release <- 100 instead, and I could get the promoters successfully. Those releases shouldn't be much different, right?
Another question: are the parameters upstream=1000, downstream=100 your recommended values? Could you please tell me what each of them means? I guess it means returning genes within 1000 bp upstream and 100 bp downstream, but I'm not sure. I would appreciate it if you could tell me.
So I tried release <- 100 instead, and I could get the promoters successfully. Those releases shouldn't be much different, right?
It might be safer to check what the query is returning if you run into that problem, since it might be matching more than one release. Run query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release)) and see exactly how many hits are returned. If there is more than one, use the ID of the correct hit (it usually looks like AH83247) to pull it directly:
ah <- AnnotationHub()
anno <- ah[["AH83247"]]
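If it helps, the check described above can be wrapped in a small helper. This is only a sketch: fetch_mouse_ensdb is a made-up name, not part of AnnotationHub, and it assumes the AnnotationHub package is installed and the hub is reachable. It stops with a readable message when the query matches nothing or more than one record, instead of failing inside the download step.

```r
# Hypothetical helper (not part of AnnotationHub): fetch the mouse EnsDb for
# one Ensembl release, but only if the query matches exactly one record.
fetch_mouse_ensdb <- function(release) {
  if (!requireNamespace("AnnotationHub", quietly = TRUE)) {
    stop("The AnnotationHub package is required for this sketch.")
  }
  ah <- AnnotationHub::AnnotationHub()
  hits <- AnnotationHub::query(
    ah,
    pattern = c("Mus musculus", "EnsDb", as.character(release))
  )

  if (length(hits) == 0) {
    stop("No EnsDb record matched release ", release,
         "; try another release or update AnnotationHub.")
  }
  if (length(hits) > 1) {
    # Show the candidate IDs and titles so the right one can be picked by hand.
    print(data.frame(id = names(hits), title = hits$title))
    stop("More than one record matched; pull the one you want by its AH ID.")
  }

  # Exactly one hit: retrieve it by its hub ID.
  ah[[names(hits)]]
}

# Usage (needs network access, so it is left commented out here):
# anno <- fetch_mouse_ensdb(100)
```

Nothing is downloaded until the function is called, and the explicit messages make it easier to tell "no record for this release" apart from "ambiguous query".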
Another question: are the parameters upstream=1000, downstream=100 your recommended values? Could you please tell me what each of them means? I guess it means returning genes within 1000 bp upstream and 100 bp downstream, but I'm not sure. I would appreciate it if you could tell me.
The promoter regions are defined relative to the TSS (transcription start site) of the gene or transcript, not relative to the whole gene. Those parameters mean the range from 1000 bases upstream of the TSS to 100 bases downstream of the TSS. That's a fairly conservative range for mice. Your final range can depend somewhat on what you are actually looking for in those regions, but you could expand it to something like 2500 bases upstream and 250 bases downstream, for example.
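To make the upstream/downstream arithmetic concrete, here is a minimal base R sketch. The helper promoter_window and the two TSS positions are made up for illustration. It assumes the convention that, as far as I understand it, GenomicRanges::promoters() uses: the TSS itself counts as the first downstream base, and on the minus strand "upstream" means higher coordinates.

```r
# Hypothetical helper: strand-aware promoter window around a TSS.
# Assumed convention: the TSS is the first of the `downstream` bases,
# so every window is exactly upstream + downstream bases wide.
promoter_window <- function(tss, strand, upstream = 1000, downstream = 100) {
  stopifnot(length(tss) == length(strand), all(strand %in% c("+", "-")))
  plus <- strand == "+"
  data.frame(
    # On "+", upstream is toward lower coordinates; on "-", toward higher ones.
    start  = ifelse(plus, tss - upstream, tss - downstream + 1),
    end    = ifelse(plus, tss + downstream - 1, tss + upstream),
    strand = strand
  )
}

# Two made-up TSS positions, one per strand.
win <- promoter_window(tss = c(5000, 24000), strand = c("+", "-"))
print(win)
# "+" strand TSS at 5000  -> 4000 to 5099
# "-" strand TSS at 24000 -> 23901 to 25000
```

Both windows are 1100 bases wide. Passing upstream = 2500 and downstream = 250 would give the wider 2750-base windows mentioned above.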
Thank you for the clear explanation!
Here is what I get when running the code:
Then, when I try:
I get this error:
I would appreciate it if you could let me know how I can solve this. Thanks!
Try updating the AnnotationHub package and see if that helps.
I have tried it, but it didn't help. Thanks for helping, anyway!