I am working on an R script that retrieves promoter regions for genes. My question isn't about the programming itself, but about the underlying biology.
Consider the gene below, as given to me by TxDb:
99889 chr3 [ 84299986, 85691440] - | 99889
This gene on chr3 (mm9) is on the negative strand. When I look for the promoter region (let's say 2000bp), should I retrieve the sequence from start - 2000bp or from end + 2000bp?
From what I understand, the gene is transcribed from "end -> start" since it's on the negative strand, so my intuition says that the promoter region should be end + 2000bp.
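To make that intuition concrete, here is a minimal sketch of the arithmetic as I understand it, written in Python rather than R since the question is about the convention and not the code. The function name `upstream_window` is hypothetical, and the sketch assumes 1-based inclusive coordinates that are always reported on the forward strand, so that the transcription start site of a negative-strand gene sits at `end`.

```python
def upstream_window(start, end, strand, width=2000):
    """Return (lo, hi), 1-based inclusive, for the `width` bp upstream of the TSS.

    Assumes start <= end and that both are forward-strand coordinates.
    """
    if start > end:
        raise ValueError("expected start <= end in forward-strand coordinates")
    if strand == "+":
        # TSS is at `start`; upstream means smaller coordinates.
        return (start - width, start - 1)
    if strand == "-":
        # TSS is at `end`; upstream means larger coordinates.
        return (end + 1, end + width)
    raise ValueError("strand must be '+' or '-'")


# The gene from the TxDb output above (chr3, mm9, negative strand).
lo, hi = upstream_window(84299986, 85691440, "-")
print(lo, hi)  # 85691441 85693440
```

If my reading is right, the window for this gene lies just past `end`, and the retrieved sequence would still need to be reverse-complemented to read 5' to 3' along the gene's own strand.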
Also, why would start < end if it's on the negative strand? Going from 5' to 3', the position should be decreasing. I guess this is more a question of how the bioinformatics field handles coordinates.