I am looking to download promoter BED files from both UCSC and CAGE. I have been reading papers to see whether people usually go 1000 or 2000 bases upstream, and I keep finding that the usual range is 500-2000 bases. But I am looking for an article that justifies using a specific number of bases, 'x', upstream, and I haven't been able to find one. Can someone point me in the direction of some of those articles? Also, when looking upstream from CAGE peaks for promoters, should I take the midpoint of each peak and then go 1000/2000 bases upstream, or should I measure from the end of the feature, as we do with UCSC? Which would be the better idea here? I am sorry if this is a basic question. I am a CS major and all of this is still very new to me.
The truth is that there is no formal definition of how far upstream you should look. The distance is usually a convention, as you can tell from how "round" the numbers are: 500, 1500, and so on.
Design your pipeline so that the distance is tunable, read a few papers, and evaluate what you observe. Then rerun your analysis with different settings and try to make sense of the tradeoffs you see, such as sensitivity vs. specificity.
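To make "tunable" concrete, here is a minimal sketch in Python, not a definitive implementation. The names `BedRecord`, `promoter_window`, `upstream`, `downstream`, and `anchor` are hypothetical and chosen only for illustration. It assumes BED-style coordinates (0-based start, exclusive end), treats the 5' end of a feature as the TSS, and also offers the peak midpoint from the question as an alternative anchor without recommending either one. The window it returns covers the anchor base plus `upstream` bases before it and `downstream` bases after it, relative to the strand.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    name: str
    strand: str  # "+" or "-"


def promoter_window(feature, upstream=1000, downstream=0, anchor="five_prime"):
    """Return a strand-aware window around an anchor base of `feature`.

    anchor="five_prime": use the 5' end (start on "+", last base on "-").
    anchor="midpoint":   use the middle base of the feature (e.g. a CAGE peak).
    Both options are illustrative assumptions, not a recommendation.
    """
    if feature.strand not in ("+", "-"):
        raise ValueError(f"unsupported strand: {feature.strand!r}")

    if anchor == "five_prime":
        tss = feature.start if feature.strand == "+" else feature.end - 1
    elif anchor == "midpoint":
        tss = (feature.start + feature.end - 1) // 2
    else:
        raise ValueError(f"unknown anchor: {anchor!r}")

    # "Upstream" means lower coordinates on "+" and higher coordinates on "-".
    if feature.strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    else:
        start, end = tss - downstream, tss + upstream + 1

    # Clip at the chromosome start; clipping at the chromosome end would
    # need chromosome sizes, which this sketch does not have.
    return BedRecord(feature.chrom, max(0, start), end, feature.name, feature.strand)


if __name__ == "__main__":
    gene = BedRecord("chr1", 5000, 6000, "geneA", "-")
    # Rerun the same step with several distances and compare the results.
    for distance in (500, 1000, 2000):
        print(distance, promoter_window(gene, upstream=distance))
```

Because the distance and the anchor are plain parameters, the rest of the analysis can be rerun unchanged for each setting and the outcomes compared side by side.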
Thank you @Istvan for your answer.