Posted by always_learning (1.1k) 6.5 years ago
Hi All,
I want to extract the upstream and downstream regions of a few genes in specific distance intervals, for example 0-50, 50-5000, and 5000-50000 bp. What would be the best way to do that? One way I can think of is to make a separate BED file for each interval and then query them with tools. Is there a smarter way to do this?
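You don't necessarily need one BED file per interval. One option is to write all bins into a single BED file and label each line with its side and distance range in the name column. The sketch below assumes you already have a 0-based TSS coordinate and strand for each gene. The helper name `flank_bins` and the exact boundary convention are my own choices, not a standard:

```python
# Hypothetical helper: emit one strand-aware BED6 line per (gene, bin, side),
# so every distance bin lives in a single file instead of one file per interval.
BINS = ((0, 50), (50, 5000), (5000, 50000))


def flank_bins(chrom, tss, strand, name, bins=BINS):
    """Yield BED6 tuples for upstream/downstream distance bins around a 0-based TSS.

    Assumed convention: on '+' the upstream bin (a, b) is [tss - b, tss - a) and
    the downstream bin is [tss + a, tss + b). On '-' the picture is mirrored, so
    the TSS base itself always falls in the first downstream bin.
    """
    for a, b in bins:
        if strand == "+":
            up = (tss - b, tss - a)
            down = (tss + a, tss + b)
        else:
            up = (tss + a + 1, tss + b + 1)
            down = (tss - b + 1, tss - a + 1)
        for side, (start, end) in (("up", up), ("down", down)):
            start = max(start, 0)  # clip at chromosome start; chromosome ends are NOT checked
            if end > start:  # drop bins that fall entirely off the chromosome start
                yield (chrom, start, end, f"{name}|{side}|{a}-{b}", 0, strand)


if __name__ == "__main__":
    for row in flank_bins("chr1", 100000, "+", "GENE_A"):
        print("\t".join(map(str, row)))
```

You can then intersect that one file with your features and split the results afterwards on the name column.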
Thanks
You are on the right track. The general approach is described in this thread: "retrieving sequences of a upstream and downstream of a coordinate for hg19".
When I did something like this in the past, I started by retrieving the TSS locations for all genes from Ensembl BioMart. I then wrote a simple AWK command, like the one in the thread linked by @genomax, to get exactly the regions I wanted, using the TSS positions as the starting coordinates.
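For anyone who prefers Python to AWK, here is a rough equivalent of that step. It turns a BioMart TSS export into BED lines covering TSS ± a fixed window. The column order is an assumption (it depends on which attributes you tick in BioMart): gene name, chromosome, 1-based TSS, and strand as 1/-1, with a header row. Note that BioMart reports one TSS per transcript, so a gene may produce several lines:

```python
import csv
import io


def biomart_tss_to_bed(handle, window):
    """Yield BED6 lines spanning TSS +/- window from a tab-separated BioMart TSS export.

    ASSUMED column order (hypothetical; depends on the attributes selected):
    gene name, chromosome, 1-based TSS, strand encoded as 1 / -1, plus a header row.
    Chromosome names are passed through unchanged (Ensembl uses '1', UCSC uses 'chr1').
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        name, chrom, tss, strand = row[:4]
        tss0 = int(tss) - 1  # BioMart positions are 1-based; BED starts are 0-based
        strand_sym = "+" if strand == "1" else "-"
        start = max(tss0 - window, 0)  # clip at chromosome start
        end = tss0 + window + 1  # half-open end, so the TSS base itself is included
        yield f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand_sym}"


if __name__ == "__main__":
    # Made-up rows, just to show the expected input shape.
    demo = "Gene name\tChromosome\tTSS\tStrand\nGENE_A\t1\t1000\t1\n"
    for line in biomart_tss_to_bed(io.StringIO(demo), 100):
        print(line)
```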
EDIT: With sacha's answer below, you now have the two main ways to do this. Both start by getting the coordinates of your genes and then selecting what you want from a reference file, either with a scripted command (e.g. AWK) or with a bedtools function.
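The "select what you want from a reference file" step is essentially an interval overlap test. The sketch below is a minimal pure-Python stand-in for something like `bedtools intersect -a features.bed -b windows.bed -u`, which keeps each feature that overlaps at least one window. It is fine for a handful of genes, but it scans every window on the chromosome for each feature, so for genome-wide files bedtools is the better tool. The function names are hypothetical:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open BED intervals overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def select_overlapping(features, windows):
    """Yield features (chrom, start, end, ...) that overlap any window on the same chromosome.

    Naive scan per chromosome; each feature is reported at most once, similar in
    spirit to 'bedtools intersect -u'.
    """
    by_chrom = {}
    for chrom, start, end, *_ in windows:
        by_chrom.setdefault(chrom, []).append((start, end))
    for feat in features:
        chrom, start, end = feat[0], feat[1], feat[2]
        if any(overlaps(start, end, ws, we) for ws, we in by_chrom.get(chrom, [])):
            yield feat


if __name__ == "__main__":
    windows = [("chr1", 100, 200, "GENE_A|up|50-5000")]
    features = [("chr1", 150, 160, "f1"), ("chr1", 200, 210, "f2")]
    print(list(select_overlapping(features, windows)))
```

Because BED ends are exclusive, a feature starting exactly at a window's end (like `f2` here) does not count as overlapping.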
Thanks !!
How would I extract an interval such as the region 50-5000 bp away from a gene?
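One possible answer, as a sketch: if the interval is measured from the gene boundaries rather than from the TSS, it is just an offset from the start or end coordinate, with the sides swapped for minus-strand genes. The sketch assumes 0-based, half-open gene coordinates (as in BED). It treats "upstream" as beyond the 5' end and "downstream" as beyond the 3' end. `gene_flanks` is a hypothetical helper name. `bedtools flank` with `-s` covers windows that start right at the gene edge, but an offset window like 50-5000 needs an extra step either way:

```python
def gene_flanks(chrom, start, end, strand, near, far):
    """Return (upstream, downstream) BED intervals lying near..far bp outside a gene.

    start/end are ASSUMED to be 0-based, half-open gene coordinates. The left
    flank is clipped at 0 (it may become empty); chromosome ends are NOT checked.
    """
    left = (chrom, max(start - far, 0), max(start - near, 0))
    right = (chrom, end + near, end + far)
    # On the minus strand the 5' end is the right-hand coordinate, so swap sides.
    return (left, right) if strand == "+" else (right, left)


if __name__ == "__main__":
    up, down = gene_flanks("chr1", 10000, 20000, "+", 50, 5000)
    print(up, down)
```

For a plus-strand gene at chr1:10000-20000 this gives chr1:5000-9950 upstream and chr1:20050-25000 downstream. You can then pull sequences or overlapping features for those coordinates with bedtools or AWK as described above.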