I'm trying to find a program or script that can be used to systematically return a list of genes in the genome that fall within some specified distance of a response element.
I have two files to work with:
1.) A list of predicted response elements in the genome, identified using PoSSuM-search, that includes each element's genomic coordinates
2.) The annotation file from that same genome, which includes genomic coordinates for gene features
Ideally, I want to use the genomic coordinates of the response elements in file (1) to pull out any gene in the annotation file that falls within a pre-specified distance of a response element (e.g. 100 kb).
Thank you!
To moderators: this same question was cross-posted on ResearchGate: https://www.researchgate.net/post/Recommended_programs_to_systematically_identify_genes_within_some_distance_of_a_response_element_in_the_genome
Have a look at bedtools slop to define windows around a set of coordinates (here, the response elements) and bedtools intersect to intersect those windows with the genes. Can you give an example of how the output should look?
I'm figuring it out as I go, but ideally the output would be a .txt file, with columns defined as:
1.) Gene (one per line): all genes located within 100kb of a response element identified in the input
2.) Strand
3.) Gene start position
4.) Gene strand
5.) Response element start position
6.) Response element strand
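For anyone who wants to see the window-and-overlap logic spelled out, here is a minimal plain-Python sketch of the same idea (it does not call bedtools). Everything in it is an assumption for illustration: the function name genes_near_elements, the tuple layouts, the toy coordinates, and the use of 0-based half-open intervals as in BED. The output columns loosely follow the layout requested above, with a chromosome column added as a guess at what would be useful.

```python
def genes_near_elements(elements, genes, max_dist=100_000):
    """Return one row per (gene, element) pair where the gene overlaps
    the element's window, i.e. the element extended by max_dist bp on each side.

    elements: iterable of (chrom, start, end, strand)           # hypothetical layout
    genes:    iterable of (chrom, start, end, strand, gene_id)  # hypothetical layout
    Coordinates are assumed to be 0-based, half-open (BED-style).
    """
    # Group genes by chromosome so each element is only compared
    # against genes on the same chromosome.
    genes_by_chrom = {}
    for chrom, start, end, strand, gene_id in genes:
        genes_by_chrom.setdefault(chrom, []).append((start, end, strand, gene_id))

    hits = []
    for chrom, e_start, e_end, e_strand in elements:
        # Extend the element by max_dist on both sides, clamped at 0.
        win_start = max(0, e_start - max_dist)
        win_end = e_end + max_dist
        for g_start, g_end, g_strand, gene_id in genes_by_chrom.get(chrom, []):
            # Two half-open intervals overlap if each starts before the other ends.
            if g_start < win_end and g_end > win_start:
                hits.append((gene_id, chrom, g_start, g_strand, e_start, e_strand))
    return hits


# Toy data, made up for illustration only.
elements = [("chr1", 500_000, 500_015, "+")]
genes = [
    ("chr1", 450_000, 460_000, "-", "geneA"),  # 40 kb upstream of the element
    ("chr1", 700_000, 710_000, "+", "geneB"),  # about 200 kb away
    ("chr2", 500_000, 510_000, "+", "geneC"),  # different chromosome
]

hits = genes_near_elements(elements, genes)
for row in hits:
    print("\t".join(str(field) for field in row))
```

With the toy data, only geneA is reported. This loop compares every element against every gene on the same chromosome, which is fine for a sketch but slow on a whole genome; the interval-based tools suggested above are the better choice for real data.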