I have a list of locations in BED format:
chr2 55159107 55160004
chr3 40280597 40282177
chr4 74682484 74683574
chr4 76795449 76796456
chr6 10250838 10251741
chr6 20435795 20436466
chr6 31169498 31170294
I would like to identify the genes that lie within 1000 bp (1 kb) upstream and downstream of each of those locations, using a GTF file from a non-model plant.
gtf file:
SpoScf_00032 maker exon 12116 12419 . + . gene_id transcript_id "Spo06120";
SpoScf_00032 maker exon 14070 17062 . + . gene_id transcript_id "Spo06120";
SpoScf_00032 maker exon 17626 17899 . + . gene_id transcript_id "Spo06120";
chr2 maker CDS 15262965 15263150 . + 0 gene_id transcript_id "Spo26212";
chr2 maker CDS 15264530 15264667 . + 0 gene_id transcript_id "Spo26212";
chr2 maker CDS 15265433 15265885 . + 0 gene_id transcript_id "Spo26212";
bedtools window, intersect and closest don't answer my question because they look for overlaps.
You can create a bed_downstream file with the columns chr, start-1000, start and another bed_upstream file with the columns chr, end, end+1000. Then you can run bedtools intersect against the GTF twice, once with bed_downstream and once with bed_upstream.
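The two flank files could be generated with a few lines of Python. This is only a minimal sketch: the function names make_flanks and write_bed are made up for illustration, the input is assumed to be whitespace-separated chr/start/end lines as in the question, and the window is clamped at 0 so that start-1000 never goes negative (nothing here clips end+1000 to the chromosome length).

```python
def make_flanks(bed_lines, flank=1000):
    """Build the two flank sets described above from BED-like lines.

    Returns (bed_downstream, bed_upstream), using the same naming as the
    answer: bed_downstream is chr, start-flank, start and bed_upstream is
    chr, end, end+flank.
    """
    bed_downstream, bed_upstream = [], []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Window that ends at the region start; clamp so it never drops below 0.
        bed_downstream.append((chrom, max(0, start - flank), start))
        # Window that begins at the region end (not clipped to chromosome length).
        bed_upstream.append((chrom, end, end + flank))
    return bed_downstream, bed_upstream


def write_bed(path, intervals):
    """Write (chrom, start, end) tuples as a tab-separated BED file."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write(f"{chrom}\t{start}\t{end}\n")
```

After writing both lists with write_bed, each file can be intersected with the annotation, for example bedtools intersect -a bed_downstream.bed -b genes.gtf -wa -wb, where genes.gtf is a placeholder name for your GTF. The transcript_id in the reported GTF lines then tells you which gene falls inside each window.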