Hello everyone,
I am looking for a way to filter out intragenic regions (i.e. exons + introns) from a .bed file, so as to keep only intergenic regions. My first idea was to use the intersect command from bedtools, comparing my BED file with a gene annotation of the human reference genome (hg38):
bedtools intersect -a my_file.bed -b hg38_genome.bed -v > out.bed
I downloaded the hg38_genome.bed file from the UCSC website (with default parameters), but I am unsure of its content: does it give the chromosome coordinates of exons only, or the boundaries of entire genes (exons + introns), which is what I am looking for?
I am very new to bioinformatics so do not hesitate to suggest a different approach.
Best,
Paul
Hello @kevin, thanks for your reply!
I think that is what I am looking for, but when I check the first few lines of the generated file, I do not understand why some of the regions overlap in their coordinates:
Am I missing something?
Overlapping features are usually different transcripts of the same gene. You can confirm this visually by loading the downloaded BED file as a custom track in the UCSC Genome Browser. You can also see the transcript ID in the fourth column of your screenshot.