Dear all,
I have gene annotations in GTF format, as shown below:
chr1 stdin exon 247829 252562 . - . gene_id "ENPP1"; transcript_id "ENPP1"; exon_number "1";exon_id "ENPP1.1";
chr1 stdin CDS 247832 252562 . - 0 gene_id "ENPP1"; transcript_id "ENPP1"; exon_number "1"; exon_id "ENPP1.1";
chr1 stdin exon 254466 254628 . - . gene_id "ENPP1"; transcript_id "ENPP1"; exon_number "2"; exon_id "ENPP1.2";
chr1 stdin CDS 254466 254628 . - 1 gene_id "ENPP1"; transcript_id "ENPP1"; exon_number "2"; exon_id "ENPP1.2";
chr1 stdin exon 247829 252562 . - . gene_id "ENPP1_2"; transcript_id "ENPP1_2"; exon_number "1"; exon_id "ENPP1_2.1";
chr1 stdin CDS 247829 252562 . - 1 gene_id "ENPP1_2"; transcript_id "ENPP1_2"; exon_number "1"; exon_id "ENPP1_2.1";
chr1 stdin exon 254466 254628 . - . gene_id "ENPP1_2"; transcript_id "ENPP1_2"; exon_number "2"; exon_id "ENPP1_2.2";
chr1 stdin CDS 254466 254628 . - 2 gene_id "ENPP1_2"; transcript_id "ENPP1_2"; exon_number "2"; exon_id "ENPP1_2.2";
The gene_id is duplicated for each transcript with a suffix ("_1", "_2", ...). Are there any tools to determine whether a gene partially or completely overlaps segments a few kb long, using a gene file in the above format?
You just have to tweak it a little for non-standard Ensembl. The above code will work for your test.genes. It outputs the BED file as below:
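
Independently of the code and output referred to above, here is a minimal Python sketch of one way the partial/complete check could be done. It is only an illustration under several assumptions: the function names (gene_spans, classify) and the segment coordinates are hypothetical, segments are taken to be 1-based inclusive like GTF, "complete" is taken to mean the whole gene span lies inside the segment, and a trailing "_<number>" on gene_id is assumed to be the per-transcript suffix described in the question (this would wrongly merge genes whose real names end that way).

```python
import re

# Matches key "value" pairs in the GTF attribute column, even when the
# space after a semicolon is missing (as in the first sample line).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')

def gene_spans(gtf_lines, feature="exon", merge_suffixed=True):
    """Collapse records of one feature type into a (start, end) span per gene.

    Returns a dict keyed by (chrom, gene_id). Fields are split on any
    whitespace, so both tab- and space-separated input are accepted.
    """
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != feature:
            continue
        gene = dict(ATTR_RE.findall(fields[8])).get("gene_id")
        if gene is None:
            continue
        if merge_suffixed:
            # Assumption: "_2", "_3", ... mark extra transcripts of one gene.
            gene = re.sub(r"_\d+$", "", gene)
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        key = (chrom, gene)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans

def classify(chrom, start, end, segment):
    """Return 'complete', 'partial' or 'none' for a gene span vs. a segment."""
    seg_chrom, seg_start, seg_end = segment
    if chrom != seg_chrom or end < seg_start or seg_end < start:
        return "none"
    if seg_start <= start and end <= seg_end:
        return "complete"  # the whole gene span lies inside the segment
    return "partial"

gtf = '''chr1 stdin exon 247829 252562 . - . gene_id "ENPP1"; transcript_id "ENPP1"; exon_number "1";exon_id "ENPP1.1";
chr1 stdin exon 254466 254628 . - . gene_id "ENPP1"; transcript_id "ENPP1"; exon_number "2"; exon_id "ENPP1.2";
chr1 stdin exon 247829 252562 . - . gene_id "ENPP1_2"; transcript_id "ENPP1_2"; exon_number "1"; exon_id "ENPP1_2.1";
chr1 stdin exon 254466 254628 . - . gene_id "ENPP1_2"; transcript_id "ENPP1_2"; exon_number "2"; exon_id "ENPP1_2.2";
'''.splitlines()

# Hypothetical segments, invented for this example only.
segments = [("chr1", 247000, 255000), ("chr1", 250000, 260000), ("chr1", 300000, 305000)]

for (chrom, gene), (start, end) in gene_spans(gtf).items():
    for seg in segments:
        print(gene, chrom, start, end, seg[1], seg[2], classify(chrom, start, end, seg))
# ENPP1 chr1 247829 254628 247000 255000 complete
# ENPP1 chr1 247829 254628 250000 260000 partial
# ENPP1 chr1 247829 254628 300000 305000 none
```

For large annotation files, an interval-based tool would scale better than this all-pairs loop; the sketch is only meant to make the partial vs. complete distinction concrete.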