Maybe this is a misunderstanding on my side, but I have failed to figure it out so far.
I'm trying to extract exon coordinates for a given set of genes (in BED format) from UCSC, which seems like a straightforward task.
Here is a one-line example from the BED file (the coordinates are in hg38):
1 156815640 156881850 NTRK1 ENSG00000198400
The problem is that if I enter these coordinates (chr1:156,815,640-156,881,850) in the "position" field here (https://genome.ucsc.edu/cgi-bin/hgTables) and then hit the "get output" button with the "exon + 0" option, I get the following BED file back:
chr1 156806242 156806573 uc001fqe.3_exon_0_0_chr1_156806243_r 0 -
chr1 156807174 156807345 uc001fqe.3_exon_1_0_chr1_156807175_r 0 -
.
.
.
chr1 156880448 156880539 uc057mdn.1_exon_1_0_chr1_156880449_f 0 +
chr1 156881456 156881850 uc057mdn.1_exon_2_0_chr1_156881457_f 0 +
As one can see, the first exon starts earlier (156806242) than my initial interval (156815640), so I'm getting several irrelevant exons. Maybe someone can explain why this happens and how I can avoid it?
Best, Eugene
Maybe UCSC selects every transcript that overlaps the region and then reports all of its exons, including those that lie outside the interval.
Thanks! I will probably work around it, then. I just naively thought that getting exon coordinates for a set of genes from UCSC would be a one-step procedure.
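One possible workaround is to post-filter the returned exons against the original interval. The snippet below is only a minimal sketch: the function names `overlaps` and `filter_exons` are made up for illustration, and it assumes that both the query interval and the Table Browser output use BED-style 0-based, half-open coordinates and that the chromosome name has already been given the `chr` prefix (the gene file above uses `1`, the output uses `chr1`).

```python
def overlaps(start, end, q_start, q_end):
    """Return True if half-open intervals [start, end) and [q_start, q_end) overlap."""
    return start < q_end and end > q_start


def filter_exons(bed_lines, chrom, q_start, q_end):
    """Keep only the BED lines on `chrom` that overlap the query interval."""
    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            # Skip blank or placeholder lines that are not BED records.
            continue
        c, s, e = fields[0], int(fields[1]), int(fields[2])
        if c == chrom and overlaps(s, e, q_start, q_end):
            kept.append(line.rstrip("\n"))
    return kept


# The four exon lines quoted in the question.
exons = [
    "chr1 156806242 156806573 uc001fqe.3_exon_0_0_chr1_156806243_r 0 -",
    "chr1 156807174 156807345 uc001fqe.3_exon_1_0_chr1_156807175_r 0 -",
    "chr1 156880448 156880539 uc057mdn.1_exon_1_0_chr1_156880449_f 0 +",
    "chr1 156881456 156881850 uc057mdn.1_exon_2_0_chr1_156881457_f 0 +",
]

# Query interval taken from the NTRK1 line, with "1" rewritten as "chr1".
inside = filter_exons(exons, "chr1", 156815640, 156881850)
for line in inside:
    print(line)
```

With the lines quoted above, only the two uc057mdn.1 exons survive, because both uc001fqe.3 exons end before 156815640. If bedtools is available, `bedtools intersect -a exons.bed -b genes.bed` should give a similar result, provided the chromosome names match in both files.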
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
Thanks! I'm new to Biostars, so I did not know about that option.