Question

Trouble extracting exon coordinates from UCSC

0

6.4 years ago

Eugene A ▴ 190

Maybe this is a misunderstanding on my side, but I have not been able to figure it out so far.

I'm trying to extract exon coordinates for a given set of genes (in BED format) from UCSC, which seems like a straightforward task.

Here is a one-line example from the BED file (the coordinates are in hg38):

1   156815640   156881850   NTRK1   ENSG00000198400

The problem is that if I enter these coordinates (chr1:156,815,640-156,881,850) in the "position" field of the Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) and then hit the "get output" button with the "exon + 0" option, I get the following BED file back:

chr1    156806242   156806573   uc001fqe.3_exon_0_0_chr1_156806243_r    0   -
chr1    156807174   156807345   uc001fqe.3_exon_1_0_chr1_156807175_r    0   -
.
.
. 
chr1    156880448   156880539   uc057mdn.1_exon_1_0_chr1_156880449_f    0   +
chr1    156881456   156881850   uc057mdn.1_exon_2_0_chr1_156881457_f    0   +

As one can see, the first exon starts earlier (156806242) than my initial interval (156815640), so I'm getting several irrelevant exons. Can someone explain why this happens and how I can avoid it?

Best, Eugene

sequence genome software error • 1.4k views

ADD COMMENT • link updated 6.4 years ago by Ram 44k • written 6.4 years ago by Eugene A ▴ 190

1

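Can someone explain why this happens

Maybe UCSC uses the whole transcript as the initial location, i.e. it selects every transcript that overlaps the region and then returns all exons of those transcripts, including the ones outside the region.

and how I can avoid it?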

bedtools intersect
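
The filtering step that `bedtools intersect` would perform can also be sketched in plain Python, which makes the logic explicit. This is only a minimal sketch under stated assumptions: the function names (`normalize_chrom`, `parse_bed_line`, `exons_in_regions`) are hypothetical helpers written for this illustration, the input rows are the ones quoted in the question, and the gene file is assumed to use chromosome names without the `chr` prefix while the UCSC output uses them with it.

```python
def normalize_chrom(chrom):
    """Return the chromosome name with a 'chr' prefix (e.g. '1' -> 'chr1')."""
    return chrom if chrom.startswith("chr") else "chr" + chrom


def parse_bed_line(line):
    """Split one whitespace-separated BED line into (chrom, start, end, rest)."""
    fields = line.split()
    return normalize_chrom(fields[0]), int(fields[1]), int(fields[2]), fields[3:]


def exons_in_regions(exon_lines, region_lines, require_contained=False):
    """Keep exons that overlap (or lie fully inside) any region.

    BED intervals are 0-based and half-open, so two intervals overlap
    when start_a < end_b and start_b < end_a.
    """
    regions = [parse_bed_line(l) for l in region_lines if l.strip()]
    kept = []
    for line in exon_lines:
        if not line.strip():
            continue
        chrom, start, end, _ = parse_bed_line(line)
        for r_chrom, r_start, r_end, _ in regions:
            if chrom != r_chrom:
                continue
            if require_contained:
                hit = r_start <= start and end <= r_end
            else:
                hit = start < r_end and r_start < end
            if hit:
                kept.append(line)
                break  # one matching region is enough
    return kept


# Gene interval from the question (chromosome given as "1", not "chr1").
genes = ["1   156815640   156881850   NTRK1   ENSG00000198400"]

# The four exon rows shown in the question's Table Browser output.
exons = [
    "chr1    156806242   156806573   uc001fqe.3_exon_0_0_chr1_156806243_r    0   -",
    "chr1    156807174   156807345   uc001fqe.3_exon_1_0_chr1_156807175_r    0   -",
    "chr1    156880448   156880539   uc057mdn.1_exon_1_0_chr1_156880449_f    0   +",
    "chr1    156881456   156881850   uc057mdn.1_exon_2_0_chr1_156881457_f    0   +",
]

for line in exons_in_regions(exons, genes):
    print(line)
```

With these four rows, only the two `uc057mdn.1` exons are printed, because the two `uc001fqe.3` exons end before the interval starts. Note that the chromosome names have to match for any such comparison: the gene file in the question uses `1` while the UCSC output uses `chr1`, so one of the two files would need renaming before running `bedtools intersect` as well.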

ADD REPLY • link 6.4 years ago by Pierre Lindenbaum 164k

0

Thanks! I will probably work around it that way. I just naively thought that getting the exon coordinates for a set of genes from UCSC would be a one-step procedure.

ADD REPLY • link 6.4 years ago by Eugene A ▴ 190

0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.