I'm trying to create an SVG image of the various transcripts of genes, using data on human genes from the UCSC Genome Browser, and I'm running into trouble. I have direct MySQL access to the database. I already have the exon start and end base pair positions and the transcription start and end base pair positions. What I'm looking for is either the base pair positions of the 5' UTR and 3' UTR regions of each gene, or simply the coding region positions, which I could then subtract from the range of the whole transcript to leave the remaining area as 5' UTR and 3' UTR. Any ideas on where in the hg19 database of the UCSC Genome Browser I could find this data?
I've been looking through the knownGene table, and in many rows cdsStart and cdsEnd have the same value, which doesn't make sense to me. How can I trust this data?
So, to clarify: the CDS start and end positions contain no part of the 5' UTR or 3' UTR regions, correct?
CDS is the coding sequence, which gets translated into protein. UTRs are untranslated regions. Please have a look at this picture.
So if the data I have says that the CDS ends before transcription ends, then something is wrong, correct?
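To make the CDS/UTR relationship concrete, here is a minimal Python sketch of how one knownGene-style row could be split into 5' UTR, coding, and 3' UTR pieces per exon. It assumes the usual knownGene columns (strand, txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds) with comma-separated exon lists and 0-based, half-open coordinates. The function names and the example transcript are made up for illustration. As far as I know, rows where cdsStart equals cdsEnd are non-coding transcripts, so the sketch treats them as having no CDS and no UTRs. A CDS end that lies before the transcript end is expected: on the + strand the gap between them is the 3' UTR.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open


def parse_coords(field: str) -> List[int]:
    """Parse a comma-separated list such as '100,500,' (trailing comma allowed)."""
    return [int(x) for x in field.split(",") if x]


def clip(exons: List[Interval], lo: int, hi: int) -> List[Interval]:
    """Return the parts of each exon that fall inside [lo, hi)."""
    pieces = []
    for start, end in exons:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            pieces.append((s, e))
    return pieces


def split_transcript(strand: str, tx_start: int, tx_end: int,
                     cds_start: int, cds_end: int,
                     exon_starts: str, exon_ends: str) -> Dict[str, List[Interval]]:
    """Split a transcript's exons into 5' UTR, coding, and 3' UTR intervals."""
    exons = list(zip(parse_coords(exon_starts), parse_coords(exon_ends)))
    if cds_start == cds_end:
        # Assumption: an empty CDS marks a non-coding transcript.
        return {"utr5": [], "coding": [], "utr3": [], "noncoding": exons}
    left = clip(exons, tx_start, cds_start)   # exonic bases before the CDS
    coding = clip(exons, cds_start, cds_end)  # exonic bases inside the CDS
    right = clip(exons, cds_end, tx_end)      # exonic bases after the CDS
    # On the minus strand the transcript reads right to left, so the sides swap.
    utr5, utr3 = (left, right) if strand == "+" else (right, left)
    return {"utr5": utr5, "coding": coding, "utr3": utr3, "noncoding": []}


# Hypothetical transcript, not a real gene.
parts = split_transcript("+", 100, 1000, 250, 800, "100,500,700,", "300,600,1000,")
print(parts["utr5"])    # [(100, 250)]
print(parts["coding"])  # [(250, 300), (500, 600), (700, 800)]
print(parts["utr3"])    # [(800, 1000)]
```

The three lists map directly onto differently styled rectangles in an SVG, for example thin boxes for the UTR pieces and thick boxes for the coding pieces.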