shpak.max,
First, bravo for specifying both a reference build (hg19) and a standardized gene identifier system (HUGO gene names).
As GenoMax and Maximilian Haeussler have pointed out in their excellent answers, you can select a “canonical” or “best” transcript (for example, using the RefSeq Select tracks). This is a solid, practical answer and remains state of the art in many areas. Ultimately, however, a LOT of genes have more than one phenotypically relevant transcript. One good resource for this is MANE Plus Clinical.
I wanted to make the same recommendation, but also wanted to give you an example of a case in which getting the right answer actually depends on correct transcript isoform selection.
Example: A missed diagnosis of Kleefstra Syndrome in a 2 year old girl.
Kleefstra syndrome is a neurodevelopmental disorder characterized by intellectual disability, hypotonia, distinctive facial features, and often congenital heart defects. It usually results from haploinsufficiency of EHMT1, a gene that mediates chromatin modification and transcriptional regulation. However, loss-of-function mutations in KMT2C can also lead to a Kleefstra-like phenotype ...
Now, consider this article. In the report, the analysis of the microarray data uses the basic gene annotation of KMT2C.
Deletions in the proximal exons of KMT2C are frequently seen in healthy individuals and are generally considered benign. However, this patient had a deletion in a more distal region that specifically affects an isoform of KMT2C that is made in the brain and is required for normal neural development. Because the analysis didn't distinguish between the various isoforms of KMT2C, the deletion was overlooked and not flagged as pathogenic. As a result, the little girl was not diagnosed with Kleefstra-like syndrome :-(
This same article provides two additional examples.
Pull normal and brain-specific transcript isoforms (and save the little girl)!!
The commands below first extract the gene-level coordinates for KMT2C from the GENCODE basic annotation, then the transcript-level coordinates for one specific isoform from the comprehensive annotation file. (Here we use ENST00000496432.1 as an illustrative brain-expressed transcript; in practice, such transcripts are found by consulting the biomedical literature.)
Download the annotation files for GRCh37 (hg19):
curl -O https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/GRCh37_mapping/gencode.v47lift37.basic.annotation.gtf.gz
curl -O https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/GRCh37_mapping/gencode.v47lift37.annotation.gtf.gz
First, extract the gene-level coordinates for KMT2C from the basic annotation:
echo "Gene-level coordinates for KMT2C (basic annotation):"
zcat gencode.v47lift37.basic.annotation.gtf.gz | \
awk '$3=="gene" && /gene_name "KMT2C"/ { print $1, $4, $5, "KMT2C" }'
Second, extract the transcript-level coordinates for the brain-expressed isoform from the comprehensive annotation:
echo "Transcript-level coordinates for brain-expressed KMT2C isoform (ENST00000496432.1):"
zcat gencode.v47lift37.annotation.gtf.gz | \
awk '$3=="transcript" && /gene_name "KMT2C"/ && /transcript_id "ENST00000496432\.1/ { print $1, $4, $5, "ENST00000496432.1" }'
Now you can check your variants against both sets of coordinates (the gene-level interval and the isoform-level interval) and see whether anything is awry in either. In practice, folks use many heuristics and databases to do this quickly.
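As a rough sketch of that last step, the snippet below reports which intervals a candidate deletion overlaps. It assumes the intervals are saved in a file with the same four space-separated columns the awk commands above print (chromosome, start, end, label). The function name overlaps, the labels, and all coordinates here are made up for illustration; they are not real KMT2C positions.

```shell
# Print the label of every interval that a candidate deletion overlaps.
# Usage: overlaps CHROM START END INTERVALS_FILE
# INTERVALS_FILE columns: chrom start end label (space-separated).
overlaps() {
  awk -v c="$1" -v s="$2" -v e="$3" \
    '$1 == c && $2 <= e && $3 >= s { print $4 }' "$4"
}

# Hypothetical intervals, for illustration only (not real coordinates).
intervals=$(mktemp)
printf '%s\n' \
  'chr7 1000 9000 GENE_basic' \
  'chr7 12000 15000 TX_brain' > "$intervals"

# A hypothetical deletion that touches only the second interval.
overlaps chr7 12500 13000 "$intervals"
```

Two closed intervals overlap when each one starts before the other ends, which is the condition in the awk pattern. For real analyses with many variants, a dedicated interval tool is usually a better fit than a hand-rolled loop.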
Thank you for the reference, I'll look into it.
Incidentally, among other issues, the UCSC Table Browser seems to be buggy: for about 1/5 of my genes it returns the same value for the start and stop positions (usually duplicating the CDS stop position), while for the others the start and stop more or less match what one would get from the web-based genome browser.