What is the meaning of the second item in the list? Why do I get two items instead of just one?
There are multiple transcripts at this position, associated with different gene names:
Homo sapiens uncharacterized LOC101927278 (LOC101927278), transcript variant 4, long non-coding RNA
Homo sapiens HPS1, biogenesis of lysosomal organelles complex 3 subunit 1 (HPS1), transcript variant 7, mRNA
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19 -e 'select distinct chrom,txStart,txEnd,name,name2 from refGene where chrom="chr10" and not (txEnd<100206067 or txStart>100206107)'
+-------+-----------+-----------+--------------+--------------+
| chrom | txStart   | txEnd     | name         | name2        |
+-------+-----------+-----------+--------------+--------------+
| chr10 | 100175954 | 100206720 | NM_001322477 | HPS1         |
| chr10 | 100175954 | 100206720 | NM_001322483 | HPS1         |
| chr10 | 100175954 | 100206720 | NM_001322485 | HPS1         |
| chr10 | 100188902 | 100206720 | NM_001322492 | HPS1         |
| chr10 | 100188902 | 100206720 | NM_001322491 | HPS1         |
| chr10 | 100175954 | 100206720 | NM_001322489 | HPS1         |
| chr10 | 100188902 | 100206720 | NM_182639    | HPS1         |
| chr10 | 100206077 | 100213562 | NR_134454    | LOC101927278 |
| chr10 | 100206077 | 100213562 | NR_134453    | LOC101927278 |
| chr10 | 100206077 | 100213562 | NR_134452    | LOC101927278 |
| chr10 | 100206077 | 100213562 | NR_134451    | LOC101927278 |
| chr10 | 100175954 | 100206720 | NM_001322482 | HPS1         |
| chr10 | 100175954 | 100206720 | NM_000195    | HPS1         |
| chr10 | 100175954 | 100206720 | NM_001322487 | HPS1         |
(...)
+-------+-----------+-----------+--------------+--------------+
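The "two items" are two genes, each with several transcripts. As a rough sketch, the snippet below loads a few rows copied from the table above into an in-memory SQLite table (standing in for refGene, which is assumed here to be MySQL-compatible for this simple query) and runs the same overlap test, but with `SELECT DISTINCT name2` so that each gene appears only once. The helper name `genes_overlapping` is made up for illustration.

```python
import sqlite3

# A few refGene rows copied from the result table above
# (chrom, txStart, txEnd, name, name2).
ROWS = [
    ("chr10", 100175954, 100206720, "NM_001322477", "HPS1"),
    ("chr10", 100175954, 100206720, "NM_001322483", "HPS1"),
    ("chr10", 100188902, 100206720, "NM_182639", "HPS1"),
    ("chr10", 100206077, 100213562, "NR_134454", "LOC101927278"),
    ("chr10", 100206077, 100213562, "NR_134453", "LOC101927278"),
]


def genes_overlapping(rows, chrom, start, end):
    """Return the distinct gene names (name2) whose transcripts overlap [start, end]."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE refGene (chrom TEXT, txStart INTEGER, txEnd INTEGER, "
        "name TEXT, name2 TEXT)"
    )
    con.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?)", rows)
    # Same overlap test as the mysql query, collapsed to one row per gene.
    cur = con.execute(
        "SELECT DISTINCT name2 FROM refGene "
        "WHERE chrom = ? AND NOT (txEnd < ? OR txStart > ?) ORDER BY name2",
        (chrom, start, end),
    )
    genes = [row[0] for row in cur]
    con.close()
    return genes


print(genes_overlapping(ROWS, "chr10", 100206067, 100206107))
# ['HPS1', 'LOC101927278']
```

Against the real UCSC server, the equivalent change is to replace `distinct chrom,txStart,txEnd,name,name2` with `distinct name2` in the `-e` query.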
In the second example, I also get two items:
Again, more than one transcript overlaps this region, this time from two different genes (CUTC and COX15):
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19 -e 'select distinct chrom,txStart,txEnd,name,name2 from refGene where chrom="chr10" and not (txEnd<101491968 or txStart>101492076)'
+-------+-----------+-----------+--------------+-------+
| chrom | txStart   | txEnd     | name         | name2 |
+-------+-----------+-----------+--------------+-------+
| chr10 | 101491957 | 101515894 | NM_015960    | CUTC  |
| chr10 | 101470624 | 101492423 | NM_001320975 | COX15 |
| chr10 | 101470624 | 101492423 | NM_001320976 | COX15 |
(....)
+-------+-----------+-----------+--------------+-------+
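The overlap condition used in these queries compares txEnd with the start of the region and txStart with its end. If the two bounds are passed the other way round, the same expression instead keeps only transcripts that cover the entire region, so partially overlapping transcripts silently drop out. A small sketch of both readings; the function names and the partially overlapping transcript are hypothetical, and the region is the one from the second query.

```python
def overlaps(tx_start, tx_end, region_start, region_end):
    """True if the transcript shares at least one position with the region.

    Mirrors: NOT (txEnd < region_start OR txStart > region_end)
    """
    return not (tx_end < region_start or tx_start > region_end)


def spans(tx_start, tx_end, region_start, region_end):
    """What the same expression tests when the two bounds are swapped:
    NOT (txEnd < region_end OR txStart > region_start),
    i.e. the transcript must cover the whole region."""
    return not (tx_end < region_end or tx_start > region_start)


# Hypothetical transcript covering only the first part of the region.
tx = (101491900, 101492000)
region = (101491968, 101492076)
print(overlaps(*tx, *region))  # True
print(spans(*tx, *region))     # False
```

Also keep in mind that UCSC tables store txStart as a 0-based position and txEnd as an exclusive end, so hits exactly at the region boundaries can shift by one depending on whether your own positions are 1-based.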
The third and fourth examples both give the same result:
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19 -e 'select distinct chrom,txStart,txEnd,name,name2 from refGene where chrom="chr10" and not (txEnd<100195025 or txStart>100195029)'
+-------+-----------+-----------+--------------+-------+
| chrom | txStart   | txEnd     | name         | name2 |
+-------+-----------+-----------+--------------+-------+
| chr10 | 100175954 | 100206720 | NM_001322477 | HPS1  |
| chr10 | 100175954 | 100206720 | NM_001322483 | HPS1  |
(...)
| chr10 | 100175954 | 100206720 | NM_001322480 | HPS1  |
+-------+-----------+-----------+--------------+-------+
I am not using the information about the strand in my query to refGene. Is this okay?
How can we know? We don't know what you're trying to do.
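Whether strand matters depends on the question. If you do only want transcripts on one strand, one option is to select the refGene `strand` column and filter on it. A sketch that builds the same kind of query string with an optional strand filter; `refgene_query` is a hypothetical helper, and its input checks are only a basic guard against malformed values, not a full SQL-safety layer.

```python
import re


def refgene_query(chrom, start, end, strand=None):
    """Build a refGene overlap query like the ones above (hypothetical helper),
    optionally restricted to one strand ('+' or '-')."""
    if not re.fullmatch(r"chr[0-9A-Za-z_]+", chrom):
        raise ValueError(f"unexpected chromosome name: {chrom!r}")
    start, end = int(start), int(end)
    if start > end:
        # Keep the overlap test the right way round: txEnd vs start, txStart vs end.
        start, end = end, start
    sql = (
        "select distinct chrom,txStart,txEnd,strand,name,name2 from refGene "
        f'where chrom="{chrom}" and not (txEnd<{start} or txStart>{end})'
    )
    if strand is not None:
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")
        sql += f' and strand="{strand}"'
    return sql


print(refgene_query("chr10", 100206067, 100206107, strand="-"))
```

The printed string can be passed as the `-e` argument of the same `mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19` command used above.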
Thanks, I learned several new things from this. I'll add a couple more points:
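A transcript whose cdsStart equals its cdsEnd (these are additional columns available in the refGene table) has no annotated coding region, which by itself implies it's not coding for a protein; this is the case for the long non-coding NR_ transcripts of LOC101927278 above.
Overlapping transcripts of different genes can lie on opposite strands (check the strand column in refGene), although two transcript models on the same strand can also overlap.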
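Both checks can be applied directly to refGene rows once cdsStart, cdsEnd and strand are included in the select list. A sketch in Python; the `Tx` tuple, the helper names and the transcript values below are hypothetical, and the overlap test uses the same inclusive comparison as the mysql queries in this thread.

```python
from collections import namedtuple

# Minimal stand-in for a refGene row; only the columns used here.
Tx = namedtuple("Tx", "name name2 strand txStart txEnd cdsStart cdsEnd")


def is_noncoding(tx):
    """refGene gives a transcript without a coding region cdsStart == cdsEnd."""
    return tx.cdsStart == tx.cdsEnd


def overlap_strandedness(a, b):
    """Return 'same' or 'opposite' for two overlapping transcripts, else None."""
    if a.txEnd < b.txStart or b.txEnd < a.txStart:
        return None  # no overlap at all
    return "same" if a.strand == b.strand else "opposite"


# Hypothetical transcripts (names and coordinates invented for illustration).
coding = Tx("TX_A", "GENE_A", "-", 1000, 5000, 1200, 4800)
lnc = Tx("TX_B", "GENE_B", "+", 4500, 9000, 9000, 9000)

print(is_noncoding(coding), is_noncoding(lnc))  # False True
print(overlap_strandedness(coding, lnc))        # opposite
```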