I am searching for a gene in the rock hyrax genome (proCap1) using the mouse ortholog, which is located in mm10 at chr17:71344493-71475343.
Using the UCSC public MySQL database, I get different proCap1 coordinates depending on whether I query the "chain" or the "net" alignment table.
I connect to the database with
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A mm10
and then run these queries:
# Chain
SELECT tName,tStart,tEnd,qStrand,qName,qStart,qEnd,id,score FROM chainProCap1
WHERE (tName="chr17"
AND (71344493 <= tStart AND tStart <= 71475343
OR 71344493 <= tEnd AND tEnd <= 71475343))
ORDER BY tStart;
# Net
SELECT level, tName,tStart,tEnd,strand,qName,qStart,qEnd,score,chainId FROM netProCap1
WHERE (tName="chr17" AND type="top"
AND (71344493 <= tStart AND tStart <= 71475343
OR 71344493 <= tEnd AND tEnd <= 71475343))
ORDER BY tStart;
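A side note on the WHERE clause used in both queries: it keeps a block only if at least one of its endpoints falls inside the region, so a block that spans the whole region would be missed. The sketch below compares that filter with a standard interval-overlap test. The function names are mine, the second interval is a made-up example rather than a real row, and I ignore the one-base difference between browser-style and table-style coordinates.

```python
REGION = (71344493, 71475343)  # mm10 chr17 region from the question


def endpoint_in_region(start, end, lo, hi):
    """Filter used in the queries above: at least one endpoint inside [lo, hi]."""
    return lo <= start <= hi or lo <= end <= hi


def overlaps(start, end, lo, hi):
    """Standard overlap test for half-open intervals [start, end) and [lo, hi)."""
    return start < hi and end > lo


# First chain row from the results: its end lies inside the region.
print(endpoint_in_region(71344307, 71345537, *REGION))
print(overlaps(71344307, 71345537, *REGION))

# Hypothetical block spanning the whole region (not a real row).
print(endpoint_in_region(71300000, 71500000, *REGION))
print(overlaps(71300000, 71500000, *REGION))
```

In SQL, the overlap test would presumably be a condition of the form tStart < regionEnd AND tEnd > regionStart. The queries above are left exactly as I ran them.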
And their results:
Chain:
+-------+----------+----------+---------+-----------------+--------+--------+---------+-------+
| tName | tStart | tEnd | qStrand | qName | qStart | qEnd | id | score |
+-------+----------+----------+---------+-----------------+--------+--------+---------+-------+
| chr17 | 71344307 | 71345537 | - | scaffold_115863 | 4129 | 5390 | 98261 | 33044 |
| chr17 | 71348765 | 71349076 | - | scaffold_45687 | 7675 | 7995 | 1069519 | 10518 |
| chr17 | 71349501 | 71350225 | - | scaffold_286389 | 301 | 912 | 600619 | 14552 |
| chr17 | 71353320 | 71353647 | - | scaffold_45687 | 13998 | 14354 | 450458 | 16409 |
| chr17 | 71356976 | 71366664 | - | scaffold_58060 | 6662 | 18451 | 59516 | 43785 |
| chr17 | 71369677 | 71369991 | + | scaffold_136193 | 5810 | 6136 | 677010 | 13810 |
| chr17 | 71370047 | 71381518 | - | scaffold_39409 | 653 | 16004 | 26685 | 77165 |
| chr17 | 71386776 | 71400977 | - | scaffold_6004 | 3376 | 31456 | 59454 | 43813 |
| chr17 | 71403036 | 71403303 | - | scaffold_6004 | 36494 | 36767 | 851074 | 12305 |
| chr17 | 71411610 | 71412070 | - | scaffold_6004 | 45733 | 46196 | 410158 | 17057 |
| chr17 | 71415580 | 71415876 | + | scaffold_173989 | 37 | 338 | 795373 | 12765 |
| chr17 | 71421335 | 71421684 | - | scaffold_92096 | 962 | 1365 | 860242 | 12230 |
| chr17 | 71426009 | 71429765 | - | scaffold_6004 | 65154 | 68927 | 32900 | 65989 |
| chr17 | 71431173 | 71441015 | - | scaffold_87525 | 315 | 2878 | 68830 | 39933 |
| chr17 | 71443948 | 71444160 | - | scaffold_87525 | 5699 | 5910 | 1381300 | 7952 |
| chr17 | 71447491 | 71448878 | + | scaffold_47467 | 712 | 1665 | 147828 | 27038 |
| chr17 | 71455537 | 71465879 | - | scaffold_23639 | 5711 | 12225 | 68263 | 40118 |
| chr17 | 71458177 | 71459898 | + | scaffold_1036 | 107842 | 113323 | 25364 | 80102 |
+-------+----------+----------+---------+-----------------+--------+--------+---------+-------+
18 rows in set (1.09 sec)
Net:
+-------+-------+----------+----------+--------+-----------------+--------+--------+-------+---------+
| level | tName | tStart | tEnd | strand | qName | qStart | qEnd | score | chainId |
+-------+-------+----------+----------+--------+-----------------+--------+--------+-------+---------+
| 1 | chr17 | 71344307 | 71345537 | - | scaffold_115863 | 270 | 1531 | 33044 | 98261 |
| 1 | chr17 | 71348765 | 71349076 | - | scaffold_45687 | 10669 | 10989 | 10518 | 1069519 |
| 1 | chr17 | 71349501 | 71350225 | - | scaffold_286389 | 137 | 748 | 14552 | 600619 |
| 1 | chr17 | 71353320 | 71353647 | - | scaffold_45687 | 4310 | 4666 | 16409 | 450458 |
| 1 | chr17 | 71356976 | 71366664 | - | scaffold_58060 | 159 | 11948 | 43785 | 59516 |
| 1 | chr17 | 71369677 | 71369991 | + | scaffold_136193 | 5810 | 6136 | 13810 | 677010 |
| 1 | chr17 | 71370047 | 71381518 | - | scaffold_39409 | 2627 | 17978 | 77165 | 26685 |
| 1 | chr17 | 71386776 | 71400977 | - | scaffold_6004 | 37912 | 65992 | 43813 | 59454 |
| 1 | chr17 | 71403036 | 71403303 | - | scaffold_6004 | 32601 | 32874 | 12305 | 851074 |
| 1 | chr17 | 71411610 | 71412070 | - | scaffold_6004 | 23172 | 23635 | 17057 | 410158 |
| 1 | chr17 | 71415580 | 71415876 | + | scaffold_173989 | 37 | 338 | 12765 | 795373 |
| 1 | chr17 | 71421335 | 71421684 | - | scaffold_92096 | 6512 | 6915 | 12230 | 860242 |
| 1 | chr17 | 71426009 | 71429765 | - | scaffold_6004 | 441 | 4214 | 65989 | 32900 |
| 1 | chr17 | 71431173 | 71441015 | - | scaffold_87525 | 6996 | 9559 | 39933 | 68830 |
| 1 | chr17 | 71443948 | 71444160 | - | scaffold_87525 | 3964 | 4175 | 7952 | 1381300 |
| 1 | chr17 | 71447491 | 71448878 | + | scaffold_47467 | 712 | 1665 | 27038 | 147828 |
| 1 | chr17 | 71455537 | 71456659 | - | scaffold_23639 | 19155 | 20551 | 23992 | 68263 |
| 1 | chr17 | 71458177 | 71459898 | + | scaffold_1036 | 107842 | 113323 | 80102 | 25364 |
| 1 | chr17 | 71459898 | 71465879 | - | scaffold_23639 | 14037 | 18794 | 16060 | 68263 |
+-------+-------+----------+----------+--------+-----------------+--------+--------+-------+---------+
19 rows in set (0.34 sec)
As you can see, the blocks in the mm10 genome are the same in both tables (tName, tStart, tEnd), apart from chain 68263, which the net splits into two rows. The aligned length and the score in proCap1 are also the same, so it is probably the same region that is aligned. But why are qStart and qEnd different? Which one should I use to extract the sequences from the hyrax genome in .2bit format?
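One pattern I can check directly from the rows above: on "+" rows the two tables agree, and on "-" rows the net interval looks like the chain interval mirrored around a fixed scaffold length. The sketch below tests this on a hand-copied subset of the rows. The function names are my own, and the interpretation (chain coordinates given on the reverse-complemented scaffold, net coordinates on the forward strand) is an assumption that fits the numbers, not something I have confirmed.

```python
# (strand, qName, chain qStart, chain qEnd, net qStart, net qEnd),
# copied by hand from the two result tables above (a subset of rows).
ROWS = [
    ("-", "scaffold_115863", 4129, 5390, 270, 1531),
    ("-", "scaffold_45687", 7675, 7995, 10669, 10989),
    ("-", "scaffold_45687", 13998, 14354, 4310, 4666),
    ("+", "scaffold_136193", 5810, 6136, 5810, 6136),
    ("-", "scaffold_6004", 3376, 31456, 37912, 65992),
    ("-", "scaffold_6004", 36494, 36767, 32601, 32874),
    ("-", "scaffold_6004", 45733, 46196, 23172, 23635),
    ("-", "scaffold_6004", 65154, 68927, 441, 4214),
    ("+", "scaffold_47467", 712, 1665, 712, 1665),
]


def implied_q_size(chain_start, chain_end, net_start, net_end):
    """Scaffold size implied if the net interval is the chain interval
    mirrored onto the other strand; None if they are not mirror images."""
    size = chain_start + net_end
    return size if size == chain_end + net_start else None


def to_forward_strand(q_size, q_start, q_end):
    """Flip a reverse-strand interval to forward-strand coordinates."""
    return q_size - q_end, q_size - q_start


sizes = {}  # qName -> set of implied scaffold sizes
for strand, q_name, c_start, c_end, n_start, n_end in ROWS:
    if strand == "+":
        # Plus-strand rows are expected to be identical in both tables.
        assert (c_start, c_end) == (n_start, n_end)
    else:
        implied = implied_q_size(c_start, c_end, n_start, n_end)
        sizes.setdefault(q_name, set()).add(implied)

for q_name, implied in sorted(sizes.items()):
    # A single value per scaffold means every row agrees on one length.
    print(q_name, implied)
```

Every "-" row of a given scaffold implies the same length, which is what I would expect if the two tables only differ in strand convention. If that reading is right, the net coordinates would be the forward-strand ones to use for extraction from the .2bit file, with the sequence reverse-complemented afterwards for "-" rows, but I would like confirmation.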
In case it is useful, I also ran the equivalent queries using proCap1 as the reference:
use proCap1;
# Net
SELECT level, tName,tStart,tEnd,strand,qName,qStart,qEnd,score,chainId FROM netMm10
WHERE (qName="chr17" AND type="top" AND
(71344493 <= qStart AND qStart <= 71475343
OR 71344493 <= qEnd AND qEnd <= 71475343))
ORDER BY qStart;
# -> 16 rows
# Chain
SELECT tName,tStart,tEnd,qStrand,qName,qStart,qEnd,score,Id FROM chainMm10
WHERE (qName="chr17"
AND (71344493 <= qStart AND qStart <= 71475343
OR 71344493 <= qEnd AND qEnd <= 71475343))
ORDER BY qStart;
# -> 4248 rows
The query on the net alignment gives (almost) the same output (16 rows). The query on the chain alignment, however, returns 4248 rows: many hyrax sequences map to the same mouse region. But I still don't see why the query coordinates differ between chain and net when using mm10 as the reference.
PS: if this can help, here is the aligned region in the genome browser.