As part of a large-scale genomic integration project, I ended up with chromosome positions mapped to human genome patches like:
HG104_HG975_PATCH
HG1257_PATCH
HG19_PATCH
HSCHR10_1_CTG2
HSCHR17_1
I need to map these patch IDs back to chromosome numbers to unify them with the rest of the database. While the last two cases indicate chromosomes 10 and 17 respectively, I am looking for a strategy to map the first three.
The full list of patch IDs that we have is uploaded here.
Note that Ensembl version 75 is based on the GRCh37.14 assembly. Version 76, expected later this month, will be based on GRCh38.
mysql -h ensembldb.ensembl.org -u anonymous
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 659408
Server version: 5.1.72-log MySQL Community Server (GPL)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use homo_sapiens_core_75_37;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT DISTINCT s1.name AS patch,
    ->        s2.name AS chromosome,
    ->        a.exc_seq_region_start AS start,
    ->        a.exc_seq_region_end AS end
    -> FROM assembly_exception AS a, seq_region AS s1, seq_region AS s2
    -> WHERE a.seq_region_id = s1.seq_region_id
    ->   AND a.exc_seq_region_id = s2.seq_region_id
    ->   AND a.exc_type IN ('PATCH_FIX', 'PATCH_NOVEL');
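Once the query result is saved as tab-separated text (for example by running the client in batch mode), the lookup can be applied to your own list of patch IDs. Below is a minimal Python sketch of that step, not a definitive implementation. The function names `load_lookup` and `patch_to_chromosome` are hypothetical, the input is assumed to have the columns patch, chromosome, start, end with a header row, and the regular-expression fallback assumes that names of the form HSCHR<chromosome>_... embed the chromosome as in the two examples from the question.

```python
import re

# Assumption: names such as HSCHR10_1_CTG2 or HSCHR17_1 carry the
# chromosome right after the "HSCHR" prefix, followed by "_" or end of name.
HSCHR_RE = re.compile(r"^HSCHR(\d{1,2}|X|Y|MT)(?:_|$)")


def load_lookup(lines):
    """Build {patch: chromosome} from tab-separated rows.

    Expected columns: patch, chromosome, start, end (header row is skipped).
    """
    lookup = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] != "patch":
            lookup[fields[0]] = fields[1]
    return lookup


def patch_to_chromosome(patch_id, lookup):
    """Return the chromosome for a patch ID, or None if it is unresolved."""
    # Prefer the database-derived mapping when the patch is listed there.
    if patch_id in lookup:
        return lookup[patch_id]
    # Otherwise fall back to parsing the chromosome out of the name.
    match = HSCHR_RE.match(patch_id)
    if match:
        return match.group(1)
    return None
```

A typical use would be `lookup = load_lookup(open("patches.tsv"))` followed by `patch_to_chromosome("HSCHR10_1_CTG2", lookup)`, where `patches.tsv` is a hypothetical file name for the saved query output. IDs that come back as `None` are the ones that still need manual checking.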
Thank you Bert!
Some of the entries in the orientation column are denoted as 'b'. Can you please tell me why that is and what it means?