Naming chromosomes: from NCBI NC_0000ABC to UCSC chrABC
12.3 years ago

Here is a "grep" on an NCBI Gene XML file:

$ egrep -A 1 "chromosome [0-9]* refe" gene_result.xml | head -n 20
            <Gene-commentary_label>chromosome 20 reference GRCh37.p9 Primary Assembly</Gene-commentary_label>
            <Gene-commentary_accession>NC_000020</Gene-commentary_accession>
--
            <Gene-commentary_label>chromosome 17 reference GRCh37.p9 Primary Assembly</Gene-commentary_label>
            <Gene-commentary_accession>NC_000017</Gene-commentary_accession>
--
            <Gene-commentary_label>chromosome 4 reference GRCh37.p9 Primary Assembly</Gene-commentary_label>
            <Gene-commentary_accession>NC_000004</Gene-commentary_accession>
-- (...)
  • Can I assume that "NC_0000ABC" (NCBI) is equivalent to "chrABC" (UCSC)?
  • How are the "chr*_random" chromosomes named at NCBI?
  • Is there a resource where I can find a mapping from NC_0000* to chr*?

Thanks,

Pierre

chromosome ncbi • 8.8k views
12.3 years ago

Using the full table from http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13, I think you can get all the corresponding sequences of the primary assembly (which includes unscaffolded contigs). For example:

UCSC's chr1_gl000191_random == HSCHR1_RANDOM_CTG5 == GL000191.1 == NT_113878.1

# GB Accession:   GCA_000001405.1
# RS Accession:   GCF_000001405.13
# RefSeq Assembly and GenBank Assemblies Identical: yes
# Name:     GRCh37
# GB Release Id:  2468
# RS Release Id:  2758
# Organism name: Homo sapiens
# Taxid:         9606
# Reporting:     Top level objects
#
# Ordered by chromosome/plasmid; the chromosomes/plasmids are followed by
# unlocalized scaffolds.
# Unplaced scaffolds are listed at the end.
#
# Object-Name   Role    Chromosome/Plasmid  GenBank-Accn    RefSeq-Accn Assembly-Unit
1   chromosome  1   CM000663.1  NC_000001.10    Primary Assembly
2   chromosome  2   CM000664.1  NC_000002.11    Primary Assembly
3   chromosome  3   CM000665.1  NC_000003.11    Primary Assembly
4   chromosome  4   CM000666.1  NC_000004.11    Primary Assembly
5   chromosome  5   CM000667.1  NC_000005.9 Primary Assembly
6   chromosome  6   CM000668.1  NC_000006.11    Primary Assembly
7   chromosome  7   CM000669.1  NC_000007.13    Primary Assembly
8   chromosome  8   CM000670.1  NC_000008.10    Primary Assembly
9   chromosome  9   CM000671.1  NC_000009.11    Primary Assembly
10  chromosome  10  CM000672.1  NC_000010.10    Primary Assembly
11  chromosome  11  CM000673.1  NC_000011.9 Primary Assembly
12  chromosome  12  CM000674.1  NC_000012.11    Primary Assembly
13  chromosome  13  CM000675.1  NC_000013.10    Primary Assembly
14  chromosome  14  CM000676.1  NC_000014.8 Primary Assembly
15  chromosome  15  CM000677.1  NC_000015.9 Primary Assembly
16  chromosome  16  CM000678.1  NC_000016.9 Primary Assembly
17  chromosome  17  CM000679.1  NC_000017.10    Primary Assembly
18  chromosome  18  CM000680.1  NC_000018.9 Primary Assembly
19  chromosome  19  CM000681.1  NC_000019.9 Primary Assembly
20  chromosome  20  CM000682.1  NC_000020.10    Primary Assembly
21  chromosome  21  CM000683.1  NC_000021.8 Primary Assembly
22  chromosome  22  CM000684.1  NC_000022.10    Primary Assembly
X   chromosome  X   CM000685.1  NC_000023.10    Primary Assembly
Y   chromosome  Y   CM000686.1  NC_000024.9 Primary Assembly
HSCHR1_RANDOM_CTG5  unlocalized-scaffold    1   GL000191.1  NT_113878.1 Primary Assembly
HSCHR1_RANDOM_CTG12 unlocalized-scaffold    1   GL000192.1  NT_167207.1 Primary Assembly
HSCHR4_RANDOM_CTG2  unlocalized-scaffold    4   GL000193.1  NT_113885.1 Primary Assembly
HSCHR4_RANDOM_CTG3  unlocalized-scaffold    4   GL000194.1  NT_113888.1 Primary Assembly
HSCHR7_RANDOM_CTG1  unlocalized-scaffold    7   GL000195.1  NT_113901.1 Primary Assembly
HSCHR8_RANDOM_CTG1  unlocalized-scaffold    8   GL000196.1  NT_113909.1 Primary Assembly
HSCHR8_RANDOM_CTG4  unlocalized-scaffold    8   GL000197.1  NT_113907.1 Primary Assembly
HSCHR9_RANDOM_CTG1  unlocalized-scaffold    9   GL000198.1  NT_113914.1 Primary Assembly
HSCHR9_RANDOM_CTG2  unlocalized-scaffold    9   GL000199.1  NT_113916.2 Primary Assembly
HSCHR9_RANDOM_CTG4  unlocalized-scaffold    9   GL000200.1  NT_113915.1 Primary Assembly
HSCHR9_RANDOM_CTG5  unlocalized-scaffold    9   GL000201.1  NT_113911.1 Primary Assembly
HSCHR11_RANDOM_CTG2 unlocalized-scaffold    11  GL000202.1  NT_113921.2 Primary Assembly
HSCHR17_RANDOM_CTG1 unlocalized-scaffold    17  GL000203.1  NT_113941.1 Primary Assembly
HSCHR17_RANDOM_CTG2 unlocalized-scaffold    17  GL000204.1  NT_113943.1 Primary Assembly
HSCHR17_RANDOM_CTG3 unlocalized-scaffold    17  GL000205.1  NT_113930.1 Primary Assembly
HSCHR17_RANDOM_CTG4 unlocalized-scaffold    17  GL000206.1  NT_113945.1 Primary Assembly
HSCHR18_RANDOM_CTG1 unlocalized-scaffold    18  GL000207.1  NT_113947.1 Primary Assembly
HSCHR19_RANDOM_CTG1 unlocalized-scaffold    19  GL000208.1  NT_113948.1 Primary Assembly
HSCHR19_RANDOM_CTG2 unlocalized-scaffold    19  GL000209.1  NT_113949.1 Primary Assembly
HSCHR21_RANDOM_CTG9 unlocalized-scaffold    21  GL000210.1  NT_113950.2 Primary Assembly
HSCHRUN_RANDOM_CTG1 unplaced-scaffold   Chromosome  GL000211.1  NT_113961.1 Primary Assembly
HSCHRUN_RANDOM_CTG2 unplaced-scaffold   Chromosome  GL000212.1  NT_113923.1 Primary Assembly
HSCHRUN_RANDOM_CTG3 unplaced-scaffold   Chromosome  GL000213.1  NT_167208.1 Primary Assembly
HSCHRUN_RANDOM_CTG4 unplaced-scaffold   Chromosome  GL000214.1  NT_167209.1 Primary Assembly
HSCHRUN_RANDOM_CTG5 unplaced-scaffold   Chromosome  GL000215.1  NT_167210.1 Primary Assembly
HSCHRUN_RANDOM_CTG6 unplaced-scaffold   Chromosome  GL000216.1  NT_167211.1 Primary Assembly
HSCHRUN_RANDOM_CTG7 unplaced-scaffold   Chromosome  GL000217.1  NT_167212.1 Primary Assembly
HSCHRUN_RANDOM_CTG9 unplaced-scaffold   Chromosome  GL000218.1  NT_113889.1 Primary Assembly
HSCHRUN_RANDOM_CTG10    unplaced-scaffold   Chromosome  GL000219.1  NT_167213.1 Primary Assembly
HSCHRUN_RANDOM_CTG11    unplaced-scaffold   Chromosome  GL000220.1  NT_167214.1 Primary Assembly
HSCHRUN_RANDOM_CTG13    unplaced-scaffold   Chromosome  GL000221.1  NT_167215.1 Primary Assembly
HSCHRUN_RANDOM_CTG14    unplaced-scaffold   Chromosome  GL000222.1  NT_167216.1 Primary Assembly
HSCHRUN_RANDOM_CTG15    unplaced-scaffold   Chromosome  GL000223.1  NT_167217.1 Primary Assembly
HSCHRUN_RANDOM_CTG16    unplaced-scaffold   Chromosome  GL000224.1  NT_167218.1 Primary Assembly
HSCHRUN_RANDOM_CTG17    unplaced-scaffold   Chromosome  GL000225.1  NT_167219.1 Primary Assembly
HSCHRUN_RANDOM_CTG19    unplaced-scaffold   Chromosome  GL000226.1  NT_167220.1 Primary Assembly
HSCHRUN_RANDOM_CTG20    unplaced-scaffold   Chromosome  GL000227.1  NT_167221.1 Primary Assembly
HSCHRUN_RANDOM_CTG21    unplaced-scaffold   Chromosome  GL000228.1  NT_167222.1 Primary Assembly
HSCHRUN_RANDOM_CTG22    unplaced-scaffold   Chromosome  GL000229.1  NT_167223.1 Primary Assembly
HSCHRUN_RANDOM_CTG23    unplaced-scaffold   Chromosome  GL000230.1  NT_167224.1 Primary Assembly
HSCHRUN_RANDOM_CTG24    unplaced-scaffold   Chromosome  GL000231.1  NT_167225.1 Primary Assembly
HSCHRUN_RANDOM_CTG25    unplaced-scaffold   Chromosome  GL000232.1  NT_167226.1 Primary Assembly
HSCHRUN_RANDOM_CTG26    unplaced-scaffold   Chromosome  GL000233.1  NT_167227.1 Primary Assembly
HSCHRUN_RANDOM_CTG27    unplaced-scaffold   Chromosome  GL000234.1  NT_167228.1 Primary Assembly
HSCHRUN_RANDOM_CTG28    unplaced-scaffold   Chromosome  GL000235.1  NT_167229.1 Primary Assembly
HSCHRUN_RANDOM_CTG29    unplaced-scaffold   Chromosome  GL000236.1  NT_167230.1 Primary Assembly
HSCHRUN_RANDOM_CTG30    unplaced-scaffold   Chromosome  GL000237.1  NT_167231.1 Primary Assembly
HSCHRUN_RANDOM_CTG31    unplaced-scaffold   Chromosome  GL000238.1  NT_167232.1 Primary Assembly
HSCHRUN_RANDOM_CTG32    unplaced-scaffold   Chromosome  GL000239.1  NT_167233.1 Primary Assembly
HSCHRUN_RANDOM_CTG33    unplaced-scaffold   Chromosome  GL000240.1  NT_167234.1 Primary Assembly
HSCHRUN_RANDOM_CTG34    unplaced-scaffold   Chromosome  GL000241.1  NT_167235.1 Primary Assembly
HSCHRUN_RANDOM_CTG35    unplaced-scaffold   Chromosome  GL000242.1  NT_167236.1 Primary Assembly
HSCHRUN_RANDOM_CTG36    unplaced-scaffold   Chromosome  GL000243.1  NT_167237.1 Primary Assembly
HSCHRUN_RANDOM_CTG37    unplaced-scaffold   Chromosome  GL000244.1  NT_167238.1 Primary Assembly
HSCHRUN_RANDOM_CTG38    unplaced-scaffold   Chromosome  GL000245.1  NT_167239.1 Primary Assembly
HSCHRUN_RANDOM_CTG39    unplaced-scaffold   Chromosome  GL000246.1  NT_167240.1 Primary Assembly
HSCHRUN_RANDOM_CTG40    unplaced-scaffold   Chromosome  GL000247.1  NT_167241.1 Primary Assembly
HSCHRUN_RANDOM_CTG41    unplaced-scaffold   Chromosome  GL000248.1  NT_167242.1 Primary Assembly
HSCHRUN_RANDOM_CTG42    unplaced-scaffold   Chromosome  GL000249.1  NT_167243.1 Primary Assembly
HSCHR6_MHC_APD_CTG1 alt-scaffold    6   GL000250.1  NT_167244.1 ALT_REF_LOCI_1
HSCHR6_MHC_COX_CTG1 alt-scaffold    6   GL000251.1  NT_113891.2 ALT_REF_LOCI_2
HSCHR6_MHC_DBB_CTG1 alt-scaffold    6   GL000252.1  NT_167245.1 ALT_REF_LOCI_3
HSCHR6_MHC_MANN_CTG1    alt-scaffold    6   GL000253.1  NT_167246.1 ALT_REF_LOCI_4
HSCHR6_MHC_MCF_CTG1 alt-scaffold    6   GL000254.1  NT_167247.1 ALT_REF_LOCI_5
HSCHR6_MHC_QBL_CTG1 alt-scaffold    6   GL000255.1  NT_167248.1 ALT_REF_LOCI_6
HSCHR6_MHC_SSTO_CTG1    alt-scaffold    6   GL000256.1  NT_167249.1 ALT_REF_LOCI_7
HSCHR4_1_CTG9   alt-scaffold    4   GL000257.1  NT_167250.1 ALT_REF_LOCI_8
HSCHR17_1_CTG5  alt-scaffold    17  GL000258.1  NT_167251.1 ALT_REF_LOCI_9
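
The table above can be turned into a lookup with a few lines of Python. This is only a sketch under stated assumptions: it expects the six-column layout pasted here (newer assembly_report.txt files may use a different column layout), and it guesses the UCSC names from the hg19 pattern seen in the one equivalence given above (chrN_glNNNNNN_random for unlocalized scaffolds, chrUn_glNNNNNN for unplaced ones). The alt-scaffolds are skipped because no simple rule is assumed for them. The function names are made up for illustration.

```python
def ucsc_style_name(role, chrom, genbank_accn):
    """Guess an hg19-style name for one report row (assumed convention)."""
    # "GL000191.1" -> "gl000191": drop the version, lowercase the rest.
    base = genbank_accn.split(".")[0].lower()
    if role == "chromosome":
        return "chr" + chrom
    if role == "unlocalized-scaffold":
        return "chr" + chrom + "_" + base + "_random"
    if role == "unplaced-scaffold":
        return "chrUn_" + base
    return None  # alt-scaffolds: no simple rule assumed here


def refseq_to_ucsc(report_lines):
    """Map RefSeq accession -> guessed UCSC name for each usable row."""
    mapping = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        # The first five columns contain no spaces, so a whitespace split
        # works for both tab-separated files and space-pasted text.
        fields = line.split()
        if len(fields) < 5:
            continue
        _name, role, chrom, genbank_accn, refseq_accn = fields[:5]
        name = ucsc_style_name(role, chrom, genbank_accn)
        if name is not None:
            mapping[refseq_accn] = name
    return mapping
```

Feeding it the lines of the table (for example from a file you saved the report to) gives a dictionary keyed by versioned RefSeq accession, so NC_000023.10 maps to chrX and NT_113878.1 to chr1_gl000191_random. Check the guessed scaffold names against the UCSC chromosome list before relying on them.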
4.7 years ago
vkkodali_ncbi ★ 3.8k

cthreepo (https://github.com/vkkodali/cthreepo) is a Python script that I wrote just for this purpose. It uses the assembly_report.txt files, which you can download from the NCBI FTP site, for the conversions.

4.7 years ago
max ▴ 60

Historically, there are small differences in the way that NCBI, EBI and UCSC name the chromosomes. What is "MT" for EBI is called "chrMT" for NCBI and "chrM" for UCSC. If you used a genome not from UCSC for your analysis, you may have to fix up these small differences. To do this conversion on a text file where the first column has the chromosome, e.g. a wig or bed text file, you can use UCSC's little utility chromToUcsc. Download it and make it executable (it's a python2/3 script), then run it without arguments to get the usage message:

$ wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/chromToUcsc
$ chmod a+x chromToUcsc

Here is an example call:

$ chromToUcsc -g hg19 --get && chromToUcsc -i test.wig -o test.ucsc.wig -a hg19.chromAlias.tsv -g hg19
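
To make the idea concrete, here is a minimal Python sketch of this kind of first-column renaming. It is not how chromToUcsc is implemented; the function name and the alias dictionary are hypothetical, and it only handles plain tab-separated rows such as BED lines (wig track or step declaration lines would need extra handling).

```python
def rename_first_column(lines, alias):
    """Yield lines whose first tab-separated field is mapped through alias."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Names missing from the alias table are left untouched, so rows
        # that already use UCSC names pass through unchanged.
        fields[0] = alias.get(fields[0], fields[0])
        yield "\t".join(fields)


# Hypothetical alias table; a real one would come from an alias file.
alias = {"MT": "chrM", "1": "chr1", "NC_000001.10": "chr1"}
bed_rows = ["MT\t0\t100", "NC_000001.10\t5\t10", "chr2\t1\t2"]
for row in rename_first_column(bed_rows, alias):
    print(row)
```

Leaving unknown names unchanged is a design choice for the sketch; you may prefer to raise an error instead so that unmapped sequences do not slip through silently.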
