Naming chromosomes: from NCBI NC_0000ABC to UCSC chrABC
Here is a grep on an NCBI Gene XML file:
$ egrep -A 1 "chromosome [0-9]* refe" gene_result.xml | head -n 20
<Gene-commentary_label>chromosome 20 reference GRCh37.p9 Primary Assembly</Gene-commentary_label>
<Gene-commentary_accession>NC_000020</Gene-commentary_accession>
--
<Gene-commentary_label>chromosome 17 reference GRCh37.p9 Primary Assembly</Gene-commentary_label>
<Gene-commentary_accession>NC_000017</Gene-commentary_accession>
--
<Gene-commentary_label>chromosome 4 reference GRCh37.p9 Primary Assembly</Gene-commentary_label>
<Gene-commentary_accession>NC_000004</Gene-commentary_accession>
-- (...)
Can I assume that "NC_0000ABC" (NCBI) is equivalent to chrABC (UCSC)?
How are the "chr*_random" chromosomes named at NCBI?
Is there a resource where I can find a mapping from NC_0000* to chr*?
Thanks,
Pierre
chromosome
ncbi
Using the full table from http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13, I think you can get all the corresponding sequences of the primary assembly (which includes unscaffolded contigs). For example:
UCSC's chr1_gl000191_random == HSCHR1_RANDOM_CTG5 == GL000191.1 == NT_113878.1
# GB Accession: GCA_000001405.1
# RS Accession: GCF_000001405.13
# RefSeq Assembly and GenBank Assemblies Identical: yes
# Name: GRCh37
# GB Release Id: 2468
# RS Release Id: 2758
# Organism name: Homo sapiens
# Taxid: 9606
# Reporting: Top level objects
#
# Ordered by chromosome/plasmid; the chromosomes/plasmids are followed by
# unlocalized scaffolds.
# Unplaced scaffolds are listed at the end.
#
# Object-Name Role Chromosome/Plasmid GenBank-Accn RefSeq-Accn Assembly-Unit
1 chromosome 1 CM000663.1 NC_000001.10 Primary Assembly
2 chromosome 2 CM000664.1 NC_000002.11 Primary Assembly
3 chromosome 3 CM000665.1 NC_000003.11 Primary Assembly
4 chromosome 4 CM000666.1 NC_000004.11 Primary Assembly
5 chromosome 5 CM000667.1 NC_000005.9 Primary Assembly
6 chromosome 6 CM000668.1 NC_000006.11 Primary Assembly
7 chromosome 7 CM000669.1 NC_000007.13 Primary Assembly
8 chromosome 8 CM000670.1 NC_000008.10 Primary Assembly
9 chromosome 9 CM000671.1 NC_000009.11 Primary Assembly
10 chromosome 10 CM000672.1 NC_000010.10 Primary Assembly
11 chromosome 11 CM000673.1 NC_000011.9 Primary Assembly
12 chromosome 12 CM000674.1 NC_000012.11 Primary Assembly
13 chromosome 13 CM000675.1 NC_000013.10 Primary Assembly
14 chromosome 14 CM000676.1 NC_000014.8 Primary Assembly
15 chromosome 15 CM000677.1 NC_000015.9 Primary Assembly
16 chromosome 16 CM000678.1 NC_000016.9 Primary Assembly
17 chromosome 17 CM000679.1 NC_000017.10 Primary Assembly
18 chromosome 18 CM000680.1 NC_000018.9 Primary Assembly
19 chromosome 19 CM000681.1 NC_000019.9 Primary Assembly
20 chromosome 20 CM000682.1 NC_000020.10 Primary Assembly
21 chromosome 21 CM000683.1 NC_000021.8 Primary Assembly
22 chromosome 22 CM000684.1 NC_000022.10 Primary Assembly
X chromosome X CM000685.1 NC_000023.10 Primary Assembly
Y chromosome Y CM000686.1 NC_000024.9 Primary Assembly
HSCHR1_RANDOM_CTG5 unlocalized-scaffold 1 GL000191.1 NT_113878.1 Primary Assembly
HSCHR1_RANDOM_CTG12 unlocalized-scaffold 1 GL000192.1 NT_167207.1 Primary Assembly
HSCHR4_RANDOM_CTG2 unlocalized-scaffold 4 GL000193.1 NT_113885.1 Primary Assembly
HSCHR4_RANDOM_CTG3 unlocalized-scaffold 4 GL000194.1 NT_113888.1 Primary Assembly
HSCHR7_RANDOM_CTG1 unlocalized-scaffold 7 GL000195.1 NT_113901.1 Primary Assembly
HSCHR8_RANDOM_CTG1 unlocalized-scaffold 8 GL000196.1 NT_113909.1 Primary Assembly
HSCHR8_RANDOM_CTG4 unlocalized-scaffold 8 GL000197.1 NT_113907.1 Primary Assembly
HSCHR9_RANDOM_CTG1 unlocalized-scaffold 9 GL000198.1 NT_113914.1 Primary Assembly
HSCHR9_RANDOM_CTG2 unlocalized-scaffold 9 GL000199.1 NT_113916.2 Primary Assembly
HSCHR9_RANDOM_CTG4 unlocalized-scaffold 9 GL000200.1 NT_113915.1 Primary Assembly
HSCHR9_RANDOM_CTG5 unlocalized-scaffold 9 GL000201.1 NT_113911.1 Primary Assembly
HSCHR11_RANDOM_CTG2 unlocalized-scaffold 11 GL000202.1 NT_113921.2 Primary Assembly
HSCHR17_RANDOM_CTG1 unlocalized-scaffold 17 GL000203.1 NT_113941.1 Primary Assembly
HSCHR17_RANDOM_CTG2 unlocalized-scaffold 17 GL000204.1 NT_113943.1 Primary Assembly
HSCHR17_RANDOM_CTG3 unlocalized-scaffold 17 GL000205.1 NT_113930.1 Primary Assembly
HSCHR17_RANDOM_CTG4 unlocalized-scaffold 17 GL000206.1 NT_113945.1 Primary Assembly
HSCHR18_RANDOM_CTG1 unlocalized-scaffold 18 GL000207.1 NT_113947.1 Primary Assembly
HSCHR19_RANDOM_CTG1 unlocalized-scaffold 19 GL000208.1 NT_113948.1 Primary Assembly
HSCHR19_RANDOM_CTG2 unlocalized-scaffold 19 GL000209.1 NT_113949.1 Primary Assembly
HSCHR21_RANDOM_CTG9 unlocalized-scaffold 21 GL000210.1 NT_113950.2 Primary Assembly
HSCHRUN_RANDOM_CTG1 unplaced-scaffold Chromosome GL000211.1 NT_113961.1 Primary Assembly
HSCHRUN_RANDOM_CTG2 unplaced-scaffold Chromosome GL000212.1 NT_113923.1 Primary Assembly
HSCHRUN_RANDOM_CTG3 unplaced-scaffold Chromosome GL000213.1 NT_167208.1 Primary Assembly
HSCHRUN_RANDOM_CTG4 unplaced-scaffold Chromosome GL000214.1 NT_167209.1 Primary Assembly
HSCHRUN_RANDOM_CTG5 unplaced-scaffold Chromosome GL000215.1 NT_167210.1 Primary Assembly
HSCHRUN_RANDOM_CTG6 unplaced-scaffold Chromosome GL000216.1 NT_167211.1 Primary Assembly
HSCHRUN_RANDOM_CTG7 unplaced-scaffold Chromosome GL000217.1 NT_167212.1 Primary Assembly
HSCHRUN_RANDOM_CTG9 unplaced-scaffold Chromosome GL000218.1 NT_113889.1 Primary Assembly
HSCHRUN_RANDOM_CTG10 unplaced-scaffold Chromosome GL000219.1 NT_167213.1 Primary Assembly
HSCHRUN_RANDOM_CTG11 unplaced-scaffold Chromosome GL000220.1 NT_167214.1 Primary Assembly
HSCHRUN_RANDOM_CTG13 unplaced-scaffold Chromosome GL000221.1 NT_167215.1 Primary Assembly
HSCHRUN_RANDOM_CTG14 unplaced-scaffold Chromosome GL000222.1 NT_167216.1 Primary Assembly
HSCHRUN_RANDOM_CTG15 unplaced-scaffold Chromosome GL000223.1 NT_167217.1 Primary Assembly
HSCHRUN_RANDOM_CTG16 unplaced-scaffold Chromosome GL000224.1 NT_167218.1 Primary Assembly
HSCHRUN_RANDOM_CTG17 unplaced-scaffold Chromosome GL000225.1 NT_167219.1 Primary Assembly
HSCHRUN_RANDOM_CTG19 unplaced-scaffold Chromosome GL000226.1 NT_167220.1 Primary Assembly
HSCHRUN_RANDOM_CTG20 unplaced-scaffold Chromosome GL000227.1 NT_167221.1 Primary Assembly
HSCHRUN_RANDOM_CTG21 unplaced-scaffold Chromosome GL000228.1 NT_167222.1 Primary Assembly
HSCHRUN_RANDOM_CTG22 unplaced-scaffold Chromosome GL000229.1 NT_167223.1 Primary Assembly
HSCHRUN_RANDOM_CTG23 unplaced-scaffold Chromosome GL000230.1 NT_167224.1 Primary Assembly
HSCHRUN_RANDOM_CTG24 unplaced-scaffold Chromosome GL000231.1 NT_167225.1 Primary Assembly
HSCHRUN_RANDOM_CTG25 unplaced-scaffold Chromosome GL000232.1 NT_167226.1 Primary Assembly
HSCHRUN_RANDOM_CTG26 unplaced-scaffold Chromosome GL000233.1 NT_167227.1 Primary Assembly
HSCHRUN_RANDOM_CTG27 unplaced-scaffold Chromosome GL000234.1 NT_167228.1 Primary Assembly
HSCHRUN_RANDOM_CTG28 unplaced-scaffold Chromosome GL000235.1 NT_167229.1 Primary Assembly
HSCHRUN_RANDOM_CTG29 unplaced-scaffold Chromosome GL000236.1 NT_167230.1 Primary Assembly
HSCHRUN_RANDOM_CTG30 unplaced-scaffold Chromosome GL000237.1 NT_167231.1 Primary Assembly
HSCHRUN_RANDOM_CTG31 unplaced-scaffold Chromosome GL000238.1 NT_167232.1 Primary Assembly
HSCHRUN_RANDOM_CTG32 unplaced-scaffold Chromosome GL000239.1 NT_167233.1 Primary Assembly
HSCHRUN_RANDOM_CTG33 unplaced-scaffold Chromosome GL000240.1 NT_167234.1 Primary Assembly
HSCHRUN_RANDOM_CTG34 unplaced-scaffold Chromosome GL000241.1 NT_167235.1 Primary Assembly
HSCHRUN_RANDOM_CTG35 unplaced-scaffold Chromosome GL000242.1 NT_167236.1 Primary Assembly
HSCHRUN_RANDOM_CTG36 unplaced-scaffold Chromosome GL000243.1 NT_167237.1 Primary Assembly
HSCHRUN_RANDOM_CTG37 unplaced-scaffold Chromosome GL000244.1 NT_167238.1 Primary Assembly
HSCHRUN_RANDOM_CTG38 unplaced-scaffold Chromosome GL000245.1 NT_167239.1 Primary Assembly
HSCHRUN_RANDOM_CTG39 unplaced-scaffold Chromosome GL000246.1 NT_167240.1 Primary Assembly
HSCHRUN_RANDOM_CTG40 unplaced-scaffold Chromosome GL000247.1 NT_167241.1 Primary Assembly
HSCHRUN_RANDOM_CTG41 unplaced-scaffold Chromosome GL000248.1 NT_167242.1 Primary Assembly
HSCHRUN_RANDOM_CTG42 unplaced-scaffold Chromosome GL000249.1 NT_167243.1 Primary Assembly
HSCHR6_MHC_APD_CTG1 alt-scaffold 6 GL000250.1 NT_167244.1 ALT_REF_LOCI_1
HSCHR6_MHC_COX_CTG1 alt-scaffold 6 GL000251.1 NT_113891.2 ALT_REF_LOCI_2
HSCHR6_MHC_DBB_CTG1 alt-scaffold 6 GL000252.1 NT_167245.1 ALT_REF_LOCI_3
HSCHR6_MHC_MANN_CTG1 alt-scaffold 6 GL000253.1 NT_167246.1 ALT_REF_LOCI_4
HSCHR6_MHC_MCF_CTG1 alt-scaffold 6 GL000254.1 NT_167247.1 ALT_REF_LOCI_5
HSCHR6_MHC_QBL_CTG1 alt-scaffold 6 GL000255.1 NT_167248.1 ALT_REF_LOCI_6
HSCHR6_MHC_SSTO_CTG1 alt-scaffold 6 GL000256.1 NT_167249.1 ALT_REF_LOCI_7
HSCHR4_1_CTG9 alt-scaffold 4 GL000257.1 NT_167250.1 ALT_REF_LOCI_8
HSCHR17_1_CTG5 alt-scaffold 17 GL000258.1 NT_167251.1 ALT_REF_LOCI_9
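Note from the table that NC_000023 is X and NC_000024 is Y, so "NC_0000ABC == chrABC" only holds for the autosomes. As a rough sketch (not an official tool), the rows above could be turned into a RefSeq-to-UCSC dictionary in Python. The function names are made up for illustration, and the naming rules are an assumption based on the hg19 pattern in the example above (chrN_glNNNNNN_random for unlocalized scaffolds, chrUn_glNNNNNN for unplaced ones). Alt scaffolds are skipped because their UCSC names cannot be derived from these columns.

```python
def ucsc_name(role, molecule, genbank_accn):
    """Guess an hg19-style UCSC name for one row (assumed naming rules)."""
    # GL000191.1 -> gl000191: drop the version suffix and lowercase
    base = genbank_accn.split(".")[0].lower()
    if role == "chromosome":
        return "chr" + molecule
    if role == "unlocalized-scaffold":
        return "chr{}_{}_random".format(molecule, base)
    if role == "unplaced-scaffold":
        return "chrUn_" + base
    return None  # alt scaffolds: name not derivable from these columns


def refseq_to_ucsc(lines):
    """Build {RefSeq accession: guessed UCSC name} from report rows."""
    mapping = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row
        _name, role, molecule, genbank, refseq = fields[:5]
        guess = ucsc_name(role, molecule, genbank)
        if guess is not None:
            mapping[refseq] = guess
    return mapping


rows = [
    "# Object-Name Role Chromosome/Plasmid GenBank-Accn RefSeq-Accn Assembly-Unit",
    "1 chromosome 1 CM000663.1 NC_000001.10 Primary Assembly",
    "X chromosome X CM000685.1 NC_000023.10 Primary Assembly",
    "HSCHR1_RANDOM_CTG5 unlocalized-scaffold 1 GL000191.1 NT_113878.1 Primary Assembly",
    "HSCHRUN_RANDOM_CTG1 unplaced-scaffold Chromosome GL000211.1 NT_113961.1 Primary Assembly",
    "HSCHR6_MHC_APD_CTG1 alt-scaffold 6 GL000250.1 NT_167244.1 ALT_REF_LOCI_1",
]
mapping = refseq_to_ucsc(rows)
for refseq, ucsc in mapping.items():
    print(refseq, ucsc)
```

The keys keep the version suffix (NC_000001.10), so strip it first if your XML only gives NC_000001. It is worth checking the guessed names against the sequence names in your actual UCSC files before relying on them.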
cthreepo (https://github.com/vkkodali/cthreepo) is a Python script that I wrote just for this purpose. It uses the assembly_report.txt files that you can download from the NCBI FTP site for the conversions.
Historically, NCBI, EBI and UCSC have named chromosomes slightly differently. What EBI calls "MT" is "chrMT" for NCBI and "chrM" for UCSC. If your analysis used a genome that did not come from UCSC, you may have to fix up these small differences. To do this conversion on a text file whose first column holds the chromosome name, e.g. a wig or bed file, you can use UCSC's small utility chromToUcsc. Download it with "wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/chromToUcsc", make it executable with "chmod a+x chromToUcsc" (it is a Python 2/3 script), and run it without arguments to get the usage message. Here is an example call: chromToUcsc -g hg19 --get && chromToUcsc -i test.wig -o test.ucsc.wig -a hg19.chromAlias.tsv -g hg19
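To make the idea concrete, here is a minimal Python sketch of that kind of first-column renaming. It is not how chromToUcsc itself is implemented. The function name and the small alias dictionary are made up for illustration, and the sketch assumes tab-separated, BED-like lines.

```python
def rename_chroms(lines, alias):
    """Replace the first tab-separated column using an alias -> UCSC dict."""
    out = []
    for line in lines:
        # Pass through blank lines, comments, and browser/track headers
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom not in alias:
            # Fail loudly rather than silently mixing naming schemes
            raise KeyError("no UCSC name known for " + repr(chrom))
        out.append(alias[chrom] + sep + rest)
    return out


# Hypothetical alias table; a real one would come from an alias file
alias = {"MT": "chrM", "1": "chr1", "NC_000023.10": "chrX"}
bed = ["track name=demo", "1\t100\t200", "MT\t5\t50", "NC_000023.10\t10\t20"]
renamed = rename_chroms(bed, alias)
print("\n".join(renamed))
```

Raising on an unknown name is a deliberate choice here: a file that mixes two naming schemes tends to cause confusing errors downstream.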
4.7 years ago by max