How to tell whether a sequence found in NCBI ("Nucleotide") is from the Human Reference Genome
4.6 years ago

We are working on a set of novel insertions found in our new assembly. One QC step of our analysis is to BLAST those insertions against both hg38 and the NCBI BLAST nt database. With the same BLAST parameters, we got inconsistent numbers of hits against hg38 and nt: the nt hits are basically a superset of the hg38 hits. Among the insertions that hit only nt and not hg38, many of the matched sequences have accessions beginning with AL or AC. The record details show that they are sequences from clones:

For example:

AL669831.13 Human DNA sequence from clone RP11-206L10 on chromosome 1, complete sequence
AL663100.12 Human DNA sequence from clone RP5-1052J3 on chromosome 1, complete sequence
AL607067.17 Human DNA sequence from clone RP11-201K10 on chromosome 1, complete sequence
AL392003.11 Human DNA sequence from clone RP11-331H2 on chromosome 1, complete sequence
AL359203.22 Human DNA sequence from clone RP11-547P4 on chromosome 1, complete sequence
AL591885.11 Human DNA sequence from clone RP11-655L13 on chromosome 1, complete sequence
AL669841.7 Human DNA sequence from clone RP11-490E15 on chromosome 10, complete sequence
AL392083.17 Human DNA sequence from clone RP11-29A19 on chromosome 10, complete sequence
AL513304.27 Human DNA sequence from clone RP11-398B16 on chromosome 10, complete sequence
AL713851.8 Human DNA sequence from clone RP11-526P5 on chromosome 10, complete sequence

and for "ACxxx":

AC246788.1 Homo sapiens FOSMID clone WI2-3577J19 from chromosome 1, complete sequence
AC240528.3 Homo sapiens FOSMID clone ABC24-2728K21 from chromosome 1, complete sequence
AC023669.8 Homo sapiens BAC clone RP11-432O19 from 7, complete sequence
AC277844.1 Homo sapiens chromosome 1 clone CH17-358B19, complete sequence
AC255393.1 Homo sapiens chromosome 1 clone CH17-319G7, complete sequence
AC160854.2 Homo sapiens chromosome 1 clone WI2-3800L22, complete sequence
AC278970.1 Homo sapiens chromosome 18 clone VMRC59-311P11, complete sequence
AC093563.3 Homo sapiens chromosome 1 clone RP4-648I13, complete sequence
AC213121.2 Homo sapiens FOSMID clone ABC9-43920300K16 from chromosome 1, complete sequence
AC231961.2 Homo sapiens FOSMID clone ABC14-50158200A12 from chromosome 1, complete sequence
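The "hit nt but not hg38" subset described above can be extracted with a simple set difference. The following is a minimal sketch, assuming both searches were written in BLAST tabular format (`-outfmt 6`, query ID in column 1, subject ID in column 2) and that the subject IDs are plain accessions. The function names and the sample rows are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def queries_with_hits(lines: Iterable[str]) -> Set[str]:
    """Collect query IDs (column 1) from BLAST tabular output."""
    queries = set()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if row and not row[0].startswith("#"):
            queries.add(row[0])
    return queries


def nt_only_hits(nt_lines: List[str], hg38_lines: List[str]) -> Dict[str, List[str]]:
    """Map each query that hits nt but not hg38 to its sorted nt subject IDs."""
    hg38_queries = queries_with_hits(hg38_lines)
    subjects = defaultdict(set)
    for row in csv.reader(nt_lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        if query not in hg38_queries:
            subjects[query].add(subject)
    return {query: sorted(ids) for query, ids in subjects.items()}


# Hypothetical example rows (only the first three columns are shown).
nt_rows = [
    "ins1\tAL669831.13\t99.1",
    "ins2\tAC246788.1\t98.0",
    "ins2\tAL663100.12\t97.5",
]
hg38_rows = ["ins1\tchr1\t99.1"]
result = nt_only_hits(nt_rows, hg38_rows)
print(result)
```

In practice the two lists would come from reading the BLAST output files, for example `open("nt_hits.tsv").read().splitlines()`, where the file name is again only a placeholder.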

From https://www.ncbi.nlm.nih.gov/Sequin/acc.html, I was able to figure out that the AL and AC accessions come from EMBL and GenBank, respectively (both prefixes are listed there as genome project data). I strongly doubt that those sequences are integrated into the Human Reference Genome directly. Compare them with RefSeq sequences ("NG_xxx"), which are surely part of the Human Reference Genome:

NG_023366.1 Homo sapiens mediator complex subunit 13L (MED13L), RefSeqGene on chromosome 12
NG_051251.1 Homo sapiens EF-hand domain family member D1 (EFHD1), RefSeqGene on chromosome 2
NG_033919.2 Homo sapiens dedicator of cytokinesis 1 (DOCK1), RefSeqGene on chromosome 10
NG_029776.1 Homo sapiens cell adhesion associated, oncogene regulated (CDON), RefSeqGene on chromosome 11
NG_033883.1 Homo sapiens aquaporin 5 (AQP5), RefSeqGene on chromosome 12
NG_027727.1 Homo sapiens long intergenic non-protein coding RNA 2389 (LINC02389), RefSeqGene on chromosome 12
NG_023441.1 Homo sapiens methionine sulfoxide reductase B3 (MSRB3), RefSeqGene on chromosome 12
NG_052808.1 Homo sapiens transmembrane protein 132D (TMEM132D), RefSeqGene on chromosome 12
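To sort a long hit list by the kind of record each accession represents, the prefixes can be looked up programmatically. This is a minimal sketch that covers only the three prefixes seen in this post. The mapping reflects my reading of the NCBI accession prefix list (AC: GenBank genome project data, AL: EMBL genome project data, NG_: RefSeq genomic region), and the function names are hypothetical.

```python
import re
from typing import Optional

# Only the prefixes discussed in this post; the full NCBI list is much longer.
PREFIX_SOURCE = {
    "AC": "GenBank (genome project data)",
    "AL": "EMBL (genome project data)",
    "NG_": "RefSeq (genomic region)",
}


def accession_prefix(accession: str) -> Optional[str]:
    """Return the leading letters (plus underscore for RefSeq) of an accession."""
    match = re.match(r"([A-Z]+_?)\d", accession)
    return match.group(1) if match else None


def classify(accession: str) -> str:
    """Look up the source database for an accession, or 'unknown'."""
    return PREFIX_SOURCE.get(accession_prefix(accession), "unknown")


for acc in ["AL669831.13", "AC246788.1", "NG_023366.1"]:
    print(acc, "->", classify(acc))
```

A prefix only says where a record was deposited. It does not say whether that record is a component of the hg38 assembly.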

Those RefSeq sequences ("NG_xxx") are usually derived from several clone-based sequences ("ACxxx", "ALxxx"). For example, for "NG_023366.1" above: https://www.ncbi.nlm.nih.gov/nuccore/NG_023366.1/

The record says: "The reference sequence was derived from AC060226.39, AC026334.20, AC130895.2, AC009321.18 and AC012157.21"
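The clone accessions named in such a "derived from" sentence can be pulled out with a regular expression and then compared against the nt-only hit list. This is a minimal sketch that works on the comment text quoted above. It assumes two-letter, six-digit versioned accessions like the ones in this post, and the function name is hypothetical.

```python
import re
from typing import List

# Matches versioned accessions such as AC060226.39 (two letters, six digits, version).
ACCESSION_RE = re.compile(r"\b[A-Z]{2}\d{6}\.\d+\b")


def source_accessions(comment: str) -> List[str]:
    """Return the versioned accessions mentioned in a record comment, in order."""
    return ACCESSION_RE.findall(comment)


comment = (
    "The reference sequence was derived from AC060226.39, AC026334.20, "
    "AC130895.2, AC009321.18 and AC012157.21"
)
sources = source_accessions(comment)
print(sources)
```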

My questions: 1. Is my assumption correct? 2. Where can I find an official document to support it?

bless~

sequence NCBI human reference • 1.1k views

I cannot answer the question, but I was looking for something similar. There are lots of genes in Ensembl human data that have IDs starting with AC or AL and I'd like to find information about what those stand for.

I couldn't find anything on the Ensembl web site. This link that you provided may be correct. https://www.ncbi.nlm.nih.gov/Sequin/acc.html
