Can someone explain why there are a few gaps in the reference genome?
Thanks!
The simple answer is that certain stretches of a genome are difficult to sequence, mainly because of repetitive regions, tracts of the same base, GC composition, closed DNA, etc. Searching Google for "gaps genome" brings up a whole host of references about the causes of gaps and the attempts to close them.
User deanna.church gave this answer to a similar question about the mouse genome on Biostar:
The Genome Reference Consortium (http://genomereference.org) attempts to model biological gaps in the assemblies that we produce. Unfortunately, in the current assemblies, the models for both centromeres and telomeres are rather poor so they just consist of a run of Ns. We don't have good estimates of mouse telomere/centromere size, so we use a default of 3M Ns for these regions. This information is marked up in the AGP files that define the assembly:
mouse: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Mus_musculus/GRCm38.p1/
human: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p11/
Note: even within the euchromatic regions there can be long runs of Ns representing gaps that we can't fill yet. In many cases we do have a good size estimate for the gap, typically based on experimental evidence such as comparison to an optical map. For human, the problem is that some of the euchromatic gaps are polymorphic, so the size of the gap really depends on the individual you are assessing.
Post: No reads ever map to first 3Million bases of chromosomes in mouse Genome! Why?
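Since the quoted answer says the gaps are marked up in the AGP files, here is a minimal Python sketch of how one might list them. It assumes the standard tab-delimited AGP layout, where column 5 is the component type and the values `N` or `U` mark a gap row, with the gap length and gap type in columns 6 and 7. The function name `agp_gaps` and the sample rows are made up for illustration and are not taken from the real GRCm38 files.

```python
def agp_gaps(lines):
    """Yield (object, start, end, gap_length, gap_type) for each gap row.

    Assumes tab-delimited AGP rows; gap rows have component type N or U
    in column 5, gap length in column 6 and gap type in column 7.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete AGP row
        if fields[4] in ("N", "U"):
            yield (fields[0], int(fields[1]), int(fields[2]),
                   int(fields[5]), fields[6])


# Hypothetical rows for illustration only (not copied from a real AGP file).
sample = [
    "# illustrative AGP fragment",
    "chr1\t1\t3000000\t1\tN\t3000000\tcentromere\tno\tna",
    "chr1\t3000001\t3000100\t2\tF\tHYPOTHETICAL.1\t1\t100\t+",
]

for obj, start, end, length, gap_type in agp_gaps(sample):
    print(obj, start, end, length, gap_type)
```

For a real assembly you would pass an open file handle for the downloaded AGP file instead of `sample`.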
You've got to be a lot more specific. What do you mean by gaps? Long sequences of Ns? Gaps when you align your gene to the reference? What genome are you looking at?
Long sequences of N. thanks!
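If gaps here means long runs of N, a rough way to see where they are is to scan the reference FASTA directly. The sketch below is only an illustration: `read_fasta` and `n_runs` are hypothetical helper names, the parser is deliberately simple, and it holds each sequence in memory, which may be heavy for whole chromosomes.

```python
import re


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(seq, min_len=1):
    """Return 0-based, half-open (start, end) spans of N runs >= min_len."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]
```

Usage would look something like `for name, seq in read_fasta(open("genome.fa")): print(name, n_runs(seq, min_len=1000))`, where `genome.fa` is a placeholder path and the 1000-base threshold is an arbitrary choice.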