I think I have a 100 kb repeat in a genome. I want to prove it, but I don't know whether assembling Illumina data will merge the duplicated region into a single copy.
In theory, a Moleculo-type library with Illumina sequencing could distinguish the two repeats. In practice, many users have had limited success with this approach.
Personally, I would recommend PacBio sequencing for an unambiguous answer. There are also a host of new (e.g., optical mapping) and old (Southern blotting) techniques better suited to your task than short-read sequencing.
That may be so, but repeats may still carry sequence variation or internal rearrangements that only become evident once you locate and investigate them.
In one of the answers above you say that this is a draft genome, so how are you sure that there are indeed two copies? Is the draft reasonably "finished" (a single contig or a small number of contigs)?
The assembly will merge it into one. Then, when you align the reads back to the assembled genome, that region will have twice as much coverage. You can use one of many CNV analysis tools to detect that.
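To make the coverage argument concrete, here is a minimal sketch in Python. It assumes you already have per-base read depths for one contig as a plain list of numbers (for example, the third column of `samtools depth -a` output). The function name, window size, and ratio cutoff are illustrative choices, not values from any particular CNV tool.

```python
from statistics import median

def flag_elevated_windows(depths, window=1000, ratio_cutoff=1.5):
    """Return (start, end, ratio) for windows whose mean depth is at least
    ratio_cutoff times the median window depth of the contig."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    # The median window depth stands in for single-copy coverage (an assumption
    # that holds only if most of the contig is single copy).
    baseline = median(m for _, _, m in means)
    if baseline == 0:
        return []
    return [(s, e, m / baseline) for s, e, m in means if m / baseline >= ratio_cutoff]

# Toy data: 30x background with a 2 kb stretch at 60x, as expected for a
# two-copy region collapsed into one copy by the assembler.
depths = [30] * 5000 + [60] * 2000 + [30] * 5000
hits = flag_elevated_windows(depths, window=1000)
for start, end, ratio in hits:
    print(start, end, round(ratio, 2))
```

A ratio near 2 across a contiguous block of windows is consistent with a collapsed duplication, although GC bias and mapping artifacts can also shift depth, so a dedicated CNV tool is still the safer choice for real data.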
See this paper for a CNV analysis overview: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4394692/
Some previous discussion: Running 1.5M potentially different generalized linear models depending on distribution of read depth information to study CNV
The assembly will merge it into one.
Only if the repeat is perfect. For a region this large, that seems unlikely. Looking for twice the coverage over the parts that the two copies share may still work.
I wonder whether some old-fashioned combined restriction digests might work, if the OP can find enzymes that cut sparingly in the region, to first prove that a repeat is indeed present.
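As a rough way to shortlist enzymes before going to the bench, one could count recognition sites in the assembled copy of the region. The sketch below is only an illustration: the enzyme panel is a small hand-picked set of common palindromic cutters, the function names and thresholds are made up for this example, and the sequence is toy data rather than anything from the OP's genome.

```python
# Recognition sequences for a few common enzymes (all palindromic, so
# searching one strand is sufficient).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
}

def count_sites(seq, site):
    """Count occurrences of a recognition site, allowing overlaps."""
    seq = seq.upper()
    count = 0
    pos = seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)
    return count

def sparing_cutters(seq, enzymes=ENZYMES, min_sites=1, max_sites=2):
    """Return enzymes that cut the sequence between min_sites and max_sites times."""
    counts = {name: count_sites(seq, site) for name, site in enzymes.items()}
    return {name: n for name, n in counts.items() if min_sites <= n <= max_sites}

# Toy region: three EcoRI sites and one BamHI site.
region = "ACGT" * 5 + "GAATTC" + "ACGT" * 5 + "GAATTC" + "TTTT" + "GAATTC" + "GGATCC"
candidates = sparing_cutters(region)
print(candidates)
```

In practice you would run this over the consensus sequence of the suspected repeat plus its flanks, with a much larger enzyme list, and then check whether the predicted fragment pattern matches one copy or two.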
What do you have right now? Only reads?
See the papers:
HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2515-7
Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4173024/
I detected it by mapping the reads back against the assembled draft genome.