How To Deal With Heterozygosity/Polymorphisms When Sequencing From Wild Individuals?
13.0 years ago

I'm loosely following the progress of a nematode sequencing project in which DNA was extracted from a few hundred individuals collected in the wild. The assembly of the reads is now running into problems, which may be caused by heterozygosity or polymorphism within the population, and possibly by contamination with another (sub)species.

  • Are there any assemblers that can deal with this?
  • Are there any tools that can look at the set of reads and "diagnose" the degree of heterozygosity?
  • Would, in such a case, paired-end or mate-pair sequencing with longer insert sizes help?
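
To make the second question concrete: one common way to gauge heterozygosity directly from reads is a k-mer spectrum. In a diploid sample, k-mers that span a heterozygous site are seen at roughly half the coverage of k-mers shared by both alleles, so a second peak at half the main coverage hints at heterozygosity. The following is only a minimal Python sketch of that idea, assuming reads are plain strings over ACGT/N; the function names are hypothetical and this is not any particular tool. With a pool of hundreds of individuals the peaks would be smeared, because allele frequencies vary from site to site.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement (assumes ACGT/N input only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so both strands of the same locus are counted together."""
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmer_counts(reads, k):
    """Count canonical k-mers over all reads, skipping any k-mer containing N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue
            counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts):
    """Histogram: multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


# Toy "diploid" data: two haplotypes differing by one SNP (A/G), each read twice.
reads = ["AACAAT"] * 2 + ["AACGAT"] * 2
print(kmer_spectrum(kmer_counts(reads, 3)))  # {2: 6, 4: 1}
```

Here the six allele-specific 3-mers sit at multiplicity 2, half the multiplicity of the one shared 3-mer. On real data the same histogram would be computed with a dedicated k-mer counter and a much larger k.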

(I'm imagining that there are stretches of sequence without large differences, interrupted by insertions/deletions that make it impossible to continue the assembly.)
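
The intuition in the parenthetical can be illustrated with a toy de Bruijn graph, the structure many short-read assemblers are built on: where two alleles differ, the graph forks into a "bubble", and a naive contig-extension step stops at the fork. The sketch below is a minimal Python illustration under simplifying assumptions (forward strand only, no sequencing errors, hypothetical function names), not how any specific assembler resolves bubbles.

```python
from collections import defaultdict


def de_bruijn(seqs, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer.
    Forward strand only, no error correction (simplifying assumptions)."""
    graph = defaultdict(set)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_points(graph):
    """Nodes with more than one successor: a naive contig extension stops here."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


def merge_points(graph):
    """Nodes with more than one predecessor: where the alternative paths rejoin."""
    indegree = defaultdict(int)
    for succ in graph.values():
        for node in succ:
            indegree[node] += 1
    return sorted(node for node, d in indegree.items() if d > 1)


hap_a = "CCATGAGTTA"
hap_b = "CCATGGTTA"  # same locus with a 1-bp deletion of the A at index 5
graph = de_bruijn([hap_a, hap_b], 4)
print(branch_points(graph), merge_points(graph))  # ['ATG'] ['GTT']
```

Either haplotype alone gives a single unbranched path; together they fork at `ATG` and rejoin at `GTT`. With hundreds of pooled individuals, many such overlapping bubbles would be expected, which is one way the assembly can fragment.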

Pooling data is bad for de novo assembly, and I doubt any assembler can handle this well. Even a single diploid Ciona caused great trouble with Sanger reads (see the paper suggested by Casey Bergman); hundreds of worms sequenced with short reads will be much harder. If it were me, I would wait for cleaner data before doing any serious analysis.

Agreed: it's better to solve this at the source than to try to deal with the mess left by a poor sample/experimental design. For example, the D. simulans genome is also a mess because multiple strains were used in the assembly (informally referred to as "Franken-sim": http://bioinformatics.ufl.edu/mclab_public_data/README_updating_fusions.txt).

I wouldn't call it poor design; it's an inherent problem with sequencing small, uncultured animals. Still, I agree that different wet-lab approaches need to be tried here.

13.0 years ago

Vinson et al. (2005) discuss strategies for assembling polymorphic genomes de novo from Sanger reads using Arachne: http://www.ncbi.nlm.nih.gov/pubmed/16077012
