Question

Suggestions for a contig assembler that respects a single-copy mutation in a nonaploid genome?


7.1 years ago

johnnytam100 ▴ 110

As in the title. Background, for reference:

I am trying to locate a possible single-copy insertional mutation in a nonaploid plant genome.

Previously I used SPAdes to assemble contigs for the mutant from reads containing the potentially mutated sequence, which gave me ~1000 contigs of the mutant genome.

After mapping reads (0 mismatches) to this set of 1000 contigs, ~10 contigs appeared to contain breakpoints, but PCR screening showed that all of them were false positives.

I then redid the mapping with bwa's default mismatch settings and found that reads carrying mismatches at the breakpoints now mapped across them, so the wild-type and mutant mapping results no longer differed.
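One way to compare breakpoint evidence between the two samples is to count reads whose alignments are soft-clipped at the same reference position. The sketch below is only an illustration and not part of the workflow described here: it assumes the aligner writes soft clips (`S`) in the SAM CIGAR field, as `bwa mem` does, and the function name `clip_positions` and the `min_clip` threshold are hypothetical choices.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_positions(sam_lines, min_clip=20):
    """Count soft-clipped read ends per (contig, 1-based position).

    min_clip is an arbitrary illustrative threshold, not a recommended value.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":  # malformed or unmapped
            continue
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        # Drop hard clips so that a soft clip is the first or last operation.
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
        if not ops:
            continue
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        # Left clip: the junction sits at the first aligned base.
        if ops[0][1] == "S" and ops[0][0] >= min_clip:
            counts[(rname, pos)] += 1
        # Right clip: the junction sits at the last aligned base.
        if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
            counts[(rname, pos + ref_len - 1)] += 1
    return counts


if __name__ == "__main__":
    # Toy records, made up for illustration only.
    demo = [
        "r1\t0\tctg1\t100\t60\t30S120M\t*\t0\t0\t*\t*",
        "r2\t0\tctg1\t10\t60\t91M59S\t*\t0\t0\t*\t*",
        "r3\t0\tctg1\t50\t60\t150M\t*\t0\t0\t*\t*",
    ]
    for key, n in clip_positions(demo).items():
        print(key, n)
```

Running the same count on the wild-type and mutant alignments and keeping positions that are clipped only in the mutant would give candidate junctions, although with a single mutated copy out of nine the supporting read count per junction may be very small.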

That made me wonder whether the assembler "generalizes" my contigs too much, making the breakpoint hard to find by comparing the wild type and the mutant.

That brings me to my question: are there any contig assemblers that respect a single-copy mutation in a nonaploid genome? The task is to find the 1 in a 1:8 situation in the genome.

Thank you!

Assembly insertion deletion indel • 1.7k views



Which dataset? PacBio or Illumina? What coverage? If Illumina, I would say this is impossible.

Assemblers are not really up to the task of generating diploid assemblies yet, with a few exceptions like FALCON-Unzip.

ADD REPLY • link 7.1 years ago by colindaven 7.0k


I used a 150 bp Illumina library. Coverage is 11x for the wild type and 38x for the mutant. Would you suggest getting some long reads anyway?

ADD REPLY • link 7.1 years ago by johnnytam100 ▴ 110


38x coverage is potentially useful, 11x not so much. I would expect a highly fragmented assembly even for a haploid or diploid genome. This data is not sufficient for your goals.

If you are interested in one region, can't you generate a whole range of PCR products, or preferably BACs, and sequence those with a long-read technology such as PacBio or ONT?

This is a very difficult project, though. Is there any successful public data from the same organism?
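To put rough numbers on why this depth is thin, here is a back-of-the-envelope sketch. It assumes the quoted coverage is the total depth over all nine copies of a locus, that reads are spread evenly across the copies, and that read counts follow a Poisson distribution; none of these assumptions is confirmed in the thread, and the function names are made up for illustration.

```python
import math


def expected_allele_depth(total_coverage, ploidy, copies=1):
    """Expected depth contributed by `copies` out of `ploidy` copies of a locus."""
    return total_coverage * copies / ploidy


def prob_at_least(k, mean):
    """Poisson probability of observing at least k reads given the mean depth."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    # 11 and 38 are the coverage figures quoted in this thread; ploidy 9 = nonaploid.
    for label, cov in [("wild type", 11), ("mutant", 38)]:
        depth = expected_allele_depth(cov, ploidy=9)
        print(f"{label}: ~{depth:.1f} reads per single copy, "
              f"P(at least 3 reads) = {prob_at_least(3, depth):.2f}")
```

Under these assumptions a single mutated copy at 38x is covered by only about 38 / 9 ≈ 4.2 reads on average, and fewer still will span the insertion junction itself, so an assembler has little basis for separating that copy from sequencing error or from the other eight copies.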

ADD REPLY • link 7.1 years ago by colindaven 7.0k


There is one example, but it is for a diploid relative, and I don't think any project is working on the nonaploid form of this species...

Let me ask whether I can get some long reads... More suggestions on wet-lab or dry-lab experiments would be much appreciated!

ADD REPLY • link 7.1 years ago by johnnytam100 ▴ 110