I recently assembled a genome de novo. I am looking for fertility genes that are present as a single copy in the true genome. I am worried that if a gene has multiple copies in the true genome, those copies will be collapsed into a single gene in the de novo assembly. Is this a reasonable concern? If so, how would I detect it?
Thank you,
Joe
Edit - I do not have multiple samples, which many CNV detection tools require. I have only the reads (MiSeq, HiSeq, and PacBio) and a reference assembly.
Depending on your organism/genome of interest it might be beneficial to work with long reads, such as Oxford Nanopore or PacBio.
Thanks Igor. I noticed that a lot of those tools require multiple samples in order to detect CNV. In my case, I only have reads (PE MiSeq, PE HiSeq, and PacBio) and a de novo assembly. So my problem seems to be a little different from typical CNV detection.
Could you expand a bit on which organism/genome you are working with? If that is confidential, perhaps just the size and ploidy will suffice. What coverage do you have with PacBio?
Diploid genome, heterozygosity rate 0.01-0.02. Estimated genome size 700,000,000 bases. ~15X PacBio and ~65X Illumina coverage.
Sounds pretty decent to me, although I obviously can't judge the quality of your assembly. You could check whether the coverage of the Illumina and PacBio reads (each analyzed separately) is evenly distributed over your genome after normalizing for GC content. Regions with unexpectedly high coverage can indicate collapsed repetitive elements.
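To make the coverage check a bit more concrete, here is a minimal sketch in Python. It assumes you have already computed mean read depth and GC fraction per fixed-size window of the assembly (for example with tools such as samtools depth and bedtools nuc). The window names, the 0.05 GC bin width, and the 1.5x cutoff are hypothetical choices for illustration, not recommended settings. The idea is that a two-copy region collapsed into one copy should show roughly twice the typical depth of windows with similar GC content.

```python
from statistics import median


def gc_normalized_depth(windows, gc_bin_width=0.05):
    """Return (name, ratio) per window, where ratio is the window's depth
    divided by the median depth of all windows in the same GC bin.

    windows: list of (name, gc_fraction, mean_depth) tuples (assumed input format).
    """
    # Group depths by GC bin so each window is compared to similar-GC windows.
    depths_by_bin = {}
    for _name, gc, depth in windows:
        depths_by_bin.setdefault(int(gc / gc_bin_width), []).append(depth)
    bin_median = {b: median(d) for b, d in depths_by_bin.items()}

    ratios = []
    for name, gc, depth in windows:
        m = bin_median[int(gc / gc_bin_width)]
        ratios.append((name, depth / m if m > 0 else float("nan")))
    return ratios


def flag_collapsed(windows, min_ratio=1.5, gc_bin_width=0.05):
    """Names of windows whose GC-normalized depth is at least min_ratio."""
    return [
        name
        for name, ratio in gc_normalized_depth(windows, gc_bin_width)
        if ratio >= min_ratio
    ]


# Made-up example data: (window name, GC fraction, mean Illumina depth).
example_windows = [
    ("ctg1:0-10000", 0.41, 64.0),
    ("ctg1:10000-20000", 0.42, 66.0),
    ("ctg1:20000-30000", 0.43, 65.0),
    ("ctg1:30000-40000", 0.44, 63.0),
    ("ctg1:40000-50000", 0.42, 131.0),  # about 2x depth: candidate collapsed region
    ("ctg2:0-10000", 0.61, 50.0),
    ("ctg2:10000-20000", 0.62, 52.0),
    ("ctg2:20000-30000", 0.63, 51.0),
]

flagged = flag_collapsed(example_windows)
print(flagged)
```

With real data you would have many windows per GC bin, so the bin median is not pulled up by the collapsed regions themselves. Running this separately on the Illumina and PacBio depths and looking at windows flagged by both is one way to reduce false positives, and any fertility gene overlapping a flagged window would deserve a closer look.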
OK! I'll give that a try. Thanks Wouter!
RDXplorer looks like a good one for me to try. Edit - it only works for the human genome, so it looks like I can't use it.