I would like to request some help. We serendipitously isolated some mutants with interesting phenotypes while trying to generate mutants by targeted integration of a deletion cassette in a yeast. We confirmed by PCR that the deletion cassette integrated ectopically and not in the gene of interest. To identify the chromosomal location of the deletion cassette, we sequenced each of the ectopic insertion mutants with Illumina (40 million 100 nt paired-end reads). Now I am trying to identify the insertion site. My first approach was de novo assembly. However, I could not link the selection marker (transgene) to homologous DNA sequences, even though the assembly statistics are relatively good (genome size is about 18 Mb).
So, I was wondering whether I can use an SNV predictor to identify the insertion site. Any advice? I have tried BASIL-anise, but I cannot find the selection marker among the insertion sequences it reports. As the selection marker is flanked by homologous sequences, these may be duplicated in the mutant genome. If so, the insertion site could be tracked by the presence of duplicated sequences (about 1 kb). Is my rationale OK? Any advice on how to find such large insertions (the deletion cassette is about 9 kb) or duplicated sequences (the homologous sequences are about 1 kb) would be welcome. Thank you for any advice. Best, Charley
Strain 1 quast report
# contigs (>= 1000 bp) 98
# contigs (>= 5000 bp) 83
# contigs (>= 10000 bp) 76
# contigs (>= 25000 bp) 72
# contigs (>= 50000 bp) 61
Total length (>= 0 bp) 17460253
Total length (>= 1000 bp) 17329730
Total length (>= 5000 bp) 17296753
Total length (>= 10000 bp) 17237993
Total length (>= 25000 bp) 17174558
Total length (>= 50000 bp) 16764377
# contigs 116
Largest contig 716379
Total length 17342266
GC (%) 47.84
N50 394827
N90 114158
auN 379622.1
L50 17
L90 50
# N's per 100 kbp 0.00
Strain 2 quast report
# contigs (>= 1000 bp) 111
# contigs (>= 5000 bp) 92
# contigs (>= 10000 bp) 84
# contigs (>= 25000 bp) 75
# contigs (>= 50000 bp) 62
Total length (>= 0 bp) 17585912
Total length (>= 1000 bp) 17386693
Total length (>= 5000 bp) 17343986
Total length (>= 10000 bp) 17285446
Total length (>= 25000 bp) 17126139
Total length (>= 50000 bp) 16614740
# contigs 149
Largest contig 778826
Total length 17411279
GC (%) 47.82
N50 360176
N90 115089
auN 382490.0
L50 17
L90 51
# N's per 100 kbp 0.00
Just an update: I will try to identify the insertion site using SoftSV and perSVade. As soon as I get the results, I will post them here. Best, Charley