I generated a new transgenic mouse through random multi-copy integration of a 10.316 kb DNA fragment of known sequence. We performed whole-genome sequencing (WGS) on a PromethION flow cell. The sequencing core sent me 500+ FASTQ files, which I have since merged into a single file.
So far I have performed a de novo assembly with Flye, keeping all haplotypes, to produce a denovo_assembly.fasta file, using
flye --nano-raw \
merged.fastq.gz \
--genome-size 2.6g \
--keep-haplotypes \
--scaffold \
--threads 128 \
--out-dir ./flye_output_haplo
I then aligned my insert.fasta file to denovo_assembly.fasta using
minimap2 -t 196 denovo_assembly.fasta insert.fasta > alignment.sam
which produced the following output (this is PAF rather than SAM, despite the file name, because minimap2 writes PAF unless -a is given)
Insert 10316 2 10311 + contig_1719 4057986 20474 30784 10230 10329 0 tp:A:P cm:i:1848 s1:i:10223 s2:i:10218 dv:f:0.0017 rl:i:0
Insert 10316 19 10311 + contig_1719 4057986 10169 20468 10225 10312 0 tp:A:S cm:i:1858 s1:i:10218 dv:f:0.0013 rl:i:0
Insert 10316 60 10207 - contig_1719 4057986 9 10159 10074 10166 0 tp:A:S cm:i:1827 s1:i:10068 dv:f:0.0014 rl:i:0
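The three records above follow the PAF column layout, so the position, strand, and completeness of each copy can be read off the first twelve columns. The following is a minimal sketch, assuming only awk and sort are available. The function name summarise_paf and the 95% cut-off for labelling a copy "complete" are illustrative choices, not part of minimap2.

```shell
# PAF columns used: 2 = query length, 3 = query start, 4 = query end,
# 5 = strand, 6 = target name, 8 = target start, 9 = target end,
# 10 = matching bases, 11 = alignment block length.

# The three hits reported above, trimmed to the 12 mandatory PAF columns
# (space-separated here; awk's default field splitting accepts tabs or spaces).
paf_hits='Insert 10316 2 10311 + contig_1719 4057986 20474 30784 10230 10329 0
Insert 10316 19 10311 + contig_1719 4057986 10169 20468 10225 10312 0
Insert 10316 60 10207 - contig_1719 4057986 9 10159 10074 10166 0'

summarise_paf() {
  # Reads PAF on stdin. Optional $1 = hypothetical threshold: the fraction of
  # the insert that must be aligned for a hit to be labelled "complete".
  # Prints: target, start, end, strand, insert covered (%), identity (%), label,
  # sorted by target name and then by target start.
  awk -v min_cov="${1:-0.95}" '
    {
      cov   = ($4 - $3) / $2      # fraction of the insert covered by this hit
      ident = $10 / $11           # matching bases / alignment block length
      label = (cov >= min_cov) ? "complete" : "partial"
      printf "%s\t%d\t%d\t%s\t%.1f\t%.1f\t%s\n", $6, $8, $9, $5, 100 * cov, 100 * ident, label
    }' | sort -k1,1 -k2,2n
}

printf '%s\n' "$paf_hits" | summarise_paf
```

Sorted by target start, the hits sit at 9-10159 (minus strand), 10169-20468, and 20474-30784 (both plus strand), so they lie back to back at the very start of contig_1719, which may be consistent with a tandem array whose first copy is inverted.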
From this I gathered that my insert is on contig_1719 of my de novo assembly.
To identify which chromosome contig_1719 belongs to, I performed the following
minimap2 -cx asm5 -t196 --cs GRCm39.primary_assembly.genome.fa.gz denovo_assembly.fasta > asm.paf
paftools.js call asm.paf > var.txt
This generated a text file of variant calls for the assembly contigs relative to the reference, including the following line for contig_1719
V chr3 42255839 42255840 1 60 a - contig_1719 3151947 3151947 +
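A single variant line shows only one position where contig_1719 aligns, so a complementary check is to total the aligned bases per reference sequence directly from asm.paf. Below is a rough sketch, assuming the standard PAF column order (column 1 = contig, column 6 = reference sequence, column 11 = alignment block length). The function name contig_to_chrom is hypothetical, and overlapping alignments are counted more than once, so the totals are only a guide.

```shell
contig_to_chrom() {
  # $1 = contig name (PAF column 1)
  # $2 = PAF file of assembly-versus-reference alignments
  # Prints "reference_sequence <TAB> summed_alignment_block_length", largest first.
  awk -v contig="$1" '
    $1 == contig { aligned[$6] += $11 }   # exact match on the contig name
    END { for (chrom in aligned) printf "%s\t%d\n", chrom, aligned[chrom] }
  ' "$2" | sort -k2,2nr
}

# Intended use with the files named above (not run here):
#   contig_to_chrom contig_1719 asm.paf
```

If one reference sequence accounts for nearly all of the aligned bases, that supports the chromosome assignment; a substantial second entry would suggest the contig spans a rearrangement or is misassembled.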
I then printed just the portion of denovo_assembly.fasta corresponding to contig_1719 using
awk '/contig_1719/{x=NR+68000}(NR<=x){print}' denovo_assembly.fasta > contig_1719.txt
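The line-count approach above depends on how the FASTA is wrapped, and the pattern /contig_1719/ would also match a header such as contig_17190. As an alternative sketch, the record can be selected by exact header name, assuming the contig name is the first word of the header line. The helper name extract_record is hypothetical.

```shell
extract_record() {
  # $1 = exact sequence name (without the leading ">")
  # $2 = FASTA file
  # Printing switches on at the matching header and off at the next header,
  # so exactly one record is emitted regardless of line wrapping.
  awk -v name="$1" '
    /^>/ { keep = (substr($1, 2) == name) }
    keep
  ' "$2"
}

# Intended use with the files named above (not run here):
#   extract_record contig_1719 denovo_assembly.fasta > contig_1719.fasta
```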
I uploaded it to Benchling, where I used auto-annotation to mark my insert.
The above workflow was successful in that contig_1719 does contain approximately 4 copies of my insert (3 complete and one partial), based on searching for pieces of the insert sequence in the contig_1719.txt file.
However, the outlined workflow is a novice attempt to identify the location and copy number of my insert. If someone with more experience with this procedure, or with nanopore sequencing in general, could recommend improvements or share their own workflow, it would be greatly appreciated.