Question

Strategy to find transgene copy number and integration sites using only Nanopore simplex reads?



7 months ago

Mark ▴ 60

We've recently transformed an organism and have transgenic strains growing. We suspect some strains have higher copy numbers of the whole genome, but we don't know how many copies there are or where these genes are integrating. Our lab has an Oxford Nanopore Ligation Sequencing Kit V14 (SQK-LSK114), and this is all we'd like to use to answer this research question.

Here are some facts about the organism.

Our strain has a complete T2T reference genome, and it's very simple: only 16.5 Mb (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000091205.1/)
Our organism is haploid

Does anyone have a recommended strategy for this? Could copy number be determined with a k-mer spectrum analysis (I don't know whether that's feasible with Nanopore simplex reads, given their error rate)? Would mapping reads to the reference be enough, or should we build a draft assembly first? Is there any software or pipeline that specializes in this for Nanopore data? What strategy would you all recommend?
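For context, a k-mer spectrum is a histogram of how often each k-mer occurs in the reads; in a haploid genome most k-mers sit in a single-copy peak at the sequencing depth, and k-mers from a sequence present in n copies appear at roughly n times that depth. The minimal Python sketch below illustrates the idea on toy, error-free data: it counts canonical k-mers and estimates the copy number of a query sequence (for example the transgene) as the median count of its k-mers divided by the most common multiplicity. The helper names are hypothetical, and the error-free assumption matters: simplex read errors create a large low-count error peak and lower the counts of true k-mers, so on real data you would use a dedicated k-mer counter and treat the result as a rough estimate.

```python
import random
from collections import Counter
from statistics import median

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k):
    """Count canonical k-mers over all reads, skipping any k-mer with a non-ACGT base."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def spectrum(counts, min_count=2):
    """k-mer spectrum: multiplicity -> number of distinct k-mers seen that many times.
    Multiplicities below min_count (mostly sequencing errors in real data) are dropped."""
    hist = Counter(c for c in counts.values() if c >= min_count)
    return dict(sorted(hist.items()))


def estimate_copy_number(counts, query, k, min_count=2):
    """Hypothetical helper: median count of the query's k-mers divided by the
    most common multiplicity, which is assumed to be the single-copy peak."""
    hist = spectrum(counts, min_count)
    single_copy_peak = max(hist, key=hist.get)
    query_counts = [counts[canonical(query[i:i + k])] for i in range(len(query) - k + 1)]
    return median(query_counts) / single_copy_peak


# Toy demo with made-up data: a 300 bp "transgene" inserted twice into a random
# 3 kb "genome", sequenced as 5 error-free full-length reads.
rng = random.Random(0)


def random_seq(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


transgene = random_seq(300)
genome = random_seq(1000) + transgene + random_seq(1000) + transgene + random_seq(1000)
demo_counts = count_kmers([genome] * 5, k=21)
print(estimate_copy_number(demo_counts, transgene, k=21))  # prints 2.0
```

An odd k is used so that no k-mer can be its own reverse complement; k = 21 is an arbitrary choice for the toy example.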

sequencing transgenes ngs

updated 6 months ago by colindaven 7.6k • written 7 months ago by Mark ▴ 60

score 0 · Answer 1 · 2024-11-11

Just build an assembly; Flye should work great for this.

Running dorado correct on the reads before assembly might help as well, if you have R10.4.1 data and want an assembly with extremely high QV.

I don't have any great ideas for how to resolve your whole-genome copy number question, but a few things you could try:

Maybe look at conserved genes and see whether they are duplicated and carry mutations?
Do a diploid assembly with hifiasm after dorado correction?
Align reads to your assembly and check for "heterozygous SNPs", e.g. with the tool Longshot.
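In a haploid strain every position should show a single allele, so "heterozygous" calls point to two or more diverged copies of that region (identical copies produce no such sites, so this only detects copies that have diverged). As a rough sketch, the alternate-allele fraction at those sites hints at how many copies are involved: around 1/2 for two copies, around 1/3 or 2/3 for three. The input format (a list of (ref_count, alt_count) tuples you would extract from your variant caller's output), the depth cutoff, and the helper names are all hypothetical. Nanopore errors add noise, so look for clusters across many sites rather than trusting individual ones.

```python
from collections import Counter


def allele_fractions(sites, min_depth=10):
    """sites: iterable of (ref_count, alt_count) per candidate SNP.
    Returns the alt-allele fraction for sites with at least min_depth reads."""
    return [alt / (ref + alt) for ref, alt in sites if ref + alt >= min_depth]


def af_histogram(fractions, bins=10):
    """Count fractions in equal-width bins on [0, 1]; a fraction of exactly 1.0
    is placed in the last bin."""
    hist = Counter(min(int(f * bins), bins - 1) for f in fractions)
    return [hist.get(i, 0) for i in range(bins)]


def nearest_ratio(fraction, max_copies=4):
    """Hypothetical helper: snap an alt fraction to the closest j/n with
    1 <= j < n <= max_copies, returning (j, n)."""
    candidates = [(j, n) for n in range(2, max_copies + 1) for j in range(1, n)]
    return min(candidates, key=lambda jn: abs(fraction - jn[0] / jn[1]))


# Toy demo with made-up counts: two sites near 1/2, one near 1/3, one too shallow.
demo_sites = [(20, 19), (31, 29), (5, 2), (40, 20)]
fractions = allele_fractions(demo_sites)
print(af_histogram(fractions))                 # [0, 0, 0, 1, 2, 0, 0, 0, 0, 0]
print([nearest_ratio(f) for f in fractions])   # [(1, 2), (1, 2), (1, 3)]
```

A peak of many sites near 0.5 would suggest two diverged copies, while peaks near 1/3 and 2/3 would suggest three.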

You should be able to find your integration sites by searching your assembly for the transgene sequence with blastn.
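A minimal sketch for the blastn step, assuming you ran something like the command in the code comment below (file names are placeholders) and kept the default 12-column tabular layout of -outfmt 6. It keeps high-identity hits and merges nearby hits on the same contig, so each reported locus is a candidate integration site; tandem copies at one site collapse into a single locus, whose length relative to the transgene hints at how many copies it holds. The identity, length, and gap thresholds are arbitrary assumptions to tune for your construct.

```python
import csv
import io
from dataclasses import dataclass

# Assumed upstream command (placeholder file names):
#   blastn -query transgene.fa -subject assembly.fasta -outfmt 6 > hits.tsv
# Default column order of BLAST+ tabular output (-outfmt 6):
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    contig: str
    start: int  # 1-based start on the subject (assembly) sequence
    end: int
    strand: str
    pident: float
    length: int


def parse_outfmt6(handle, min_pident=95.0, min_length=500):
    """Parse -outfmt 6 rows, keeping hits above the (assumed) identity/length cutoffs."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        pident, length = float(rec["pident"]), int(rec["length"])
        if pident < min_pident or length < min_length:
            continue
        sstart, send = int(rec["sstart"]), int(rec["send"])
        strand = "+" if sstart <= send else "-"  # sstart > send means a minus-strand hit
        hits.append(Hit(rec["sseqid"], min(sstart, send), max(sstart, send),
                        strand, pident, length))
    return hits


def merge_loci(hits, max_gap=1000):
    """Merge hits on the same contig that lie within max_gap bp into one locus."""
    loci = []
    for h in sorted(hits, key=lambda h: (h.contig, h.start)):
        if loci and loci[-1][0] == h.contig and h.start - loci[-1][2] <= max_gap:
            loci[-1][2] = max(loci[-1][2], h.end)
        else:
            loci.append([h.contig, h.start, h.end])
    return [tuple(locus) for locus in loci]


# Toy demo with made-up rows (not real data): a tandem pair on contig_3,
# a reverse-strand copy on contig_7, and a weak partial hit that gets filtered.
EXAMPLE_TSV = (
    "tg\tcontig_3\t99.8\t4200\t5\t2\t1\t4200\t120001\t124200\t0.0\t7700\n"
    "tg\tcontig_3\t99.5\t4200\t10\t3\t1\t4200\t124300\t128499\t0.0\t7600\n"
    "tg\tcontig_7\t99.9\t4200\t2\t1\t1\t4200\t50200\t46001\t0.0\t7750\n"
    "tg\tcontig_1\t88.0\t300\t30\t5\t10\t309\t1000\t1299\t1e-20\t200\n"
)
print(merge_loci(parse_outfmt6(io.StringIO(EXAMPLE_TSV))))
# [('contig_3', 120001, 128499), ('contig_7', 46001, 50200)]
```

On real data you would pass an open file handle (e.g. open("hits.tsv")) instead of the StringIO example.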