I have a DNA library encoding scFv antibody genes that consist of a VH gene (~380 bp) + linker sequence (54 bp) + VL gene (~380 bp), and the library contains about 1M unique antibodies. We've performed some iterative selections on the library such that the final sample has an expected diversity of about 500-5K unique antibodies. However, we want to sequence each round of selection, starting with the unselected diverse library, and use the enrichment of unique sequences across the rounds of selection to inform some future experiments.
Previously, our libraries consisted of VH genes only and 2x250 bp NovaSeq runs worked really nicely, giving us the coverage and depth that we needed especially in the early rounds of selection when diversity is still high. However, this new library contains inserts of about 800-900 bp and I don't know how to go about sequencing it.
The read depth is super important for calculating the fold-enrichment during selection. The linker sequence between the VH and VL sequences should be invariant, and we thought about sequencing the VH and VL domains separately, but then I'm not sure how to work out which VH domain was paired with which VL domain, and that pairing is biologically essential.
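To make the fold-enrichment part concrete, here is a minimal Python sketch. It assumes each round has already been reduced to a table of read counts per unique sequence. The function name `fold_enrichment`, the pseudocount, and the toy counts are all hypothetical and only there for illustration, not taken from any real pipeline or dataset.

```python
def fold_enrichment(before, after, pseudocount=1.0):
    """Per-sequence fold enrichment between two selection rounds.

    before, after: dicts mapping a sequence (or clone ID) to its read count.
    pseudocount:   added to every count so that sequences missing from one
                   round do not cause a division by zero (an assumption of
                   this sketch, not a recommendation for a specific value).
    """
    seqs = set(before) | set(after)
    # Normalise by total reads so rounds sequenced to different depths
    # can still be compared.
    total_before = sum(before.values()) + pseudocount * len(seqs)
    total_after = sum(after.values()) + pseudocount * len(seqs)

    result = {}
    for s in seqs:
        freq_before = (before.get(s, 0) + pseudocount) / total_before
        freq_after = (after.get(s, 0) + pseudocount) / total_after
        result[s] = freq_after / freq_before
    return result


# Toy example with made-up counts for two clones.
round0 = {"cloneA": 10, "cloneB": 90}
round1 = {"cloneA": 50, "cloneB": 50}
for clone, fold in sorted(fold_enrichment(round0, round1, pseudocount=0.0).items()):
    print(clone, round(fold, 2))
```

This is why depth matters so much in the early rounds: with ~1M unique clones, most sequences have very low counts, so the frequency estimate in the denominator is noisy unless each clone is sampled many times.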
Does anyone have any suggestions to balance coverage with read depth? Thank you in advance!!
I had no idea the platform was capable of 2x300. I will recommend that JGI start doing that.
Sorry - total lapse of cerebral function. 2x250 on the NovaSeq. 2x300 on the MiSeq. We're not doing anything fancy there!
Merging paired reads is a good idea. Then you get nice, long reads... and actually, as long as you have enough coverage, you can just merge all of them.
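Whether a pair can be merged at all comes down to simple arithmetic on the insert and read lengths. A rough Python sketch, using the approximate sizes from the original post (380 + 54 + 380 bp) and ignoring primers and adapters; the helper name `pair_overlap` is made up for illustration:

```python
def pair_overlap(insert_len, read_len):
    """Expected overlap (bp) between the two reads of a pair.

    A positive value means the reads overlap and can be merged directly;
    a negative value is the size of the unsequenced gap in the middle.
    """
    return 2 * read_len - insert_len


VH, LINKER, VL = 380, 54, 380          # approximate sizes from the question
scfv_insert = VH + LINKER + VL         # about 814 bp

for label, insert_len in [("VH only", VH), ("full scFv", scfv_insert)]:
    for run, read_len in [("2x250", 250), ("2x300", 300)]:
        ov = pair_overlap(insert_len, read_len)
        status = "overlap" if ov > 0 else "gap"
        print(f"{label:9s} {run}: {status} of {abs(ov)} bp")
```

Under these assumptions the VH-only library overlaps comfortably at 2x250, which matches the earlier experience, while the full scFv insert leaves a gap of a couple of hundred bp even at 2x300, so the pairs will not merge directly.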
BBMerge and Tadpole do allow you to extend reads (via the extendleft and extendright flags) which often allows them to overlap, so you can merge distant read pairs. But I am wondering about your post:
It does not seem to be related to your initial question.