3.9 years ago
dbready2
I have some data from a lab member who sequenced a CRISPR library plasmid pool that they made. The forward and reverse reads overlap by 6 bp, and I was wondering how I could merge these FASTQ files given that the overlap size is known. When I use bbmerge or other merging tools, few reads (less than 5%) are merged, presumably because the overlap region is so short.
Yes, that is in fact a short overlap, and with so few overlapping bases a merging tool cannot be confident that the overlap is real. Wouldn't it be simpler to trim one of the reads back a few bases?
Giving this a try now.
Let me try to understand: is the overlap always 6 bp?
Yes, the amplicon library they prepared is from a CRISPR library pool in which the only thing that varies is the 20 bp gRNA sequence. I inspected a couple of read pairs in the FASTQ files to confirm.
Could you post 5 sequences or so?
If these are amplicons and you are sure they should overlap, then try the following option with bbmerge.sh. You will need enough sequence data for this to work. Set this to 10 and see if that works.
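Since the overlap in the question is described as a fixed 6 bp, another route is to skip overlap detection and join each pair directly. Below is a minimal Python sketch of that idea, not a replacement for a tested merger. The function names, the `overlap` and `max_mismatches` parameters, and the choice to keep the forward read's bases and qualities in the overlap are all assumptions made for illustration. It also assumes the reverse read is reported in the usual orientation, so it is reverse-complemented before joining. Pairs whose overlap does not match are rejected rather than merged.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_fixed_overlap(fwd_seq, fwd_qual, rev_seq, rev_qual,
                        overlap=6, max_mismatches=0):
    """Merge one read pair assuming a known, fixed overlap length.

    Returns (merged_seq, merged_qual), or None if either read is shorter
    than the overlap or the overlapping bases disagree at more than
    max_mismatches positions.
    """
    # Put the reverse read on the same strand as the forward read.
    rev_rc = reverse_complement(rev_seq)
    rev_rc_qual = rev_qual[::-1]

    if len(fwd_seq) < overlap or len(rev_rc) < overlap:
        return None

    # Compare the end of the forward read with the start of the
    # reverse-complemented reverse read.
    fwd_tail = fwd_seq[-overlap:]
    rev_head = rev_rc[:overlap]
    mismatches = sum(a != b for a, b in zip(fwd_tail, rev_head))
    if mismatches > max_mismatches:
        return None

    # Keep the forward read's bases and qualities in the overlap
    # (a simplification; a real merger would pick the higher-quality base).
    merged_seq = fwd_seq + rev_rc[overlap:]
    merged_qual = fwd_qual + rev_rc_qual[overlap:]
    return merged_seq, merged_qual


if __name__ == "__main__":
    # Toy pair built from a made-up 20 bp fragment with a 6 bp overlap.
    fragment = "ACGTACGTAAGGCCTTGACA"
    fwd = fragment[:14]
    rev = reverse_complement(fragment[8:])
    print(merge_fixed_overlap(fwd, "I" * len(fwd), rev, "I" * len(rev)))
```

Counting how many pairs return None is a quick check on whether the 6 bp overlap really holds across the whole library before trusting the merged output.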