Merging paired FASTQ read files with a small overlap region

3.9 years ago

dbready2 • 0

I have some data from a lab member who sequenced a CRISPR library plasmid pool that they made. The forward and reverse reads overlap by 6 bp, and I was wondering how I could merge these FASTQ files given that the overlap size is known. When I use BBMerge or other merging tools, few reads (less than 5%) are merged, presumably because the overlap region is so short.

sequencing alignment • 1.1k views

updated 3.9 years ago by Biostar 20 • written 3.9 years ago by dbready2 • 0

Yes, that is a very short overlap, and with so few overlapping bases a merging tool cannot decide with much confidence whether the overlap is real. Wouldn't it be simpler to trim the overlapping bases off one of the reads and join the pair end to end?

3.9 years ago by ATpoint 86k
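
The trim-and-join idea suggested above can be sketched in a few lines of Python. This is a minimal sketch, not a tested pipeline: it assumes the overlap is exactly 6 bp in every pair, that read 2 is in the usual reverse-complement orientation, and that both files list the pairs in the same order. All function and file names are hypothetical.

```python
from itertools import islice

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(r1_seq, r1_qual, r2_seq, r2_qual, overlap=6):
    """Join a read pair end to end after dropping the known overlap.

    Assumption: the last `overlap` bases of read 1 and the first
    `overlap` bases of the reverse-complemented read 2 cover the same
    positions, so the copy from read 2 is discarded.
    """
    r2_rc = reverse_complement(r2_seq)
    r2_rc_qual = r2_qual[::-1]  # qualities are reversed, not complemented
    return r1_seq + r2_rc[overlap:], r1_qual + r2_rc_qual[overlap:]


def read_fastq(handle):
    """Yield (header, sequence, quality) from an uncompressed FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record[0], record[1], record[3]


def join_fastq_files(r1_path, r2_path, out_path, overlap=6):
    """Write one joined record per read pair to `out_path`."""
    with open(r1_path) as f1, open(r2_path) as f2, open(out_path, "w") as out:
        for (name, s1, q1), (_, s2, q2) in zip(read_fastq(f1), read_fastq(f2)):
            seq, qual = join_pair(s1, q1, s2, q2, overlap)
            out.write(f"{name}\n{seq}\n+\n{qual}\n")
```

A call such as join_fastq_files("R1.fastq", "R2.fastq", "joined.fastq") would write the joined reads. Because nothing is aligned, a pair whose true overlap is not 6 bp would be joined incorrectly, so it is worth checking the overlap first.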

Giving this a try now.

3.9 years ago by dbready2 • 0

Let me try to understand: is the overlap always 6 bp?

3.9 years ago by Gabriel R. ★ 2.9k

Yes, the amplicon library they prepared is from a CRISPR library pool in which the only thing that varies is the sequence of the 20 bp gRNA. I inspected a couple of read pairs in the FASTQ files to confirm.

3.9 years ago by dbready2 • 0
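
Rather than inspecting a couple of pairs by eye, the fixed overlap could be checked across many pairs. The following is a minimal sketch, assuming standard paired-end orientation (read 2 is the reverse complement of the far end of the fragment) and exact matching with no tolerance for sequencing errors. The function names are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_matches(r1_seq, r2_seq, overlap=6):
    """True if the end of read 1 equals the start of reverse-complemented read 2."""
    return r1_seq[-overlap:] == reverse_complement(r2_seq)[:overlap]


def fraction_with_overlap(pairs, overlap=6):
    """Fraction of (read1, read2) sequence pairs showing the expected overlap."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(overlap_matches(r1, r2, overlap) for r1, r2 in pairs)
    return hits / len(pairs)
```

A fraction close to 1 over a few thousand pairs would support treating the overlap as fixed. A clearly lower value would suggest that the overlap varies, or that sequencing errors in the overlap are common enough to need a mismatch tolerance.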

Could you post 5 or so sequences?

3.9 years ago by Gabriel R. ★ 2.9k

If these are amplicons and you are sure they should overlap, then try the extend option of bbmerge.sh (help text below). Set it to 10 and see if that works. The extension relies on k-mer coverage, so you will need enough sequence data for it to work.

extend=0             Extend reads to the right this much before merging.
                     Requires sufficient (>5x) kmer coverage.

3.9 years ago by GenoMax 148k
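
As a rough illustration of the extend suggestion above, the command could be assembled and launched from Python as sketched here. The file names are hypothetical placeholders, and the parameters other than extend=10 (in1, in2, out, outu1, outu2) are my assumption of the usual BBMerge input and output options, so check them against the help text of the installed version before relying on this.

```python
import subprocess


def build_bbmerge_command(r1_path, r2_path, merged_path,
                          unmerged1_path, unmerged2_path, extend=10):
    """Build a bbmerge.sh argument list with read extension enabled."""
    return [
        "bbmerge.sh",
        f"in1={r1_path}",
        f"in2={r2_path}",
        f"out={merged_path}",        # merged reads
        f"outu1={unmerged1_path}",   # read 1 of pairs that did not merge
        f"outu2={unmerged2_path}",   # read 2 of pairs that did not merge
        f"extend={extend}",          # extend reads this much before merging
    ]


def run_bbmerge(command):
    """Run the command; raises CalledProcessError if bbmerge.sh fails."""
    subprocess.run(command, check=True)
```

For example, run_bbmerge(build_bbmerge_command("R1.fastq", "R2.fastq", "merged.fastq", "unmerged_R1.fastq", "unmerged_R2.fastq")) would run the merge with extend=10, provided bbmerge.sh is on the PATH.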
