Question

What kind of advantage does PEAR bring?

7.5 years ago

CY ▴ 750

I have not had a chance to try PEAR.

I assume some people would like to merge PE reads into SE reads when most of the insert sizes are less than 2 × read length.

1) How does PEAR treat non-overlapping PE reads? Does it discard them?

2) What kind of advantage does PEAR bring? I imagine the mapping accuracy won't change much, and structural variants are identified less accurately without pairing information. I can't imagine anything good coming out of this approach...

pear paired end single end • 2.8k views

ADD COMMENT • link updated 7.5 years ago by h.mon 35k • written 7.5 years ago by CY ▴ 750

score 1 · Answer 1 · 2017-09-21

7.5 years ago

h.mon 35k

1) How does PEAR treat non-overlapping PE reads? Does it discard them?

-o <str>    Specify the name to be used as base for the output files. PEAR outputs four files. A file containing the assembled reads with a assembled.fastq extension, two files containing the forward, resp. reverse, unassembled reads with extensions unassembled.forward.fastq, resp. unassembled.reverse.fastq, and a file containing the discarded reads with a discarded.fastq extension.
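To make the four outputs concrete, here is a minimal sketch that builds the file names described in the option text above from a given `-o` base name. The helper `pear_output_paths` is hypothetical (it is not part of PEAR), and it assumes the base name and each extension are joined with a dot.

```python
def pear_output_paths(base):
    """Return the four output file names described for `-o <base>`.

    Hypothetical helper, not part of PEAR; assumes the base name and
    each extension are joined with a single dot.
    """
    extensions = {
        "assembled": "assembled.fastq",
        "unassembled_forward": "unassembled.forward.fastq",
        "unassembled_reverse": "unassembled.reverse.fastq",
        "discarded": "discarded.fastq",
    }
    return {name: f"{base}.{ext}" for name, ext in extensions.items()}


if __name__ == "__main__":
    for name, path in pear_output_paths("sample").items():
        print(f"{name}: {path}")
```

Read this way, pairs that do not overlap end up in the two unassembled files, separate from the discarded file.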

2) What kind of advantage does PEAR bring?

The ends of reads generally have lower quality; by merging two lower-quality ends, you increase the overall confidence for the overlapping bases. Merging reads is useful for processing amplicons shorter than the sum of the read lengths (imagine 16S metagenomics). Also some people claim it is also useful for assembly.
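As a rough illustration of why overlapping ends raise confidence, here is a simplified sketch of merging two reads over a known overlap. This is not PEAR's actual scoring model: the rule used here (add Phred scores when the bases agree, keep the higher-quality base and subtract scores when they disagree) is a common simplification, and the names `merge_base`, `merge_pair`, and the cap of 41 are assumptions made for the example.

```python
def merge_base(b1, q1, b2, q2, cap=41):
    """Combine two aligned base calls with Phred qualities (simplified rule)."""
    if b1 == b2:
        # Agreement: two independent observations support the same base.
        return b1, min(q1 + q2, cap)
    # Disagreement: keep the more confident call, with reduced quality.
    if q1 >= q2:
        return b1, q1 - q2
    return b2, q2 - q1


def merge_pair(fwd, fwd_q, rev_rc, rev_q, overlap):
    """Merge a forward read with the reverse-complemented mate.

    Assumes the last `overlap` bases of `fwd` align to the first
    `overlap` bases of `rev_rc` (the overlap is already known).
    """
    start = len(fwd) - overlap
    seq = list(fwd[:start])
    qual = list(fwd_q[:start])
    for i in range(overlap):
        base, q = merge_base(fwd[start + i], fwd_q[start + i], rev_rc[i], rev_q[i])
        seq.append(base)
        qual.append(q)
    seq.extend(rev_rc[overlap:])
    qual.extend(rev_q[overlap:])
    return "".join(seq), qual


if __name__ == "__main__":
    merged, quals = merge_pair(
        "ACGTAC", [30, 30, 30, 30, 10, 10],
        "ACGGT", [25, 25, 30, 30, 30],
        overlap=2,
    )
    print(merged, quals)
```

In this toy example the two low-quality bases at the 3' end of the forward read (quality 10) come out of the merge with quality 35, because the mate confirms them.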

ADD COMMENT • link 7.5 years ago by h.mon 35k

Also some people claim it is also useful for assembly.

:) Depends on the assembler, but yes, it can be very useful. It's also useful for identifying longer insertions when calling variants.

ADD REPLY • link 7.5 years ago by Brian Bushnell 20k

This is useful for identifying longer insertions only because the insert size is shorter than the sum of the paired read lengths, right? Otherwise, the paired-end reads plus the known insert size would provide more information when identifying long insertions. Am I right?

ADD REPLY • link 7.5 years ago by CY ▴ 750

Specifically, a normal aligner can only call insertions in CIGAR strings if the insertion is shorter than the read length, and a variant caller based on CIGAR strings will only call variants recorded there by the mapper. So the longer the reads, the longer the insertions that can be called. When paired reads have an insert size longer than the read length but less than double the read length, they can be merged by the overlap to produce a single longer read that allows longer insertions to be called.
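The arithmetic behind that condition can be sketched as follows. This is only an illustration: `expected_overlap` and `is_mergeable` are hypothetical helpers, "insert size" is taken to mean the full fragment length covered by both reads, both reads are assumed to have the same length, and the `min_overlap` default of 10 is an arbitrary assumption rather than a setting taken from any particular tool.

```python
def expected_overlap(read_len, insert_size):
    """Number of bases shared by the two reads of a pair (may be <= 0)."""
    return 2 * read_len - insert_size


def is_mergeable(read_len, insert_size, min_overlap=10):
    """True when the insert is longer than one read but the reads still overlap enough."""
    overlap = expected_overlap(read_len, insert_size)
    return insert_size > read_len and overlap >= min_overlap


if __name__ == "__main__":
    # 2 x 150 bp reads on a 250 bp fragment overlap by 50 bases, so the
    # merged read would span the whole 250 bp fragment.
    print(expected_overlap(150, 250), is_mergeable(150, 250))
    # On a 400 bp fragment the reads do not overlap and cannot be merged.
    print(expected_overlap(150, 400), is_mergeable(150, 400))
```

With 2 × 150 bp reads, a merged 250 bp read can in principle carry a longer insertion in its alignment than either 150 bp read alone.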

ADD REPLY • link 7.5 years ago by Brian Bushnell 20k

I guess that in the best case the insert size should be longer than the sum of the paired-end read lengths, so that the insert size can serve as additional information when calling structural variants (longer insertions).

When the region of interest (amplicon) is short and the insert size can't provide additional information, merging the reads into SE would be an alternative choice, right?