Question

How to explain fastq insert size peak

3.7 years ago

Michael ▴ 270

How do you explain the following insert-size peak? Novaseq PE 150bp. Insert size estimation by fastp. I assume it is an artifact from fastp's insert size estimation, not sure how this happens though.

It is exactly at the read length of 151 bp (fastp runs combined with MultiQC):

[Figure: fastp insert size distribution plotted by MultiQC, showing a peak at 151 bp]

RNAseq trimming • 5.1k views

ADD COMMENT • link updated 3.7 years ago by GenoMax 147k • written 3.7 years ago by Michael ▴ 270

I am not a fastp user, so I don't know how it calculates insert size. My assumption would be by overlapping R1/R2 reads. ~~So that peak may represent reads that overlap by just one bp.~~

If you were interested in calculating the insert sizes then you could also try BBMap suite: C: Target fragment size versus final insert size

ADD REPLY • link 3.7 years ago by GenoMax 147k

I think it is not as you described. fastp will calculate overlaps from R1/R2 pairs, but I think it only goes down to an overlap of about 30 bp. With fewer bases there is a risk of overlaps occurring by chance in repetitive regions.

So I think the peak is where R1 and R2 align pretty much exactly from start to end. But I still do not see where the peak comes from.

EDIT: given that we have low percentages on the Y-axis, the peak is not that extreme. I still want to understand how this happens...

ADD REPLY • link 3.7 years ago by Michael ▴ 270

R1 and R2 are pretty much exactly aligning start to end

Thinking about this again, that makes sense.

The fastp example report page says this:

This estimation is based on paired-end overlap analysis, and there are 3.771313% reads found not overlapped. The nonoverlapped read pairs may have insert size <30 or >272, or contain too much sequencing errors to be detected as overlapped.

Even in the fastp example report there appears to be a peak at the same location as yours. MultiQC seems to be exaggerating the Y-scale a bit.
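The numbers in that report text are consistent with simple overlap arithmetic. Below is a minimal sketch, assuming 151 bp reads, a minimum detectable overlap of about 30 bp (as suggested earlier in this thread), and no adapter read-through. The function name is hypothetical and this is not fastp's actual implementation.

```python
def insert_size_from_overlap(r1_len, r2_len, overlap_len):
    """Insert size implied by a read pair whose 3' ends overlap by overlap_len bases.

    Assumes the insert is at least as long as each read (no adapter read-through).
    """
    return r1_len + r2_len - overlap_len


READ_LEN = 151     # read length reported in the question
MIN_OVERLAP = 30   # assumed minimum overlap, taken from the discussion above

# Shortest accepted overlap gives the largest insert size that can be estimated.
largest_detectable = insert_size_from_overlap(READ_LEN, READ_LEN, MIN_OVERLAP)

# R1 and R2 overlapping along their full length gives an insert equal to the read length.
full_overlap = insert_size_from_overlap(READ_LEN, READ_LEN, READ_LEN)

print(largest_detectable, full_overlap)
```

Under these assumptions the upper bound comes out at 272, matching the ">272" in the report text, and a complete start-to-end overlap lands exactly at 151, where the peak sits.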

ADD REPLY • link 3.7 years ago by GenoMax 147k

Thanks! I saw that on the fastp example page. But I still do not get how this is happening... :(

ADD REPLY • link 3.7 years ago by Michael ▴ 270

Likely an artifact, as you originally said. If you have a dataset with a different read length, see whether the peak shifts accordingly.
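One way to make that comparison less dependent on the plot's Y-scale is to measure how far the count at the read length stands out from its neighbours. The following is a rough sketch, assuming you already have the insert size histogram as a list of counts indexed by insert size in bp. The function name and the toy data are made up for illustration, and how you extract the histogram from your own reports is left open.

```python
def spike_ratio(histogram, position, flank=5):
    """Ratio of the count at `position` to the mean count of its neighbours.

    Neighbours are the bins within `flank` positions on either side,
    excluding `position` itself. A ratio near 1 means no local spike.
    """
    start = max(0, position - flank)
    end = min(len(histogram), position + flank + 1)
    neighbours = [histogram[i] for i in range(start, end) if i != position]
    mean_neighbours = sum(neighbours) / len(neighbours)
    if mean_neighbours == 0:
        return float("inf")
    return histogram[position] / mean_neighbours


# Toy histogram (fabricated for illustration): flat background with a spike at 151.
toy_histogram = [10] * 300
toy_histogram[151] = 40

ratio_at_read_length = spike_ratio(toy_histogram, 151)
ratio_elsewhere = spike_ratio(toy_histogram, 100)
print(ratio_at_read_length, ratio_elsewhere)
```

If the spike follows the read length across runs sequenced at different lengths, that would support the artifact explanation rather than a real feature of the library.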

ADD REPLY • link 3.7 years ago by GenoMax 147k