Help understanding paired-end read merging
4.9 years ago
m.radz ▴ 10

I had some shotgun sequencing done on DNA extracted from tissue biopsies, with the intention of using it for metagenomic profiling. Each sample is paired-end and was split over 4 lanes to maximise output on a NextSeq 500 2x150bp sequencing run.

I ran FastQC on the fastq files and they all seem to be of good quality, but I am having trouble merging the paired-end reads. I first used cat to concatenate each sample's four R1 lane files into a single R1 file, and likewise for R2. I then used bbmerge on the concatenated R1 and R2 files, as I have done before, but only 10-15% of the reads merge, with the vast majority classed as ambiguous.
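When concatenating per-lane files with cat, the R1 and R2 lists must be given in exactly the same lane order, otherwise the mates end up out of sync. Below is a minimal Python sketch of that bookkeeping. It assumes bcl2fastq-style file names (`<sample>_S<n>_L00<lane>_R<1|2>_001.fastq.gz`); the helper `lane_ordered_pairs` is hypothetical and only prints the cat commands rather than running them.

```python
import re
from collections import defaultdict

# Assumption: bcl2fastq-style names, e.g. Sample1_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(?:\.gz)?$"
)


def lane_ordered_pairs(filenames):
    """Return {sample: (r1_files, r2_files)} with both lists in the same lane order."""
    grouped = defaultdict(dict)  # sample -> {(lane, read): filename}
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognised FASTQ name: {name}")
        grouped[match["sample"]][(match["lane"], match["read"])] = name

    pairs = {}
    for sample, files in grouped.items():
        lanes = sorted({lane for lane, _ in files})
        # Every lane must contribute both an R1 and an R2 file.
        missing = [(lane, r) for lane in lanes for r in "12" if (lane, r) not in files]
        if missing:
            raise ValueError(f"{sample}: missing mate file(s) {missing}")
        pairs[sample] = (
            [files[(lane, "1")] for lane in lanes],
            [files[(lane, "2")] for lane in lanes],
        )
    return pairs


if __name__ == "__main__":
    names = [
        "A_S1_L002_R1_001.fastq.gz", "A_S1_L001_R2_001.fastq.gz",
        "A_S1_L001_R1_001.fastq.gz", "A_S1_L002_R2_001.fastq.gz",
    ]
    for sample, (r1, r2) in lane_ordered_pairs(names).items():
        # Concatenated gzip files are still valid (multi-member) gzip streams.
        print(f"cat {' '.join(r1)} > {sample}_R1.fastq.gz")
        print(f"cat {' '.join(r2)} > {sample}_R2.fastq.gz")
```

Sorting lanes explicitly, rather than relying on shell glob order, keeps R1 and R2 aligned even if the files were listed in a different order.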

I used this same workflow on a previous project, sequenced on a HiSeq with 2x126bp chemistry, and roughly 75% of the reads merged successfully in every sample.

Am I doing something wrong? Should I expect differences due to the sequencing chemistry (2x150bp vs 2x126bp), or is the fact that the latest run was split over 4 lanes an issue?

Edit: this is the insert-size information reported by BBMerge:

Insert range: 35 - 289
90th percentile: 278
75th percentile: 264
50th percentile: 236
25th percentile: 190
10th percentile: 135


Can you merge each lane's read pairs individually and then cat the merged files? If the reads are not in the same order in the concatenated R1 and R2 files, the pairs will not merge properly.
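One way to test the out-of-order hypothesis directly is to walk both files in parallel and compare read IDs record by record. This is a minimal Python sketch, assuming standard unwrapped 4-line FASTQ records and Illumina-style headers where the ID is everything before the first whitespace (a trailing `/1` or `/2` is also stripped); `first_mismatch` is a hypothetical helper name.

```python
import gzip
from itertools import zip_longest


def _open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file for text reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_ids(path):
    """Yield the ID of every record: header up to the first whitespace, minus /1 or /2."""
    with _open_fastq(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:  # header line of a 4-line FASTQ record
                read_id = line[1:].split(maxsplit=1)[0]
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def first_mismatch(r1_path, r2_path):
    """Return (record_index, r1_id, r2_id) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:  # also catches files of unequal length (one side is None)
            return index, id1, id2
    return None
```

If `first_mismatch("sample_R1.fastq.gz", "sample_R2.fastq.gz")` returns None, the pairs are in sync and read order is probably not the cause of the low merge rate.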


I did try that, but I got the same merge rate (around 15%).

4.9 years ago
h.mon 35k

Unless the sequencing quality dropped a lot toward the end of the reads in the 2x150 run, the primary factor driving pair-merging success is insert size: a pair can only be merged if its two reads overlap, which requires the insert to be shorter than the combined read length (252bp for 2x126, 300bp for 2x150). My guess is that the average insert size differs between the libraries; for example, in the first library (2x126) most pairs may have had inserts shorter than 250bp, whereas in the second library (2x150) most pairs may have had inserts longer than 300bp.
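The overlap arithmetic behind this can be sketched in a few lines of Python. For reads of length L from opposite ends of an insert of I bp, the mates overlap by 2L - I bases when L <= I <= 2L, cover the whole insert when I < L, and do not overlap at all when I > 2L. The 12bp minimum overlap used below is a hypothetical threshold for illustration only, not BBMerge's actual merging criterion.

```python
def overlap_bp(insert_size, read_len):
    """Number of insert bases sequenced by both R1 and R2 in a 2 x read_len run.

    Inserts shorter than read_len are read through completely by both mates, so the
    overlap is the whole insert; longer inserts overlap by 2 * read_len - insert_size.
    """
    return max(0, min(insert_size, 2 * read_len - insert_size))


def max_mergeable_insert(read_len, min_overlap):
    """Longest insert whose mates still share at least min_overlap bases."""
    return 2 * read_len - min_overlap


if __name__ == "__main__":
    MIN_OVERLAP = 12  # hypothetical threshold, for illustration only
    for read_len in (126, 150):
        print(
            f"2x{read_len}: inserts up to {max_mergeable_insert(read_len, MIN_OVERLAP)} bp can merge; "
            f"a 236 bp insert overlaps by {overlap_bp(236, read_len)} bp, "
            f"a 350 bp insert by {overlap_bp(350, read_len)} bp"
        )
```

Note that the insert-size percentiles BBMerge reports are computed only from pairs it could place, so they say little about the inserts of the unmerged majority; a library whose typical insert exceeds 300bp would yield zero overlap for most pairs regardless of the merging tool.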

For most applications (e.g. assembly and mapping), larger insert sizes are better, so in my opinion you should be asking why so many pairs merged in your first library, instead of why so few pairs merged in the second.
