Question

Merge Paired-End Reads

9

13.8 years ago

Nicolas Rosewick 11k

Hi,

How can I merge two paired-end FASTQ files (R and L) into a single FASTQ file? For information, the sequencing run is 72 bp long and it contains mostly small RNA (miRNA, ...), so many of the paired-end reads will overlap.

For example, here are two paired reads:

@HWUSI-EAS529:41:FC62YHFAAXX:8:1:7969:1330 1:N:0:GCCAAT
CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT
+
IIIIIIIHIIHIIIIIIHHIIIHGIIIIEIIIIIIEIIHIIIIIIIIIIIHIIIIIBHIHIIHGIGIEGHHEGEEH
@HWUSI-EAS529:41:FC62YHFAAXX:8:1:7969:1330 2:N:0:GCCAAT
AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIDHIGIIIHIIIGHGIIIIIIIHHIHIIIIIIIIIHIIIIIIIIHIIGIIIIHI

I can find the adapter in the first one:

Code:

EMBOSS_001         1 CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGC     50
                                    |||||||||||||||||||||              
EMBOSS_001         1 ---------------TGGAATTCTCGGGTGCCAAGG--------------     21

EMBOSS_001        51 CAATATCTCGTATGCCGTCTTCTGCT     76

EMBOSS_001        22 --------------------------     21

but not in the second one
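
A minimal Python sketch of the adapter check described above. It assumes an exact, error-free match of the adapter sequence shown in the alignment; the function name `trim_3prime_adapter` is hypothetical, and real trimmers also tolerate mismatches and trim the quality string.

```python
def trim_3prime_adapter(read, adapter):
    """Return the part of `read` before the first exact adapter match.

    If the adapter is not found, the read is returned unchanged.
    """
    start = read.find(adapter)
    return read if start == -1 else read[:start]


# Adapter and reads copied from the post above.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
read1 = "CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT"
read2 = "AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA"

trimmed1 = trim_3prime_adapter(read1, ADAPTER)
trimmed2 = trim_3prime_adapter(read2, ADAPTER)

print(trimmed1)           # CTACGAAAGGGCACT
print(trimmed2 == read2)  # True: this adapter does not occur in the second read
```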

However, I did find the overlap between the two reads (using the reverse complement of the second read):

EMBOSS_001         1 --------------------------------------------------      0

EMBOSS_001         1 TTTTTTAATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACA     50

EMBOSS_001         1 -----------CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAAC     39
                                |||||||||||||||                        
EMBOSS_001        51 GTCCGACGATCCTACGAAAGGGCACT------------------------     76

EMBOSS_001        40 TCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT     76

EMBOSS_001        77 -------------------------------------     76
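
The overlap in the alignment above can be reproduced with a naive Python sketch. It assumes the fragment is shorter than the read length (so the start of read 1 matches the end of the reverse-complemented read 2) and requires an exact match; the names `reverse_complement` and `merge_short_insert` and the `min_overlap` default are illustrative assumptions, not any tool's actual method. Dedicated mergers allow mismatches and combine quality scores.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_short_insert(read1, read2, min_overlap=10):
    """Return the longest prefix of read1 that equals a suffix of
    reverse_complement(read2), or None if none reaches min_overlap."""
    rc2 = reverse_complement(read2)
    for k in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[:k] == rc2[-k:]:
            return read1[:k]
    return None


# Reads copied from the post above.
read1 = "CTACGAAAGGGCACTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCT"
read2 = "AGTGCCCTTTCGTAGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA"

merged = merge_short_insert(read1, read2)
print(merged)  # CTACGAAAGGGCACT
```

Everything outside the returned fragment is adapter sequence in both reads, which is why the merged result is much shorter than either read.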

So my question is, how can I merge the two fastq files to produce a single fastq file?

Thanks,
N.

fastq • 32k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 13.8 years ago by Nicolas Rosewick 11k

1

Hi, I see that you had a case similar to mine, so perhaps you can help me :)

I have always done miRNA analysis with single-end reads, so I'm confused about how to proceed with paired-end data. Can you recommend how to clean the reads and get them ready for analysis? In particular, I don't understand how the reverse-complemented miRNA sequence in the R2 read relates to the R1 read.

In summary, my R1 read is 100 nt and contains: miRNA + barcode + small RNA adapter + another adapter + polyA

My R2 read contains: miRNA (reverse complement of R1) + long adapter (or linker) + polyA

Thanks for any help in advance!

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 10.1 years ago by manekineko ▴ 150

0

Hi,

I would like an exact answer to this problem.

The reads in R1.fq are the forward reads, and the reads in R2.fq are written as reverse complements.

So, if I want to create a single file from the reads in R1.fq and R2.fq, do I have to reverse-complement the reads in R2.fq?

Am I right?

Thank you

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 9.6 years ago by midox ▴ 290

0

No response to this problem?

ADD REPLY • link 9.6 years ago by midox ▴ 290

Ram · Answer 1 · 2011-10-17

2

13.8 years ago

pmenzel ▴ 310

Maybe this program would suit you: http://www.cbcb.umd.edu/software/flash/

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 13.8 years ago by pmenzel ▴ 310

0

It's not available; the web page cannot be opened. http://genomics.jhu.edu/software/FLASH/index.shtml

ADD REPLY • link 12.7 years ago by litiancheng.gansu ▴ 10

0

It's here now: http://ccb.jhu.edu/software/FLASH/

ADD REPLY • link 12.1 years ago by matted 7.8k

Ram · Answer 2 · 2011-10-17

1

13.8 years ago

Jeremy Leipzig 23k

There is a decent program called stitch: https://github.com/audy/stitch

I wrote a script called mergePairs that is very sensitive and incredibly slow: http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 13.8 years ago by Jeremy Leipzig 23k

score 1 · Answer 3 · 2013-07-09

1

12.1 years ago

lelle ▴ 830

As this was referenced from a duplicate question, I will add a newer tool to the list: PANDAseq

ADD COMMENT • link 12.1 years ago by lelle ▴ 830

Ram · Answer 4 · 2015-03-07

1

10.4 years ago

Andreas ★ 2.5k

One more: SeqPrep

Andreas

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by Andreas ★ 2.5k

score 1 · Answer 5 · 2023-04-06

1

2.3 years ago

Charles-Alexandre Roy ▴ 70

NGmerge (2018) is another option. According to the paper, it performs better than other popular tools like FLASH and PEAR, particularly with respect to the estimation of quality scores for consensus bases.

ADD COMMENT • link 2.3 years ago by Charles-Alexandre Roy ▴ 70

Ram · Answer 6 · 2011-10-17

0

13.8 years ago

Stevelor ▴ 310

Either use Galaxy or use the individual scripts:

http://hg.notalon.org/galaxy/galaxy-central/src/7d9bb95caaa7/tools/fastq

HTH!

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 13.8 years ago by Stevelor ▴ 310

1

Uhhh, which script?

ADD REPLY • link 13.8 years ago by Jeremy Leipzig 23k

Ram · Answer 7 · 2015-03-07

If you wish to trim adapters and merge in a single step, you can use leeHom. We use it mainly to reconstruct ancient DNA sequences, but it has broader uses as well:

http://nar.oxfordjournals.org/content/42/18/e141

Click here for the Website of the repository

It uses a Bayesian maximum a posteriori approach that considers quality scores for both the adapter determination and the merging.

score 0 · Answer 8 · 2017-08-08

0

8.0 years ago

FatihSarigol ▴ 260

There is BBMerge, which is "designed to merge two overlapping paired reads into a single read. For example, a 2x150bp read pair with an insert size of 270bp would result in a single 270bp read": http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmerge-guide/
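
The arithmetic behind that quoted example can be sketched in a few lines of Python. This is only an illustration of the overlap calculation for equal-length reads, not BBMerge's algorithm; the function name `expected_overlap` is hypothetical.

```python
def expected_overlap(read_length, insert_size):
    """Number of bases covered by both reads of a pair.

    Assumes both reads have the same length. A result of zero or less
    means the reads do not overlap; a result larger than read_length
    means the insert is shorter than a read, so both reads run into adapter.
    """
    return 2 * read_length - insert_size


overlap = expected_overlap(150, 270)
print(overlap)  # 30: a 2x150 bp pair with a 270 bp insert shares 30 bases

# With 76-nt reads and a 15-nt insert, as in the question's example pair,
# the value exceeds the read length, so each read continues into adapter.
short_insert = expected_overlap(76, 15)
print(short_insert > 76)  # True
```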

ADD COMMENT • link 8.0 years ago by FatihSarigol ▴ 260