Paired-End Overlapping With Error Correction - Which Is Better?
11.2 years ago
Rohit ★ 1.5k

Hello.

I have recently started working on a new de novo assembly project, and I want a good pipeline to start with.

I have to overlap my reads and also filter them by quality. I was thinking of using FLASH or COPE for the overlapping, with a slight preference for COPE since I have used FLASH before (I just want to try a new tool).

I also have to do error correction and am planning to use Musket for it.

My questions are: 1) Should I use Musket for error correction before overlapping with COPE, or should I overlap the reads with COPE first and then use Musket?

2) Is FLASH still better than COPE for overlapping, or would you suggest something else?

ngs genome • 4.6k views
11.2 years ago
Rohit ★ 1.5k

Since I have now used the tools I mentioned, here are the results.

First, error correction should be done before merging, and Musket performs really well for the correction.

For merging, FLASH gave one-third more merged reads than COPE when using the same overlap values, but one-fourth of the FLASH merged reads were shorter than the length (read + overlap) cut-off. COPE had better-quality overlaps and a higher number of longer overlaps compared to FLASH. COPE was more precise, as it also uses a quality cut-off and ambiguity removal.

When error correction was done before merging, the number of merged reads increased by one-fifth compared to merging without correction.

I worked on primate data, and this is what I can conclude based solely on my results:

1) don't trim, just error-correct (if your read data is not too bad), and 2) error-correct, then merge.


Thanks for following up. Interesting observation on error-correcting instead of trimming; I will try that out as well.

11.2 years ago
rtliu ★ 2.2k

For question 2), I would suggest using abyss-mergepairs:

ABySS 1.3.5 included a new program, "abyss-mergepairs" (source code: https://github.com/bcgsc/abyss/blob/master/Align/mergepairs.cc).

The program was described in the white spruce genome paper in Bioinformatics:

"2.4 Read merging Reads from the HiSeq 2000 PET 250 bp libraries and the MiSeq PET 500 bp libraries were merged using abyss mergepairs (Supplementary Fig. S3). This utility performed a pair-wise Smith Waterman overlapped alignment (Smith and Waterman, 1981) between reads pairs, and selected the best quality base where alignments returned mismatching bases. An arbitrary base was selected when qualities were identical. In cases of read-to-read alignment ambiguity, read pairs were not merged."

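To make the merging rule in the quoted passage more concrete, here is a minimal Python sketch. It is not the abyss-mergepairs implementation: it swaps the Smith-Waterman overlap alignment for a simple ungapped suffix/prefix comparison, and the names `reverse_complement`, `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical, chosen only for illustration. It keeps the three behaviours described in the quote: the higher-quality base wins at a mismatch, ties are resolved arbitrarily (here, the forward read), and ambiguous pairs are not merged.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=10, max_mismatch_frac=0.1):
    """Merge one read pair, or return None if it cannot be merged.

    Assumptions (illustrative only, not taken from any real tool):
    - ungapped overlap between the end of `fwd` and the start of the
      reverse-complemented `rev` (a stand-in for Smith-Waterman);
    - qualities are lists of integer Phred scores, one per base;
    - `min_overlap` is at least 1;
    - a pair is "ambiguous" if more than one overlap length passes the
      mismatch threshold.
    """
    rc = reverse_complement(rev)
    rc_qual = list(rev_qual)[::-1]

    # Collect every overlap length whose mismatch count is acceptable.
    candidates = []
    for olen in range(min_overlap, min(len(fwd), len(rc)) + 1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            candidates.append(olen)

    # No overlap found, or several competing overlaps: do not merge.
    if len(candidates) != 1:
        return None

    olen = candidates[0]
    start = len(fwd) - olen
    seq = list(fwd[:start])
    qual = list(fwd_qual[:start])
    for i in range(olen):
        f_base, f_q = fwd[start + i], fwd_qual[start + i]
        r_base, r_q = rc[i], rc_qual[i]
        if f_base == r_base or f_q >= r_q:
            # Agreement, or forward base is at least as good (ties go to forward).
            seq.append(f_base)
            qual.append(max(f_q, r_q))
        else:
            # Mismatch where the reverse read has the better quality.
            seq.append(r_base)
            qual.append(r_q)
    seq.extend(rc[olen:])
    qual.extend(rc_qual[olen:])
    return "".join(seq), qual
```

Note that a low-complexity pair such as poly-A against poly-T overlaps equally well at many offsets, so this sketch returns `None` for it rather than guessing.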

Have the results been compared with those of other existing tools? How different are they?


I have not done any comparison, but I trust the ABySS authors.

11.2 years ago

I haven't used this combination of tools yet, but my gut says that, since the overlap relies on the sequences matching, you'd get better results if you corrected first and combined in the second step.

But the gut can be wrong, so maybe the best thing would be to try both and tell us what turned out to be better :-)


Will I be able to tell which way was better only after I am done with the assembly by comparing the N50 values and number of contigs, or do you think there is a checkpoint somewhere in the middle?
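
For reference, the N50 comparison mentioned here can be computed directly from a list of contig lengths. A minimal sketch, assuming the common definition (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total assembly length); `n50` is an illustrative helper, not part of any assembler:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to >= half the total.
        if 2 * running >= total:
            return length
    return 0
```

Comparing `n50(...)` and `len(...)` of the contig-length lists from the two runs (correct-then-merge versus merge-then-correct) gives the end-of-pipeline comparison asked about here.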


Not quite sure. You may be able to see that from the number of reads that you can merge successfully.
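
As a rough mid-pipeline checkpoint along these lines, the fraction of pairs that merged can be computed from record counts. A minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); `count_fastq_records` and `merge_rate` are hypothetical helper names, and the toy input is made up for illustration:

```python
import io


def count_fastq_records(handle):
    """Count records in an open FASTQ handle, assuming 4 lines per record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; not plain FASTQ?")
    return n_lines // 4


def merge_rate(n_input_pairs, n_merged):
    """Fraction of input read pairs that were merged into a single read."""
    if n_input_pairs <= 0:
        raise ValueError("n_input_pairs must be positive")
    return n_merged / n_input_pairs


# Toy input with two records, standing in for a real merged-reads file.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
n_example = count_fastq_records(example)
```

With real data you would open the merger's output file and the forward input file instead of the toy `io.StringIO` object, then compare the merge rate with and without prior error correction. A higher rate alone does not prove the merges are correct, so it is only a partial check.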


But don't you think there is a chance of getting more merged reads when the data contain errors, and that more aggressive correction means a higher chance of data loss? I think I have to try both methods with my data, but I probably can't be sure which worked better. A vicious circle of quality filtering, I guess :(
