Polish PacBio assembly with Hi-C reads
5.3 years ago
alex.zaccaron ▴ 470

Hello,

I have a small haploid genome (85 Mb) that was assembled with Canu from ~100x of PacBio Sequel reads. In addition, 40 Gbp of Hi-C Illumina reads were sequenced for scaffolding. The assembly has been polished with Arrow, but there is no third dataset of Illumina reads to polish it with Pilon. I was wondering whether I could use the Hi-C reads for the Illumina polishing step instead, by mapping one or both ends of each read pair individually to the assembly. However, given the nature of Hi-C reads, I am a little concerned that the uneven coverage and chimeric reads could have a negative impact. Does anyone have experience with this approach? Is it a good idea to use Hi-C reads to polish an assembly?
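
For scale, the nominal Hi-C depth follows directly from the numbers above: total sequenced bases divided by genome size. This is a minimal sketch, and the result is only an upper bound, because removing duplicates and clipped (likely chimeric) alignments reduces the usable depth, and Hi-C coverage is uneven to begin with.

```python
def nominal_coverage(total_bases: float, genome_size: float) -> float:
    """Total sequenced bases divided by genome size, before any read filtering."""
    return total_bases / genome_size


hic_bases = 40e9    # 40 Gbp of Hi-C reads (from the post)
genome_size = 85e6  # 85 Mb haploid genome (from the post)

print(f"{nominal_coverage(hic_bases, genome_size):.0f}x")
```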

Thanks

assembly sequencing • 2.2k views

The uneven coverage means the polishing will also be uneven, with some regions left unpolished. As for chimeric reads, you could keep only reads that map end-to-end to the reference, e.g., by filtering with samclip.
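
A minimal sketch of what "end-to-end only" means at the alignment level: an alignment is kept only if its CIGAR string has no soft (S) or hard (H) clip longer than a threshold. The function names and the `max_allowed_clip` parameter are hypothetical, and this is a simplified stand-in rather than samclip's actual logic (as far as I know, samclip also uses the reference to tolerate clips at contig ends, which this sketch does not).

```python
import re

# A CIGAR string is a run of <length><operation> pairs, e.g. "90M60S"
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_clip(cigar: str) -> int:
    """Longest soft (S) or hard (H) clip in a CIGAR string; 0 if unclipped."""
    clips = [int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH"]
    return max(clips, default=0)


def keep_end_to_end(sam_lines, max_allowed_clip=0):
    """Yield SAM header lines plus alignments clipped by at most max_allowed_clip bases."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        cigar = line.rstrip("\n").split("\t")[5]  # CIGAR is column 6 of a SAM record
        if cigar == "*":
            continue  # no alignment to judge (e.g. unmapped read)
        if max_clip(cigar) <= max_allowed_clip:
            yield line


# Hypothetical records: r1 aligns end-to-end, r2 has 60 soft-clipped bases
records = [
    "@SQ\tSN:tig1\tLN:1000",
    "r1\t0\ttig1\t100\t60\t150M\t*\t0\t0\t*\t*",
    "r2\t0\ttig1\t300\t60\t90M60S\t*\t0\t0\t*\t*",
]
kept = list(keep_end_to_end(records))  # keeps the header and r1 only
```

Allowing a few clipped bases (e.g. `max_allowed_clip=5`) is a judgment call: a strict zero also discards alignments that are merely trimmed at low-quality read ends.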

Thanks, h.mon, for the suggestion. As you pointed out, keeping only end-to-end mapped reads could still be useful for polishing parts of the genome. I will give it a try and see how it looks.

How did it go? I was thinking of trying the same.

I gave it a shot, but did not move forward with it. Based on the information I gathered, it can be done, but there is no guarantee about the results. In the end, we decided to sequence more Illumina data for the polishing step to avoid downstream problems. I can still describe what I did:

To polish the assembly with Hi-C reads, I mapped each end individually with bwa mem. After removing unmapped reads and secondary and supplementary alignments with samtools, I removed PCR duplicates with Picard tools. Clipped reads were also removed with samclip, since they are likely chimeric.
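
The steps described above can be sketched as a short Python script that only assembles the shell commands (it does not run them). The file names, thread count, the `picard` and `pilon` wrapper commands (as installed by e.g. Bioconda), the merge step and passing the result to Pilon with `--unpaired` are my assumptions, not necessarily what the poster ran; check each tool's help for your installed version. The flag mask 0x904 is unmapped (0x4) + secondary (0x100) + supplementary (0x800).

```python
# Hypothetical inputs; replace with your own file names.
ASM = "asm.fasta"
ENDS = {"R1": "hic_R1.fastq.gz", "R2": "hic_R2.fastq.gz"}
THREADS = 8

# SAM FLAG bits (SAM specification) for the records to discard
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800
DROP_MASK = UNMAPPED | SECONDARY | SUPPLEMENTARY  # 0x904


def commands_for_end(tag: str, fastq: str) -> list[str]:
    """Commands for one Hi-C end, mapped on its own as single-end reads."""
    return [
        # map this end alone so bwa mem does not pair the two ends,
        # then drop unmapped, secondary and supplementary records and sort
        f"bwa mem -t {THREADS} {ASM} {fastq}"
        f" | samtools view -b -F {DROP_MASK:#x} -"
        f" | samtools sort -o {tag}.sorted.bam -",
        # remove PCR duplicates (MarkDuplicates expects coordinate-sorted input)
        f"picard MarkDuplicates -I {tag}.sorted.bam -O {tag}.dedup.bam"
        f" -M {tag}.dup_metrics.txt --REMOVE_DUPLICATES true",
        # drop clipped (likely chimeric) alignments; samclip filters SAM text
        f"samtools view -h {tag}.dedup.bam | samclip --ref {ASM}"
        f" | samtools view -b -o {tag}.clean.bam -",
    ]


def pipeline() -> list[str]:
    """Full command list: index, per-end filtering, merge, then Pilon."""
    # bwa index for mapping; the .fai from faidx is assumed to be needed by samclip --ref
    cmds = [f"bwa index {ASM}", f"samtools faidx {ASM}"]
    for tag, fastq in ENDS.items():
        cmds += commands_for_end(tag, fastq)
    clean = " ".join(f"{tag}.clean.bam" for tag in ENDS)
    cmds += [
        f"samtools merge hic.clean.bam {clean}",
        "samtools index hic.clean.bam",
        # ends were mapped separately, so they go to Pilon as unpaired reads
        f"pilon --genome {ASM} --unpaired hic.clean.bam --output asm.pilon --changes",
    ]
    return cmds


if __name__ == "__main__":
    for cmd in pipeline():
        print(cmd)
```

Keeping the command construction separate from execution makes it easy to review the exact invocations before running them, for example by pasting the printed lines into a job script.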

Using the filtered dataset described above, Pilon confirmed 99% of the bases in the assembly (previously polished with Arrow) and made 726 changes, 88% of which were corrections of single-base INDELs. To me, these numbers suggest that the polishing was successful. Again, we did not move forward with it, to avoid downstream problems: this is not a common approach, and I have not seen in-depth analyses of possible complications.
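
To put the Pilon numbers in perspective, a quick calculation from the figures reported above. The 88% is a rounded percentage, so the derived counts are approximate.

```python
genome_mb = 85         # assembly size from the original post
changes = 726          # total changes reported by Pilon
indel_fraction = 0.88  # share of changes that were single-base INDELs

single_base_indels = round(changes * indel_fraction)  # about 639
other_changes = changes - single_base_indels           # about 87
changes_per_mb = changes / genome_mb                   # about 8.5

print(single_base_indels, other_changes, f"{changes_per_mb:.1f}")
```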
