Crazy coverage in assembly of chloroplast
9.0 years ago
int11ap1 ▴ 490

I am trying to assemble a chloroplast genome; the closest reference is 150 kb long. I have 4.5M read pairs (2x100 nt), which gives me a coverage of 6000X! My assemblies are horrible: about 2 million bases in total, compared to a reference genome of 150 kb, and when I remap the reads to the contigs only 30% of them map.

Should I scale my data to 60X using digital normalization or randomly sampling X number of reads?
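For reference, the arithmetic behind these numbers can be sketched as follows. This is a minimal sketch that assumes every read comes from the chloroplast, so it is an upper bound on the real depth; the function names are made up for illustration.

```python
def coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Expected depth if every read came from the target genome (assumption)."""
    return read_pairs * 2 * read_length / genome_size


def pairs_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Number of read pairs needed to reach a target depth."""
    return round(target * genome_size / (2 * read_length))


# Numbers from the question: 4.5M pairs, 2x100 nt, 150 kb reference.
print(coverage(4_500_000, 100, 150_000))        # 6000.0
print(pairs_for_coverage(60, 100, 150_000))     # 45000
print(pairs_for_coverage(100, 100, 150_000))    # 75000
```

So under that assumption, 60X corresponds to roughly 1% of the read pairs.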

I took a subset of my data to get 100X coverage and assembled it with Velvet. When I map all my reads against the resulting contigs, only 35% of the reads map.

What should I do with this?

assembly chloroplast plants coverage • 3.3k views
9.0 years ago
thackl ★ 3.0k

The high coverage is not unusual for chloroplasts in plant data. Random sampling could work, but I would additionally run a filter on the sampled set to remove low-coverage reads, with something like Quake for example; this will remove most of the "genomic contamination". Then assemble it with SPAdes rather than Velvet. You can further analyse the scaffolds with Bandage, extract the connected cluster that contains the chloroplast, and filter your contig set further. Even with good data, it will usually not assemble into a single contig, but into at least 3 contigs: one for the LSC, one for the SSC and one copy of the inverted repeat.
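The random sampling step could look something like the sketch below. It is only an illustration, not a replacement for an established tool: it reservoir-samples a fixed number of pairs from two FASTQ streams and keeps mates together. The function names and the in-memory demo data are made up, and it assumes plain 4-line FASTQ records with both files in the same order.

```python
import io
import itertools
import random


def fastq_records(handle):
    """Yield 4-line FASTQ records as tuples of lines (newline stripped)."""
    while True:
        record = tuple(line.rstrip("\n") for line in itertools.islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def subsample_pairs(r1, r2, n, seed=0):
    """Reservoir-sample n read pairs so that R1 and R2 stay in sync."""
    rng = random.Random(seed)
    reservoir = []
    for i, pair in enumerate(zip(fastq_records(r1), fastq_records(r2))):
        if i < n:
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = pair
    return reservoir


def demo_fastq(mate, n_reads=10):
    """Build a tiny in-memory FASTQ file (toy data, for the demo only)."""
    text = "".join(f"@read{i}/{mate}\nACGT\n+\nIIII\n" for i in range(n_reads))
    return io.StringIO(text)


sample = subsample_pairs(demo_fastq(1), demo_fastq(2), n=3, seed=42)
for rec1, rec2 in sample:
    print(rec1[0], rec2[0])
```

With real data you would pass opened file handles for the two FASTQ files instead of the `io.StringIO` objects and write the sampled records back out.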


Low-coverage reads or low-coverage k-mers?


To be exact, reads composed mostly of low-coverage k-mers. I think BBNorm can perform k-mer-coverage-based read binning quite efficiently.
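To illustrate what "reads composed mostly of low-coverage k-mers" means, here is a toy sketch. It is not how BBNorm or Quake work internally: it simply counts k-mers in memory and keeps reads whose median k-mer count reaches a threshold. Using the median, ignoring reverse complements, and the function names are all simplifying assumptions for the example.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer across all reads (no reverse-complement merging)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def keep_high_coverage(reads, k, min_median):
    """Keep reads whose median k-mer count is at least min_median."""
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers and median(counts[x] for x in read_kmers) >= min_median:
            kept.append(read)
    return kept


# Toy data: five copies of a "high-coverage" read and one unrelated read.
reads = ["ACGTACGTAC"] * 5 + ["TTGGCCAATG"]
print(len(keep_high_coverage(reads, k=4, min_median=3)))  # 5
```

For real read sets, a dedicated k-mer counting tool is the practical choice, since an in-memory `Counter` will not scale to millions of reads.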

8.9 years ago

Hi, I developed a new assembler for plastids and it should assemble the chloroplast genome in one circular contig. I will upload the assembler in the next few weeks if you would be interested: https://github.com/ndierckx/NOVOPlasty

I could already upload a beta version next week; it probably still has some bugs, but all tests were successful. I assembled 10 chloroplasts, all in one contig and within 30 min. The high coverage is no problem for this assembler and you don't need any reference. For the paper I assembled the chloroplasts of Arabidopsis and rice; both were 100% accurate, so you should obtain a high-quality assembly. But I would recommend subsampling the file a bit because 6000X is a lot :) It will slow down the assembly and require more memory. I can send a script for that.
