I am trying to assemble a chloroplast genome whose closest reference is 150 kb long. I have 4.5M read pairs (2x100 nt), which gives me a coverage of about 6,000X! And my assemblies are horrible: the total assembly is about 2 Mb, compared to the 150 kb reference, and when I remap the reads against the contigs only about 30% of them map.
Should I scale my data down to ~60X using digital normalization, or by randomly sampling a fixed number of reads?
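For the random-sampling option, this is roughly what I have in mind (just a sketch using seqtk; I am assuming my paired files are called reads_1.fastq and reads_2.fastq). At 150 kb and 200 nt per pair, ~60X works out to about 45,000 pairs, and using the same seed on both files should keep the pairing intact:

    # 60X * 150,000 bp / 200 bp per pair = 45,000 pairs
    seqtk sample -s100 reads_1.fastq 45000 > sub60x_1.fastq
    seqtk sample -s100 reads_2.fastq 45000 > sub60x_2.fastq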
I took a subset of my data corresponding to ~100X and assembled it with Velvet. When I map all my reads back to the contigs, only 35% of them map.
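This is how I compute the mapping rate (a sketch; I am assuming bwa and samtools are available and that the Velvet contigs are in contigs.fa):

    bwa index contigs.fa
    bwa mem -t 8 contigs.fa reads_1.fastq reads_2.fastq | samtools sort -o all_vs_contigs.bam -
    samtools flagstat all_vs_contigs.bam    # reports the percentage of mapped reads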
What should I do about this?
Low-coverage reads or low-coverage k-mers?
To be exact, reads composed mostly of low-coverage k-mers. I think BBNorm can perform k-mer-coverage-based read binning quite efficiently.
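For example, something along these lines (a sketch; the target and min values are just illustrative) normalizes paired reads to a target k-mer depth of ~60X and discards reads whose k-mers fall below a minimum depth:

    bbnorm.sh in=reads_1.fastq in2=reads_2.fastq out=norm_1.fastq out2=norm_2.fastq target=60 min=5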