I have Illumina data for a bacterium with a 6 Mb genome. Insert length = 400, read length = 151, estimated coverage = 320x (read length * number of reads / 6 Mb). I don't think I need to use all the data. I could randomly sample 25% or so, which would still leave about 80x coverage. Alternatively, I could filter the reads very stringently by quality. Do you think that would improve my assembly?
I want to assemble high-quality contigs. The genome contains many repetitive sequences, which I will probably need to identify myself afterwards.
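
To make the numbers concrete, here is a rough sketch of the coverage arithmetic and the random sampling I have in mind. The function names are made up for illustration, and I am assuming the reads are paired, so both mates of a pair are kept or dropped together. The read records are held in memory here only to keep the example short.

```python
import random


def estimate_coverage(read_length, n_reads, genome_size):
    """Coverage = read length * number of reads / genome size."""
    return read_length * n_reads / genome_size


def fraction_for_target(current_coverage, target_coverage):
    """Fraction of reads to keep to go from current to target coverage."""
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(pairs, fraction, seed=42):
    """Keep each read pair with probability `fraction`.

    `pairs` is any iterable of (mate1, mate2) records (hypothetical
    in-memory representation). A fixed seed makes the sample
    reproducible. The kept count is approximately, not exactly,
    fraction * total.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


if __name__ == "__main__":
    # 320x down to 80x corresponds to keeping 25% of the reads.
    fraction = fraction_for_target(current_coverage=320, target_coverage=80)
    print(f"fraction to keep: {fraction}")

    # Toy data standing in for real read pairs.
    toy_pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]
    kept = subsample_pairs(toy_pairs, fraction)
    print(f"kept {len(kept)} of {len(toy_pairs)} pairs")
```

For the real FASTQ files I would stream the records instead of loading them, but the idea is the same: one random decision per pair, applied to both files.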