My metagenomic assembly has low coverage. What upstream steps can I adjust to increase coverage, and at what possible cost? Low coverage of metagenomic assemblies must be a common problem, but I've been unable to find a "quick start guide", i.e. an outline of recommendations to remedy it, beyond "do more sequencing".
Off the top of my head, I am speculating that relaxing the quality cutoff score during the trimming step could help, since it would give the assembler more data. Any thoughts on this? Here's my pipeline outline:
reads -> Trimmomatic (cutoff 30) -> HQ reads -> megahit -> assembly -> bowtie2 + samtools -> BAM files, from which I've determined the coverage is low
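For reference, a minimal sketch of that pipeline as shell commands. The file names, thread counts, and the SLIDINGWINDOW parameters are my own illustration, not taken from the post; samtools coverage needs a recent samtools (>= 1.10):

```
# Quality-trim paired reads at Q30 (SLIDINGWINDOW values are illustrative)
trimmomatic PE -threads 8 raw_R1.fastq.gz raw_R2.fastq.gz \
    hq_R1.fastq.gz un_R1.fastq.gz hq_R2.fastq.gz un_R2.fastq.gz \
    SLIDINGWINDOW:4:30 MINLEN:135

# Assemble the high-quality reads
megahit -1 hq_R1.fastq.gz -2 hq_R2.fastq.gz -o megahit_out

# Map the reads back to the assembly and sort the alignments
bowtie2-build megahit_out/final.contigs.fa asm_idx
bowtie2 -p 8 -x asm_idx -1 hq_R1.fastq.gz -2 hq_R2.fastq.gz \
    | samtools sort -o mapped.bam -
samtools index mapped.bam

# Per-contig coverage summary
samtools coverage mapped.bam
```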
An idea that incorporates @h.mon's comments! How about keeping the megahit assembly built from >Q30 reads, but increasing coverage at the bowtie2 step by mapping a larger number of reads, obtained by dropping the quality cutoff to >Q20? My current assembly, made with >Q30 reads, recruits 80% of those reads, so it must be a decent assembly. Can I keep it but expand coverage with additional lower-quality reads, relying on the mapper (bowtie2) to score alignments appropriately and output better coverage values? Would this approach add any value to the workflow?
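A sketch of that idea, assuming the >Q30 assembly and its bowtie2 index (asm_idx) already exist; the Q20 re-trim parameters and file names are assumptions for illustration:

```
# Re-trim the raw reads at a relaxed Q20 cutoff (parameters are illustrative)
trimmomatic PE raw_R1.fastq.gz raw_R2.fastq.gz \
    q20_R1.fastq.gz q20_un_R1.fastq.gz q20_R2.fastq.gz q20_un_R2.fastq.gz \
    SLIDINGWINDOW:4:20 MINLEN:50

# Map the larger Q20 read set back to the existing Q30 assembly
bowtie2 -p 8 -x asm_idx -1 q20_R1.fastq.gz -2 q20_R2.fastq.gz \
    | samtools sort -o q20_mapped.bam -
samtools index q20_mapped.bam
samtools coverage q20_mapped.bam
```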
I don't have good suggestions regarding what you could or should do. If you relax the alignment too much, the mapping rate will increase, but reads may map to the wrong place. Consider, for example, a low-abundance species with no contigs in your assembly because its reads were filtered out: when you map, those reads may now map to another species' contigs, and the problem will be worse if you use sensitive (permissive) mapping settings.
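One way to partially guard against such mis-mapped reads (my suggestion, not from the thread) is to drop low-confidence alignments by mapping quality before computing coverage; the MAPQ threshold of 10 here is an assumption you would tune:

```
# Keep only alignments with MAPQ >= 10; bowtie2 assigns low MAPQ
# to ambiguous/multi-mapping reads, so these are removed
samtools view -b -q 10 q20_mapped.bam > q20_mapped.q10.bam
samtools index q20_mapped.q10.bam
samtools coverage q20_mapped.q10.bam
```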
I played around with re-trimming. What seems to have made the most difference is the MINLEN parameter. I used to have it set at 135, which I thought was fine for a 250 bp kit. Changing it to 50 now gives me >90% of reads passing the filter.
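For concreteness, the only change relative to the earlier Trimmomatic call is the final MINLEN step (the other parameters here are placeholders); the per-step survival percentages can be read off the summary Trimmomatic prints to stderr:

```
# Same trimming as before, but keep reads down to 50 bp instead of 135 bp
trimmomatic PE raw_R1.fastq.gz raw_R2.fastq.gz \
    hq_R1.fastq.gz un_R1.fastq.gz hq_R2.fastq.gz un_R2.fastq.gz \
    SLIDINGWINDOW:4:30 MINLEN:50
```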