Hi! I hope you can help me, I'm relatively new to bioinformatics.
I plan to use already-assembled metagenome data for binning, but the microbial species/population I need to recover from the bins is poorly represented in the assembly (only ~3%). Can I subsample the reads at varying percentages of coverage? If so, what protocol/software/tools can you recommend? Thank you so much.
Ideally, the goal is to recover a cyanobacterial genome from the metagenome data. But the cyanobacterial population is only at ~3% in the sample, with Proteobacteria being the most abundant. I've read some articles that did random sub-sampling at different percentages of read coverage from a huge metagenome dataset. Thank you for your response!
If you want to recruit only the relevant reads and re-assemble, mirabait from the MIRA package can do it.
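A minimal sketch of that approach, assuming you have a FASTA of related cyanobacterial reference sequences to use as bait (`cyano_refs.fasta` and the read/output names are placeholders; exact flags and argument order vary between MIRA versions, so check `mirabait --help` for your install):

```shell
# Bait (recruit) reads that share k-mers with the cyanobacterial references,
# then re-assemble only those reads with your assembler of choice.
# -k: k-mer size used for matching; -n: minimum number of k-mer hits per read.
mirabait -k 31 -n 1 cyano_refs.fasta metagenome_reads.fastq baited_cyano

# baited_cyano.fastq (name depends on MIRA version) now holds the
# recruited reads, which you can feed into SPAdes, MEGAHIT, etc.
```

Note that k-mer baiting only recovers reads similar to the references you provide, so a divergent cyanobacterium may be missed; relaxing `-k`/`-n` trades sensitivity against pulling in off-target reads.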
Random sub-sampling generally doesn't work for low-abundance bins. You may want to try the normalize-by-median.py approach from the khmer package instead. Given the low abundance, though, you may not have enough coverage depth to assemble any better than what you already have.
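The idea behind digital normalization is the opposite of random sub-sampling: it discards redundant reads from high-coverage organisms (here, the dominant Proteobacteria) while keeping essentially all reads from rare ones, flattening the coverage profile before assembly. A minimal sketch, assuming interleaved or single-end reads in `metagenome_reads.fastq` and a recent khmer release (filenames and parameter values are illustrative; see the khmer docs for the flags in your version):

```shell
# Digital normalization with khmer: keep reads only while the estimated
# median k-mer coverage of the read is below the cutoff (-C 20 here).
# -k: k-mer size; -M: max memory for the counting table.
normalize-by-median.py -k 20 -C 20 -M 4e9 \
    -o metagenome_reads.keep.fastq metagenome_reads.fastq
```

For paired-end data, interleave the reads first (or use the paired-mode option) so mates are kept or dropped together.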
How is this related to your request for sub-sampling?
You can use reformat.sh (from BBTools) for sub-sampling as a general suggestion; see the thread "Extracting randomly subset of fastq reads from a huge file".