Hi! I hope you can help me, I'm relatively new to bioinformatics.
I plan to use already-assembled metagenome data for binning, but the microbial species/population I need to recover from the bins is poorly represented in the assembly (only ~3%). Can I subsample the reads at varying percentages of coverage? If so, what protocol/software/tools can you recommend? Thank you so much.
Ideally, the goal is to recover a cyanobacterial genome from the metagenome data. But the cyanobacterial population is only at ~3% in the sample, with Proteobacteria being the most abundant. I've read some articles that did random sub-sampling at different percentages of read coverage from a huge metagenome dataset. Thank you for your response!
If you want to recruit only the relevant reads and re-assemble, mirabait from the MIRA package can do it.
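A minimal sketch of that approach, assuming you have a FASTA of related cyanobacterial reference sequences to use as bait (`cyano_refs.fasta` and the read/output names are placeholders; exact flags and argument order vary between MIRA versions, so check `mirabait --help` for your install):

```shell
# Bait (recruit) reads that share k-mers with the cyanobacterial references,
# then re-assemble only those reads with your assembler of choice.
# -k: k-mer size used for matching; -n: minimum number of k-mer hits per read.
mirabait -k 31 -n 1 cyano_refs.fasta metagenome_reads.fastq baited_cyano

# baited_cyano.fastq (name depends on MIRA version) now holds the
# recruited reads, which you can feed into SPAdes, MEGAHIT, etc.
```

Note that k-mer baiting only recovers reads similar to the references you provide, so a divergent cyanobacterium may be missed; relaxing `-k`/`-n` trades sensitivity against pulling in off-target reads.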
Random sub-sampling generally doesn't work for low-abundance bins. You may want to try the normalize-by-median.py approach from the khmer package instead. Given the low abundance, though, you may not have enough coverage depth to assemble any better than what you already have.
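The idea behind digital normalization is the opposite of random sub-sampling: it discards redundant reads from high-coverage organisms (here, the dominant Proteobacteria) while keeping essentially all reads from rare ones, flattening the coverage profile before assembly. A minimal sketch, assuming interleaved or single-end reads in `metagenome_reads.fastq` and a recent khmer release (filenames and parameter values are illustrative; see the khmer docs for the flags in your version):

```shell
# Digital normalization with khmer: keep reads only while the estimated
# median k-mer coverage of the read is below the cutoff (-C 20 here).
# -k: k-mer size; -M: max memory for the counting table.
normalize-by-median.py -k 20 -C 20 -M 4e9 \
    -o metagenome_reads.keep.fastq metagenome_reads.fastq
```

For paired-end data, interleave the reads first (or use the paired-mode option) so mates are kept or dropped together.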
How is this related to your request for sub-sampling?
You can use reformat.sh (from BBTools) for sub-sampling as a general suggestion; see the thread "Extracting randomly subset of fastq reads from a huge file".