What is the recommended contig length to filter?
3.4 years ago
DNAngel ▴ 250

I've run my samples through metaSPAdes and am in the process of cleaning out contigs that I don't need (i.e. obvious contamination). However, I don't see (or maybe I missed) anything in the metaSPAdes guidelines about the minimum contig length it keeps. In fact, I see many quite short contigs (<100 bp). If my target length is 150 bp, should I remove contigs shorter than this, or should I raise the minimum contig length further?

If anyone has some good documentation on best practise for this I would greatly appreciate it.

Thanks!

metaspades • 2.3k views
3.4 years ago
Mensur Dlakic ★ 28k

I don't think there is any official guideline regarding the minimum size to keep in metagenomic assemblies. That said, most binning programs will not work with anything smaller than 1000-1500 bp. In case you want justification for that: binning usually works with tetranucleotide (sometimes pentanucleotide) frequencies, and there are 256 possible tetranucleotides (4^4, or 136 if reverse complements are merged). That means that in a 500 bp contig there would be on average only about 2 counts of each tetranucleotide, which is really not much of a signal. Same for 1 kb fragments: there are on average only about 4 counts of each tetranucleotide, which is also not very much, but it is at least something. The point is that small fragments do not carry enough tetranucleotide signal to bin them reliably.
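To make the arithmetic above concrete, here is a minimal sketch that computes the average number of times each tetranucleotide would be seen in a contig of a given length. It assumes all 256 tetranucleotides are equally likely, which real genomes are not, and the function name `expected_kmer_count` is only illustrative, not part of any binning tool.

```python
def expected_kmer_count(contig_length, k=4):
    """Average occurrences of each k-mer in a contig, assuming uniform k-mer usage.

    A contig of length L contains L - k + 1 overlapping k-mer windows,
    spread over 4**k possible k-mers (256 for tetranucleotides).
    """
    windows = max(contig_length - k + 1, 0)
    return windows / 4 ** k


# Compare short contigs with the lengths usually accepted by binners.
for length in (100, 500, 1000, 2500, 5000):
    print(f"{length:>5} bp: {expected_kmer_count(length):.2f} counts per tetranucleotide")
```

With these assumptions a 500 bp contig gives just under 2 counts per tetranucleotide and a 1 kb contig just under 4, which matches the rough figures above.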

In practice, I take a cutoff length anywhere between 2 and 5 kb, depending on the number of MAGs, their overlap, depth of coverage, and the contamination I get after assessing the completeness of bins. Under no circumstances would I ever consider including a contig smaller than 1 kb, although some people do it. Your mileage will vary, but small fragments as a general rule have more noise than signal.
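If it helps, a length filter like this can be sketched in plain Python without extra libraries. This is only an illustration: the helper names `read_fasta` and `filter_by_length` and the file name `contigs.fasta` are hypothetical, and the 1000 bp default is just the floor mentioned above, so raise it to 2000-5000 as needed.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_by_length(records, min_length=1000):
    """Keep only records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Tiny in-memory example; a multi-line record is joined before measuring.
example = [">short", "ACGT" * 20, ">long", "ACGT" * 200, "ACGT" * 100]
kept = filter_by_length(read_fasta(example), min_length=1000)
print([(h, len(s)) for h, s in kept])
```

For a real assembly you would pass an open file handle instead of the list, for example `read_fasta(open("contigs.fasta"))`, and write the kept records back out in FASTA format.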
