How to determine an appropriate value for the abundance parameter?
10.4 years ago

Up to this point I've mostly used the default of 3. How would one go about determining the optimal abundance threshold for a given dataset? I realize that k-mer size selection has the greater effect on assembly quality, and I've used Kmergenie effectively for that on bacterial isolates in the past.

Any input is appreciated. Thanks.

Minia • 2.4k views
10.4 years ago
Rayan Chikhi ★ 1.5k

Great question.

In general, you want to set an abundance threshold X such that every correct k-mer appears X times or more in the dataset, while not too many erroneous k-mers are seen X times or more. If you look at the abundance histogram (generated by Kmergenie or a k-mer counter), a reasonable abundance threshold is near the first "valley" (local minimum) of that histogram.
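
To make the "first valley" idea concrete, here is a minimal Python sketch. It assumes the histogram is already available as a list of (abundance, number of distinct k-mers) pairs sorted by abundance; the function name `first_valley` and the toy numbers are made up for illustration and do not come from Minia or Kmergenie. Real histograms are noisy, so the value found this way is only a starting point to check against the plot.

```python
def first_valley(histogram):
    """Return the abundance at the first local minimum of a k-mer
    abundance histogram, or None if there is no such minimum.

    histogram: list of (abundance, distinct_kmer_count) pairs,
    sorted by increasing abundance (an assumed input format).
    """
    for i in range(1, len(histogram) - 1):
        prev_count = histogram[i - 1][1]
        count = histogram[i][1]
        next_count = histogram[i + 1][1]
        # A valley: the count has been falling and stops falling here.
        if count < prev_count and count <= next_count:
            return histogram[i][0]
    return None


# Toy data (invented): a steep error peak at low abundance,
# followed by the rise towards the genomic peak.
toy_histogram = [
    (1, 900000), (2, 200000), (3, 40000), (4, 15000),
    (5, 9000), (6, 12000), (7, 30000), (8, 70000),
    (9, 120000), (10, 150000), (11, 130000),
]

print(first_valley(toy_histogram))
```

On this toy histogram the counts fall until abundance 5 and rise again afterwards, so 5 would be the candidate threshold.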

For high-coverage datasets, the abundance threshold should be high (I can't give a specific number, as it depends heavily on the dataset, but it's generally in the range 5-20). For low-coverage datasets, 2 or 3 is generally good.

Kmergenie offers an experimental feature that determines an abundance parameter for you. It's not in the HTML report yet, but you can see it in the command line output. Give it a try! I've had good results with it so far.

Would that be the coverage cut-off metric? Thanks for the quick reply.

Yes, those terms are synonymous: abundance threshold (Minia) and coverage cut-off (Kmergenie, Velvet).

Thanks for the clarification
