Question

How to calculate the entropy (and/or complexity) of a DNA sequence

5.3 years ago

alorsonmethyle ▴ 50

Hi,

I am trying to find a way to show that one sequence is less (or more) complex than another. I would like to use this to explain that some mapping issues may come from low sequence complexity. I'd like to demonstrate it with a 4-letter genome and a 3-letter (bisulfite-converted) genome.
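To make the 4-letter vs 3-letter comparison concrete, here is a minimal sketch of in silico bisulfite conversion. It assumes complete conversion of every C to T on the given strand only (i.e. fully unmethylated DNA; real methylated Cs, mostly at CpG, would stay C), and `bisulfite_convert` and `max_entropy_bits` are hypothetical helper names. The upper bound on per-base Shannon entropy is log2 of the alphabet size, so it drops from 2 bits to about 1.585 bits after conversion.

```python
import math

def bisulfite_convert(seq: str) -> str:
    """Simulate full bisulfite conversion of the given strand: every C reads as T.

    Assumes no methylation (no C is protected) and ignores the reverse strand.
    """
    return seq.upper().replace("C", "T")

def max_entropy_bits(seq: str) -> float:
    """Upper bound on per-base Shannon entropy: log2 of the number of distinct letters observed."""
    alphabet = set(seq)
    return math.log2(len(alphabet)) if alphabet else 0.0

if __name__ == "__main__":
    ref = "ACGTTGCAACGGTACCTGAC"
    conv = bisulfite_convert(ref)
    print(conv)
    print(f"{max_entropy_bits(ref):.3f} -> {max_entropy_bits(conv):.3f} bits per base")
```

For this example the converted read is `ATGTTGTAATGGTATTTGAT` and the ceiling falls from 2.000 to 1.585 bits per base, which is the loss of information that makes converted reads harder to place uniquely.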

I heard that Shannon entropy could help with this, but I am not really sure.

1) It seems to work well for finding motifs, i.e. what is likely shared when comparing sequences (https://bioinformatics.stackexchange.com/questions/9091/why-do-ten-rows-figure-1-correspond-to-2-bits-figure-2-in-a-sequence-logo/9094#9094), and I think I understand reasonably well how it is calculated.

2) I have found some formulas and an online calculator for computing a general entropy (http://www.shannonentropy.netmark.pl/), which is interesting and may help me.

3) However, I was thinking that I could perhaps calculate an entropy factor for a given sequence based on the repeated motifs it may contain (as in this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.1231&rep=rep1&type=pdf).

4) Finally (though I can't find it again), I was looking for a per-position entropy that would show a drop in complexity in some parts of my sequences. The HDMD package seems like it could help, but again, it requires "comparing" different sequences to produce an entropy score.
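Items 1, 3 and 4 above can be sketched together. This is a minimal sketch, not the method of the linked paper or of HDMD, and `kmer_entropy`, `sliding_entropy` and `column_entropy` are hypothetical helper names. `kmer_entropy` uses the overlapping k-mers of a single sequence as the distribution (a swapped-in way of reacting to repeated motifs, for item 3), `sliding_entropy` gives a per-position profile along one sequence without needing any other sequence (item 4), and `column_entropy` is the alignment-based per-column entropy behind sequence logos (item 1).

```python
import math
from collections import Counter

def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the overlapping k-mer counts within one sequence.

    k=1 is plain base-composition entropy (at most 2 bits for ACGT). Larger k also
    reacts to repeated motifs, because repeats concentrate the counts on few k-mers.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(kmers).values())

def sliding_entropy(seq: str, window: int = 20, step: int = 1, k: int = 1) -> list[tuple[int, float]]:
    """Per-position profile: (window start, entropy in bits) for each window along one sequence."""
    return [(start, kmer_entropy(seq[start:start + window], k))
            for start in range(0, len(seq) - window + 1, step)]

def column_entropy(aligned: list[str]) -> list[float]:
    """Entropy of each column of equal-length aligned sequences (the basis of sequence logos).

    This is the only function here that needs several sequences; gaps count as a symbol.
    """
    return [kmer_entropy("".join(column), 1) for column in zip(*aligned)]

if __name__ == "__main__":
    repeat = "ACGT" * 10
    print(kmer_entropy(repeat, 1))  # 2.0: composition alone looks maximally complex
    print(round(kmer_entropy(repeat, 3), 2))  # about 2 bits, far below the 6 bits possible for 3-mers
    mixed = "ACGTTGCAAGTCCATGACTG" + "A" * 20
    for start, bits in sliding_entropy(mixed, window=10, step=10):
        print(start, round(bits, 3))  # the poly-A windows drop to 0 bits
```

The `"ACGT" * 10` repeat shows why base composition alone is not enough: it reaches the 2-bit maximum at k = 1 even though it is trivially repetitive, whereas its 3-mer entropy stays far below what a 3-mer distribution could reach. For the bisulfite comparison, the same functions can be run on converted sequences, where the k = 1 ceiling becomes log2(3) bits.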

Alternatively, if this is a bad way to assess sequence complexity (with respect to mapping), what would you recommend instead?
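One alternative that targets repetitiveness directly is linguistic complexity: the fraction of all possible distinct substrings that actually occur in a read. This is a swapped-in technique, not something proposed in this thread; below is a minimal sketch of one common variant (observed distinct k-mers summed over k = 1..n, divided by the summed maximum). `linguistic_complexity` is a hypothetical name, and other definitions (e.g. a product over k) exist.

```python
from typing import Optional

def linguistic_complexity(seq: str, alphabet_size: int = 4, max_k: Optional[int] = None) -> float:
    """Distinct substrings observed / maximum possible, summed over k = 1..max_k.

    For each k the maximum is min(alphabet_size**k, len(seq) - k + 1). Values near 1
    mean few repeated substrings; low values flag repetitive, hard-to-map sequence.
    Intended for read-length sequences: it enumerates all substrings up to max_k.
    """
    seq = seq.upper()
    n = len(seq)
    max_k = n if max_k is None else min(max_k, n)
    observed = possible = 0
    for k in range(1, max_k + 1):
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible if possible else 0.0

if __name__ == "__main__":
    read = "ACGTTGCAAGTCCATGACTG"
    converted = "ATGTTGTAAGTTTATGATTG"  # the same read after full C->T conversion
    print(linguistic_complexity(read))
    print(linguistic_complexity(converted))  # same 4-letter ceiling, so the loss shows up
    print(linguistic_complexity("ACGT" * 5))  # a short tandem repeat scores low
```

Scoring both reads against the same 4-letter ceiling (the default) is deliberate: if the point is that conversion itself removes information, normalising the converted read with `alphabet_size=3` would hide part of that loss.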

Best,

WGBS DNA R genome • 6.0k views

score 1 · Answer 1 · 2020-02-12

5.2 years ago

onestop_data ▴ 330

Great question. Shannon entropy should do the job for computing the complexity of a DNA string. Here I share a script to do it. I hope it helps.
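The script mentioned in this answer is not included here. As a stand-in (not the original script), this is a minimal sketch that computes per-record Shannon entropy from FASTA text; `parse_fasta` and `base_entropy` are hypothetical names, and skipping non-ACGT symbols such as N is an assumption of this sketch.

```python
import math
from collections import Counter

def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser returning {header without '>': concatenated, upper-cased sequence}."""
    chunks: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif line and name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(parts) for key, parts in chunks.items()}

def base_entropy(seq: str) -> float:
    """Shannon entropy in bits of A/C/G/T frequencies; other symbols (e.g. N) are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0

if __name__ == "__main__":
    demo = ">rich\nACGTTGCAAGTCCATG\n>poly_a\nAAAAAAAATAAAAAAA\n"
    for name, seq in parse_fasta(demo).items():
        h = base_entropy(seq)
        # Columns: record name, entropy in bits, fraction of the 2-bit ACGT maximum
        print(f"{name}\t{h:.3f}\t{h / 2:.3f}")
```

To run it on real data, replace `demo` with the contents of a FASTA file. For bisulfite-converted reads the attainable maximum is log2(3) rather than 2 bits, so the normalising denominator should be chosen to match whichever comparison is being made.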
