Dear all,
I have a FASTA sequence like this:
CCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCCCTTTT
I would like to calculate the complexity of this sequence using a sliding window of 3-4 nucleotides and plot the score (or something similar) along the sequence, in order to identify regions like "CCCC", "TTTT", or "AAAA".
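One common way to score sequence complexity in a sliding window is Shannon entropy of the base composition. This is my choice of measure, not something the question specifies. Below is a minimal sketch that assumes a window of 4 and a step of 1. A window made of a single base (e.g. "CCCC") scores 0.0 bits. A 4-nt window containing all four bases scores the maximum of 2.0 bits. So homopolymers show up as dips to zero. The function name sliding_entropy is hypothetical.

```python
import math
from collections import Counter


def sliding_entropy(seq, window=4, step=1):
    """Return a list of (start, entropy_in_bits) for each window (0-based start).

    Entropy is 0.0 when the window contains a single base (a homopolymer)
    and rises as the base composition becomes more mixed.
    """
    seq = seq.upper()
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(seq[start:start + window])
        # H = sum over bases of p * log2(1/p); written this way to avoid -0.0
        entropy = sum((n / window) * math.log2(window / n) for n in counts.values())
        scores.append((start, entropy))
    return scores


seq = "CCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCCCTTTT"
WINDOW = 4
for start, h in sliding_entropy(seq, window=WINDOW, step=1):
    flag = "  <- homopolymer" if h == 0 else ""
    print(f"{start + 1:3d}  {seq[start:start + WINDOW]}  {h:.2f}{flag}")
```

Changing WINDOW to 3 gives the 3-nt variant. The entropy values can be plotted against window start position with any plotting library.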
Is there software that does something similar? Or, alternatively, how can I do this?
You need to specify the window size and step size. In addition, please post how you would calculate the complexity. If you want to generate multiple sequences with a defined window and step size from a single sequence, the seqkit sliding command would help.
Window size of 3-4 nucleotides and a step of 1 nucleotide. I am not sure how to calculate the complexity. Basically, I want to find homopolymers in the sequence and plot them on a graph, so that wherever there is a homopolymer I see a peak.
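If the real goal is to find homopolymers and see them as peaks, a direct homopolymer score may be simpler than a general complexity measure. Below is a minimal sketch of that alternative. homopolymer_runs uses a regular-expression backreference to report runs of a single base at least min_len long. The default of 4 is an assumption based on the "CCCC" examples. run_length_profile gives every position the length of the run it belongs to, so a plot of it peaks at homopolymers. Both function names are hypothetical.

```python
import re


def homopolymer_runs(seq, min_len=4):
    """Yield (start, end, base) for single-base runs of length >= min_len.

    Coordinates are 0-based and end-exclusive, like Python slices.
    """
    # ([ACGT]) captures one base; \1{n,} requires it to repeat n more times
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for m in pattern.finditer(seq.upper()):
        yield m.start(), m.end(), m.group(1)


def run_length_profile(seq):
    """Return a per-position score: the length of the run each base belongs to."""
    seq = seq.upper()
    profile = [0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # extend the current run of identical bases
        for k in range(i, j):
            profile[k] = j - i
        i = j
    return profile


seq = "CCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCCCTTTT"
for start, end, base in homopolymer_runs(seq, min_len=4):
    print(f"{base} x {end - start} at positions {start + 1}-{end}")
print(run_length_profile(seq))
```

To get the graph, the profile list can be plotted against position (1 to len(seq)) with a plotting library such as matplotlib. Each homopolymer appears as a flat-topped peak whose height is the run length.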