I have a FASTA file with 10,000 entries, each a nucleotide sequence of length 5.
E.g. AGGTC, AGCTC, CGCTC, ... (10,000 entries in total).
I have plotted sequence logos with the R package "ggseqlogo". The logo shows the relative abundance of the bases at each of the five positions visually, but I want to quantify it: for example, that at position 1 base A is over-represented by 30%, G by 20%, etc.
Is there another method or package that can do this?
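
To make the goal concrete, here is a minimal sketch in base R of the kind of table I am after. The four sequences are toy data standing in for the real FASTA entries, the names `freq` and `enrichment` are only illustrative, and the uniform background of 0.25 per base is my assumption rather than anything ggseqlogo reports.

```r
# Toy stand-in for the 10,000 sequences (the real ones come from the FASTA file)
seqs  <- c("AGGTC", "AGCTC", "CGCTC", "AGGTC")
bases <- c("A", "C", "G", "T")

# One row per sequence, one column per position
mat <- do.call(rbind, strsplit(seqs, ""))

# Fraction of each base at each position (rows = bases, columns = positions)
freq <- sapply(seq_len(ncol(mat)), function(j) {
  as.vector(table(factor(mat[, j], levels = bases))) / nrow(mat)
})
dimnames(freq) <- list(bases, paste0("pos", seq_len(ncol(mat))))

# Assumption: every base is expected at a background frequency of 0.25
background <- 0.25
enrichment <- freq / background   # values above 1 mean over-represented

print(round(freq, 2))
print(round(enrichment, 2))
```

With a table like this, "A is over-represented at position 1" would become a number (observed fraction divided by expected fraction) instead of a visual impression.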
or
In ggseqlogo, how do I find the height of each letter, so that I can measure over-representation from it?
Also, some bases sometimes look similar in height in the logo, but I know from the literature that they are not equally represented.
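
For reference, this is how I understand letter heights in a standard bits logo: each letter's height is its frequency at that position multiplied by the position's information content (2 bits minus the Shannon entropy for DNA). The sketch below uses the same toy data in base R. It does not call ggseqlogo, and I am assuming no small-sample correction, so the numbers may not match ggseqlogo's internals exactly.

```r
# Toy stand-in for the real sequences
seqs  <- c("AGGTC", "AGCTC", "CGCTC", "AGGTC")
bases <- c("A", "C", "G", "T")
mat   <- do.call(rbind, strsplit(seqs, ""))

# Per-position base frequencies (rows = bases, columns = positions)
freq <- sapply(seq_len(ncol(mat)), function(j) {
  as.vector(table(factor(mat[, j], levels = bases))) / nrow(mat)
})
dimnames(freq) <- list(bases, paste0("pos", seq_len(ncol(mat))))

# Shannon entropy per position in bits; zero frequencies contribute nothing
entropy <- apply(freq, 2, function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
})

# Information content per position: at most log2(4) = 2 bits for DNA
info <- log2(length(bases)) - entropy

# Letter height = frequency * information content of its position
heights <- sweep(freq, 2, info, `*`)

print(round(heights, 3))
```

If this is right, it might also explain the similar-looking heights: at a position with low information content every letter is scaled down, so bases with different frequencies can end up looking almost the same size.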