Hi there,
I currently have a series of Python programs that process FASTA files. At the end of the current workflow, each file produces an output showing its nucleotide distribution. I am trying to map the correlation between nucleotide position and gene expression.
Below is a short snippet of the A, T, G, C output:
31 , 125066 , 77 , 38
84 , 59 , 35 , 125032
74 , 40 , 6 , 125082
125107 , 44 , 24 , 36
3 , 44 , 4 , 125161
125122 , 23 , 28 , 37
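For reference, here is a minimal sketch of how rows in this format could be read and turned into per-row frequencies. It assumes each row is one position and that the columns are in A, T, G, C order; the function names parse_counts and to_frequencies are only illustrative and are not part of my existing programs.

```python
# Two rows copied from the snippet above; columns assumed to be A, T, G, C.
SAMPLE = """\
31 , 125066 , 77 , 38
84 , 59 , 35 , 125032
"""

def parse_counts(text):
    """Parse comma-separated count rows into a list of [A, T, G, C] integer lists."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # int() tolerates the spaces around each comma-separated field
        rows.append([int(field) for field in line.split(",")])
    return rows

def to_frequencies(rows):
    """Convert each row of counts into relative frequencies that sum to 1."""
    freqs = []
    for row in rows:
        total = sum(row)
        freqs.append([count / total if total else 0.0 for count in row])
    return freqs

counts = parse_counts(SAMPLE)
frequencies = to_frequencies(counts)
```

The same parsing could be done with a data frame; plain lists are used here only to keep the sketch dependency-free.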
I am now attempting to calculate mutual information (MI) in Python. I have a program using data frames that gets me as far as the marginal distributions, but I am struggling to access the data from multiple files in order to sum the information for the MI calculation.
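To make the question concrete, here is a rough sketch of one way the multi-file step might look. It rests on assumptions that may not match my data: each output file corresponds to one expression group, every file lists the same positions in the same order, and the MI of interest at each position is between nucleotide identity and group. Under those assumptions, stacking the row for a given position from every file gives a group-by-nucleotide contingency table, and MI follows from that table. The names read_counts, mutual_information, and per_position_mi are hypothetical. Note that MI between two positions would instead need joint counts of nucleotide pairs, which per-position totals alone do not provide.

```python
import math

def read_counts(path):
    """Read one count file into a list of [A, T, G, C] rows (file format assumed)."""
    with open(path) as handle:
        return [[int(field) for field in line.split(",")]
                for line in handle if line.strip()]

def mutual_information(table):
    """MI in bits for a contingency table given as a list of rows of counts."""
    total = sum(sum(row) for row in table)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero cell contributes nothing to the sum
            # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts
            mi += (count / total) * math.log2(count * total / (row_sums[i] * col_sums[j]))
    return mi

def per_position_mi(count_tables):
    """count_tables holds one list of rows per file; returns one MI value per position."""
    # zip(*count_tables) groups together the rows for the same position across files
    return [mutual_information(list(rows_at_position))
            for rows_at_position in zip(*count_tables)]
```

With real data, something like per_position_mi([read_counts(p) for p in paths]) would be the entry point, where paths is a list of the output files, one per group.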
If anyone has any suggestions for getting from FASTA files to MI, please let me know!