4.1 years ago
nameuser
This question has been removed from this site -- please see stackoverflow if interested.
Previous content restored by Ram from Google Cache
Hi there,
I'm currently trying to write a program that calculates the mutation rate from text files of nucleotide distributions. I'm hoping to automate in Python a mutual information calculation that I currently do in Excel, and I'm stuck at this step in the calculation.
An example of an input file is as follows:
A,T,G,C
84 , 59 , 35 , 125032
74 , 40 , 6 , 125082
125107 , 44 , 24 , 36
3 , 44 , 4 , 125161
125122 , 23 , 28 , 37
5 , 23 , 4 , 125180
125149 , 8 , 18 , 37
125124 , 32 , 14 , 38
9 , 25 , 8 , 125170
The program:
import pandas as pd
import sys
filename = sys.argv[1]
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)  # None means no limit; -1 is rejected by recent pandas versions
col = ['A', 'T', 'G', 'C']
df = pd.read_csv(filename, skipinitialspace=True, usecols=col)
df.head(287)
df['max'] = df[['A', 'T', 'G', 'C']].max(axis=1)
df['sum'] = df[['A', 'T', 'G', 'C']].sum(axis=1)
df.loc[:,"A":"C"] = df.loc[:,"A":"C"].div(df["sum"], axis=0)
df['mutation_rate'] = (1-df['max']/df['sum'])
df['max2'] = df[['A', 'T', 'G', 'C']].max(axis=1)
df['sum2'] = df[['A', 'T', 'G', 'C']].sum(axis=1)
df['marginal_distribution']=(1-df['max2']/df['sum2'])
df.head()
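numberOfBins = int(sys.argv[2])  # command-line arguments are strings; the original code hardcoded 8 here
df['A/numberOfBins'] = df['A'].div(numberOfBins)
df['T/numberOfBins'] = df['T'].div(numberOfBins)
df['G/numberOfBins'] = df['G'].div(numberOfBins)
df['C/numberOfBins'] = df['C'].div(numberOfBins)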
df.head()
With the output
A T G C
0 0.000671 0.000471 0.00028 0.998578
1 0.000591 0.000319 0.000048 0.999042
2 0.999169 0.000351 0.000192 0.000288
3 0.000024 0.000351 0.000032 0.999593
4 0.999297 0.000184 0.000224 0.000296
5 0.00004 0.000184 0.000032 0.999744
6 0.999497 0.000064 0.000144 0.000295
7 0.999329 0.000256 0.000112 0.000303
8 0.000072 0.0002 0.000064 0.999665
max sum mutation_rate
125032 125210 0.001422
125082 125202 0.000958
125107 125211 0.000831
125161 125212 0.000407
125122 125210 0.000703
125180 125212 0.000256
125149 125212 0.000503
125124 125208 0.000671
125170 125212 0.000335
max2 sum2
0.998578 1
0.999042 1
0.999169 1
0.999593 1
0.999297 1
0.999744 1
0.999497 1
0.999329 1
0.999665 1
marginal_distribution
0.001422
0.000958
0.000831
0.000407
0.000703
0.000256
0.000503
0.000671
0.000335
A/numberOfBins T/numberOfBins G/numberOfBins C/numberOfBins
0.000084 0.000059 0.000035 0.124822
0.000074 0.00004 0.000006 0.12488
0.124896 0.000044 0.000024 0.000036
0.000003 0.000044 0.000004 0.124949
0.124912 0.000023 0.000028 0.000037
0.000005 0.000023 0.000004 0.124968
0.124937 0.000008 0.000018 0.000037
0.124916 0.000032 0.000014 0.000038
0.000009 0.000025 0.000008 0.124958
I am attempting to solve for Shannon entropy/Mutual information. Thank you SO much.
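For the Shannon entropy part, a minimal sketch of one possible approach is shown here. It assumes the per-position entropy is wanted in bits and is computed from the A/T/G/C counts of each row; the function name shannon_entropy and the three-row sample table (copied from the first rows of the example input) are illustrative only. Mutual information between two positions would additionally need their joint counts, which a table of per-position counts like this one does not contain.

```python
import numpy as np
import pandas as pd


def shannon_entropy(counts: pd.DataFrame) -> pd.Series:
    """Return the per-row Shannon entropy in bits for a table of counts."""
    # Convert raw counts to per-row frequencies that sum to 1.
    freqs = counts.div(counts.sum(axis=1), axis=0)
    # Treat 0 * log2(0) as 0: swap zero frequencies for 1 before the log,
    # since log2(1) = 0 and the term is multiplied by the zero frequency anyway.
    safe = freqs.where(freqs > 0, 1.0)
    return -(freqs * np.log2(safe)).sum(axis=1)


# Illustrative sample: the first three rows of the example input file.
counts = pd.DataFrame(
    {
        "A": [84, 74, 125107],
        "T": [59, 40, 44],
        "G": [35, 6, 24],
        "C": [125032, 125082, 36],
    }
)

entropy = shannon_entropy(counts[["A", "T", "G", "C"]])
print(entropy)
```

With four nucleotides the result lies between 0 bits (only one base observed) and 2 bits (all four bases equally frequent), so strongly conserved positions like the ones above should come out close to 0.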
In your loop:
Edit: Note: the text (esp. the code) of the question appears to have changed since the initial posting, so this comment doesn't seem to make sense any more.
Hello nameuser,
Do not redact content after you've received feedback on a post. This is inconsiderate and such behavior can lead to suspension of your user account.
Please point to the StackOverflow post that you are referring to. In the meantime, I'll be restoring the content of this post from Google Cache.