Hi!
I'm analyzing data for my work and trying to read frequency (PWM) matrices from a txt file that contains multiple tables, so that I can build a consensus sequence for each one. In short, the file looks like this:
A
1872.00 590.00 3339.00 6805.00 0.00 0.00 6805.00 1917.00
1821.00 5138.00 1992.00 207.00 0.00 0.00 0.00 1391.00
2236.00 246.00 1386.00 192.00 0.00 0.00 0.00 2420.00
877.00 1667.00 87.00 0.00 6805.00 6805.00 0.00 1077.00
B
11369.00 11735.00 3157.00 1226.00 26720.00 29957.00 274.00 29221.00 30645.00 30125.00 13752.00 10200.00
6380.00 2568.00 2096.00 26587.00 3312.00 414.00 391.00 761.00 349.00 595.00 5299.00 6905.00
7434.00 8816.00 24214.00 607.00 184.00 1196.00 386.00 999.00 366.00 502.00 5884.00 5934.00
6843.00 8907.00 2559.00 3606.00 1810.00 459.00 30975.00 1045.00 666.00 804.00 7091.00 8987.00
C
1449.00 688.00 4036.00 8832.00 0.00 96.00 8832.00 2770.00
3929.00 5585.00 2483.00 194.00 0.00 0.00 0.00 2369.00
2290.00 103.00 2197.00 1078.00 0.00 0.00 0.00 2417.00
1164.00 3247.00 116.00 66.00 8832.00 8832.00 0.00 1276.00
So I wrote this to read the file first:
    with open("t.txt") as tx:
        for line in tx:
            values = line.strip("\n").split("\t")
            print(values)
to get a matrix afterwards. The output prints correctly, but when I try to create a matrix, all of the ~1200 values merge into one matrix. I need every 4 rows to form one matrix, something like this:
C
1449.00 688.00 4036.00 8832.00 0.00 96.00 8832.00 2770.00
3929.00 5585.00 2483.00 194.00 0.00 0.00 0.00 2369.00
2290.00 103.00 2197.00 1078.00 0.00 0.00 0.00 2417.00
1164.00 3247.00 116.00 66.00 8832.00 8832.00 0.00 1276.00
but instead I get this:
['X']
['0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00']
['0.00', '0.00', '0.00', '4.00', '2.00', '10.00', '0.00', '9.00']
['0.00', '0.00', '0.00', '6.00', '8.00', '0.00', '10.00', '1.00']
['10.00', '10.00', '10.00', '0.00', '0.00', '0.00', '0.00', '0.00']
And I can't :). I tried to check the lines as rows by writing print(values[0]), but it gave me the first element of every list.
How am I supposed to read the matrices correctly, so that the rows are not returned as separate lists and each matrix is kept separate from the others?
This is not a bioinformatics question, strictly speaking.
You can check if there is only one field, and the length of the content in that field is 1:
(The code is untested and my keywords might be wrong; I don't do a lot of Python.)
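A minimal sketch of that check, not the answerer's original code: it assumes each table is introduced by a single-character label on its own line and that the values are separated by tabs or spaces. The function name `read_matrices` and the in-memory sample are hypothetical; the sample reuses tables A and C from the question.

```python
import io


def read_matrices(handle):
    """Group numeric rows under the one-character label that precedes them."""
    matrices = {}
    current = None
    for line in handle:
        fields = line.split()  # splits on tabs or spaces, drops the newline
        if not fields:
            continue  # skip blank lines
        # A line with one field whose content has length 1 is a label.
        if len(fields) == 1 and len(fields[0]) == 1:
            current = fields[0]
            matrices[current] = []
            continue  # nothing else to do for a label line
        if current is None:
            continue  # assumption: ignore rows that appear before any label
        matrices[current].append([float(x) for x in fields])
    return matrices


# Tables A and C from the question, tab-separated (hypothetical in-memory file).
sample = io.StringIO(
    "A\n"
    "1872.00\t590.00\t3339.00\t6805.00\t0.00\t0.00\t6805.00\t1917.00\n"
    "1821.00\t5138.00\t1992.00\t207.00\t0.00\t0.00\t0.00\t1391.00\n"
    "2236.00\t246.00\t1386.00\t192.00\t0.00\t0.00\t0.00\t2420.00\n"
    "877.00\t1667.00\t87.00\t0.00\t6805.00\t6805.00\t0.00\t1077.00\n"
    "C\n"
    "1449.00\t688.00\t4036.00\t8832.00\t0.00\t96.00\t8832.00\t2770.00\n"
    "3929.00\t5585.00\t2483.00\t194.00\t0.00\t0.00\t0.00\t2369.00\n"
    "2290.00\t103.00\t2197.00\t1078.00\t0.00\t0.00\t0.00\t2417.00\n"
    "1164.00\t3247.00\t116.00\t66.00\t8832.00\t8832.00\t0.00\t1276.00\n"
)

matrices = read_matrices(sample)
for label, rows in matrices.items():
    print(label, len(rows), "rows x", len(rows[0]), "columns")
```

With a real file, the same function could be called as `read_matrices(open("t.txt"))` inside a `with` block. If the labels in the file are longer than one character, the length check would need to be relaxed, for example by testing whether the single field fails to parse as a number.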
Sorry, I'm using these for transcription factor binding site analysis, so I thought it could count as bioinformatics :)
It didn't work, but thanks anyway :)!
Please give us more than "It did not work". What was expected and what actually happened?
I checked online and the keyword to use is continue, not next. I hope that helps.
Your points do not amount to an answer; they are one basic fact and one broad concept. I'm moving this post to a comment.
I tried numpy but have never used pandas before. Thank you for the advice! :)
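For the numpy route, here is a rough sketch of turning one table into an array and reading off a consensus sequence. It rests on an assumption the thread does not confirm: that the four rows are in the order A, C, G, T. The names `ROW_ORDER` and `consensus` are hypothetical, and the counts are table A from the question.

```python
import numpy as np

# Table "A" from the question, one row per nucleotide.
counts = np.array([
    [1872.00, 590.00, 3339.00, 6805.00, 0.00, 0.00, 6805.00, 1917.00],
    [1821.00, 5138.00, 1992.00, 207.00, 0.00, 0.00, 0.00, 1391.00],
    [2236.00, 246.00, 1386.00, 192.00, 0.00, 0.00, 0.00, 2420.00],
    [877.00, 1667.00, 87.00, 0.00, 6805.00, 6805.00, 0.00, 1077.00],
])

# ASSUMPTION: rows are ordered A, C, G, T. Check this against the file's source.
ROW_ORDER = "ACGT"


def consensus(matrix, alphabet=ROW_ORDER):
    """Pick the most frequent nucleotide in each column."""
    best_rows = np.argmax(matrix, axis=0)  # index of the largest count per column
    return "".join(alphabet[i] for i in best_rows)


print(counts.shape)       # (4, 8)
print(consensus(counts))  # GCAATTAG under the assumed row order
```

Note that `np.argmax` returns the first maximum when two rows tie in a column, so tied positions would need separate handling if ambiguity codes are wanted.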