Question

PWM matrix from txt file in Python

5.3 years ago

vschultz

Hi!

I'm analyzing data for my work and am trying to read frequency (PWM) matrices from a txt file that contains multiple tables, so that I can build consensus sequences. In short, the file looks like this:

A
1872.00 590.00  3339.00 6805.00 0.00    0.00    6805.00 1917.00
1821.00 5138.00 1992.00 207.00  0.00    0.00    0.00    1391.00
2236.00 246.00  1386.00 192.00  0.00    0.00    0.00    2420.00
877.00  1667.00 87.00   0.00    6805.00 6805.00 0.00    1077.00
B
11369.00    11735.00    3157.00 1226.00 26720.00    29957.00    274.00  29221.00    30645.00    30125.00    13752.00    10200.00
6380.00 2568.00 2096.00 26587.00    3312.00 414.00  391.00  761.00  349.00  595.00  5299.00 6905.00
7434.00 8816.00 24214.00    607.00  184.00  1196.00 386.00  999.00  366.00  502.00  5884.00 5934.00
6843.00 8907.00 2559.00 3606.00 1810.00 459.00  30975.00    1045.00 666.00  804.00  7091.00 8987.00
C
1449.00 688.00  4036.00 8832.00 0.00    96.00   8832.00 2770.00
3929.00 5585.00 2483.00 194.00  0.00    0.00    0.00    2369.00
2290.00 103.00  2197.00 1078.00 0.00    0.00    0.00    2417.00
1164.00 3247.00 116.00  66.00   8832.00 8832.00 0.00    1276.00

So I wrote this to read the file at first:

with open("t.txt") as tx:
    for line in tx:
        values = line.strip("\n").split("\t")
        print(values)

to build the matrices afterwards. The file is read correctly, but when I try to create a matrix, all ~1200 values get merged into one matrix. I need every block of 4 rows to be its own matrix, something like this:

C
1449.00 688.00  4036.00 8832.00 0.00    96.00   8832.00 2770.00
3929.00 5585.00 2483.00 194.00  0.00    0.00    0.00    2369.00
2290.00 103.00  2197.00 1078.00 0.00    0.00    0.00    2417.00
1164.00 3247.00 116.00  66.00   8832.00 8832.00 0.00    1276.00

but instead I get this:

['X']
['0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00']
['0.00', '0.00', '0.00', '4.00', '2.00', '10.00', '0.00', '9.00']
['0.00', '0.00', '0.00', '6.00', '8.00', '0.00', '10.00', '1.00']
['10.00', '10.00', '10.00', '0.00', '0.00', '0.00', '0.00', '0.00']

And I'm stuck :). I tried to treat the lines as rows by writing print(values[0]), but that only gave me the first element of every list.

How can I read each matrix correctly, without getting the rows back as separate lists, and keep every matrix separated from the others?

python matrix array numpy

Written 5.3 years ago by vschultz; updated 5.3 years ago by shoujun.gu

This is not a bioinformatics question, strictly speaking.

You can check if there is only one field, and the length of the content in that field is 1:

if(len(values) == 1 and len(values[0]) == 1):
    next

(Code is untested and my keywords might be wrong, I don't do a lot of Python).

5.3 years ago by Ram

Sorry, I'm using these for transcription factor binding site analysis, so I thought it could count as bioinformatics :)

It didn't work, but thanks anyway :)!

5.3 years ago by vschultz

didn't work

Please give us more than "It did not work". What was expected and what actually happened?

I checked online and the keyword to use is continue, not next. I hope that helps.
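In Python, `next` on a line by itself only refers to the built-in function and does nothing, so it does not skip the line; `continue` does. Below is a minimal sketch of the label check with `continue` in place. It skips the single-letter label lines and then cuts the remaining numeric rows into consecutive blocks of four, which matches the "every 4 rows" layout from the question but assumes every matrix has exactly four rows. It uses `line.split()` with no argument instead of `split("\t")` so that both tabs and runs of spaces work. `rows_in_blocks` is a hypothetical helper name.

```python
from typing import Iterable, List


def rows_in_blocks(lines: Iterable[str], rows_per_block: int = 4) -> List[List[List[float]]]:
    """Skip label lines, then group numeric rows into fixed-size blocks."""
    rows: List[List[float]] = []
    for line in lines:
        values = line.split()  # no argument: splits on tabs or runs of spaces
        if not values or (len(values) == 1 and len(values[0]) == 1):
            continue  # empty line or label line such as "A" or "C": skip it
        rows.append([float(v) for v in values])
    # Assumption: every matrix in the file has exactly rows_per_block rows.
    if len(rows) % rows_per_block:
        raise ValueError("number of rows is not a multiple of rows_per_block")
    return [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]
```

With a real file this would be called as `with open("t.txt") as tx: blocks = rows_in_blocks(tx)`, giving one list of four rows per matrix, in file order.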

5.3 years ago by Ram

str.split() returns a list; that's why each line comes back as a list.
Use pandas to work with tables in Python.
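Staying with plain Python rather than pandas, here is a minimal sketch of how those per-line lists can be turned into separate matrices. It assumes that each matrix starts with a line holding only a non-numeric label (as in the excerpt from the question) and collects the following numeric rows under that label in a dictionary. `read_matrices` and `_is_number` are hypothetical helper names, and a repeated label would overwrite the earlier matrix.

```python
from typing import Dict, Iterable, List, Optional


def _is_number(text: str) -> bool:
    """True if the text parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def read_matrices(lines: Iterable[str]) -> Dict[str, List[List[float]]]:
    """Group numeric rows under the label line that precedes them."""
    matrices: Dict[str, List[List[float]]] = {}
    label: Optional[str] = None
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1 and not _is_number(fields[0]):
            label = fields[0]  # a new matrix starts here
            matrices[label] = []
            continue
        if label is None:
            raise ValueError("numeric row found before any label line")
        matrices[label].append([float(x) for x in fields])
    return matrices
```

Usage, assuming the file name from the question: `with open("t.txt") as tx: matrices = read_matrices(tx)`, after which `matrices["C"]` holds the four rows of matrix C as lists of floats.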

5.3 years ago by shoujun.gu

Your points do not amount to an answer; they are one basic fact and one broad concept. I'm moving this post to a comment.

5.3 years ago by Ram

I tried numpy but have never used pandas before. Thank you for the advice! :)

5.3 years ago by vschultz

Answer 1 · 2020-04-09

5.3 years ago

jared.andrews07

It's not completely clear what you're asking here, but I highly recommend using the Biopython motifs module if possible, as it makes dealing with motifs in almost any format much, much easier.
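If installing Biopython is not an option, a simple majority consensus can be sketched in plain Python. This is a stand-in, not Biopython's implementation: for each column it takes the letter of the row with the largest count. It assumes the four rows are in A, C, G, T order, which is a common convention for count matrices but is not stated in the question, so check it against wherever the file came from. Ties go to the first letter in that order, and `consensus` is a hypothetical helper name.

```python
from typing import List, Sequence

# Assumed row order; not confirmed by the original file.
ROW_LETTERS = "ACGT"


def consensus(matrix: Sequence[Sequence[float]], letters: str = ROW_LETTERS) -> str:
    """Return the most frequent letter in each column of a count matrix."""
    if len(matrix) != len(letters):
        raise ValueError(f"expected {len(letters)} rows, got {len(matrix)}")
    result: List[str] = []
    for col in range(len(matrix[0])):
        column = [row[col] for row in matrix]
        # max() returns the first maximal index, so ties go to the earlier letter
        best = max(range(len(column)), key=lambda i: column[i])
        result.append(letters[best])
    return "".join(result)


matrix_c = [
    [1449.0, 688.0, 4036.0, 8832.0, 0.0, 96.0, 8832.0, 2770.0],
    [3929.0, 5585.0, 2483.0, 194.0, 0.0, 0.0, 0.0, 2369.0],
    [2290.0, 103.0, 2197.0, 1078.0, 0.0, 0.0, 0.0, 2417.0],
    [1164.0, 3247.0, 116.0, 66.0, 8832.0, 8832.0, 0.0, 1276.0],
]
print(consensus(matrix_c))  # CCAATTAA under the A, C, G, T row-order assumption
```

A consensus that also reports ambiguity codes or uses a frequency threshold would need extra rules; this sketch only picks the single most frequent base.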