I have generated a BED like tab delimited file with intersections with the Gencode v.19 annotated gtf and splice junction files from the Human BodyMap 2.0. I am in the process of completing a script that will allow me to create weighted venn diagrams of this dataset. My problem is that I have 1D arrays and I need to convert them to 3D arrays. I am quite new to this array and plotting business. I have pasted my code below, please advice me on how to proceed. I have also included the traceback call.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib_venn import venn3, venn3_circles
import os
import sys
input_file = "/home/ruchik/bodyMap_Data/bodyMap_Files/sample.txt"
print len(sys.argv)
if len(sys.argv) < 2:
file_name = raw_input('Enter file path: ')
else:
print sys.argv[1]
col1_idx = [7]
col2_idx = [8]
print >> sys.stderr,'File is {file} and used columns are {col1} and {col2}.'.format (file=input_file, col1=col1_idx, col2=col2_idx)
## Open the file and read each line
## The 1st line: (col 7th to ith) is stored into a array or list named 'header_all'
## The other lines: 7th (aka col0_idx) to ith column into an array named 'data_all'
f = open("/home/ruchik/bodyMap_Data/bodyMap_Files/sample.txt")
fh = open(input_file, "r")
header = fh.readlines()
data = fh.readlines()
print "Header: "
#print header
print "Data: "
for j in range(len(data)):
data[j] = data[j].split("\t")
for i in range(len(data[j])):
data[j][i] = int(data[j][i])
fh.close()
data = np.array(data, dtype=np.int32)
d = np.reshape([data[:, 0], data[:, 1], np.reshape(np.all(data[:, 2:5], 1))], dtype=np.int32).transpose()
op = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
count = [0, ] * len(op)
for i in range(len(op)):
for j in range(len(d)):
if list(d[j]) == op[i]:
count[i] += 1
for k in range(len(op)):
print str(op[k]) + ': ' + str(count[k])
## Pipe the data and create the venn diagrams
plt.figure(figsize=(4,4))
v = venn3(subsets=count, set_labels = ('Introns', 'Heart', 'Kidney'))
v.get_patch_by_id('100').set_alpha(1.0)
v.get_patch_by_id('100').set_color('white')
v.get_label_by_id('100').set_text('Unknown')
v.get_label_by_id('A').set_text('Set "A"')
plt.show()
Traceback:
Traceback (most recent call last):
File "make_venn.py", line 39, in <module>
d = np.reshape([data[:, 0], data[:, 1], np.reshape(np.all(data[:, 2:5], 1))], dtype=np.int32).transpose()
IndexError: too many indices
Thank you for your time!
EDIT
Input File:
chrom start end name score strand intron Heart Kidney Liver Lung
chr10 100008748 100010821 . . - 1 1 1 1 1
chr10 100010933 100011322 . . - 1 0 1 0 1
chr10 100011459 100012109 . . - 1 1 1 1 1
chr10 100011959 100015344 . . + 1 0 0 0 0
chr10 100012225 100013309 . . - 1 0 1 0 1
chr10 100013553 100015333 . . - 1 0 1 1 1
chr10 100015496 100016536 . . - 1 1 1 1 1
chr10 100016704 100017406 . . - 1 0 1 0 1
chr10 100017561 100017737 . . - 1 0 1 1 1
This file was generated by intersecting the Gencode V.19 GTF file with the splice junctions of each tissue. The goal is to create a script which will take this file as input, keep Col. 7 constant take another tissue and the union of other three tissues to create a three way venn diagram since the python library can only take three files as input unlike R where multiple inputs can be used. This file has close to 420k records which is why I thought that arrays would come useful but now that I think about it, I can use dictionaries for this purpose.
The Venn library I am using is Matplotlib-venn.
I read as far as "file_name = raw_input('Enter file path: ')" and got bored.
Can't see what you mean by converting a 1-D array to a 3d array. A BED file is one-dimensional, you could make it 2d by flagging intersections (rows are BED, columns are GTF), but I don't understand where the 3d array is expected.
Please explain what you're trying to do, and we can suggest a solution.
I'm thinking OP means a 2D array with 3 elements each. Not sure if OP is savvy with array basics.
I added the example of my input file and a brief description of what I am trying to achieve.
Ruchik, this info is purely FYI.
1D array looks like this:
[1,2,3,4,5,6]
2D array looks like a matrix:
3D array is a list of matrices. Imagine 2 layers of the 2D matrix shown above. Another example here: http://www.mathworks.com/help/matlab/math/ch_data_struct4.gif
Hello Ram,
I am a wet lab person transitioning to Bioinformatics and so I don't have a full understanding of Arrays. I also thought about my problem a little bit and it looks like I don't need to have arrays to do my job. I will update on what I am trying to achieve when I get to the lab and so you can have a better understanding of what I am trying to do.
I understand, Ruchik. That's why I gave you some basic info on arrays :-) Welcome to the wonderful world of Bioinformatics!
Also, just noticed a missing
]
here:It should be so for proper syntax:
I'm assuming you're trying to slice the first two columns out and reshape it. Also, shouldn't you be working with the
data_all
array in the code above?Hello Ram,
I am a wet lab person transitioning to Bioinformatics and so I don't have a full understanding of Arrays. I also thought about my problem a little bit and it looks like I don't need to have arrays to do my job. I will update on what I am trying to achieve when I get to the lab and so you can have a better understanding of what I am trying to do.
In the future, please make an effort to format the text in your question. This makes it easier to distinguish question text from source code or input. This makes it easier for people to help you.