I have two files that I need to parse in order to build a matrix. The files are as follows:
file 1
NC_000964.parsed
NC_002570.parsed
NC_003909.parsed
NC_003997.parsed
NC_004721.parsed
NC_005945.parsed
NC_005957.parsed
NC_006274.parsed
NC_006322.parsed
NC_006510.parsed
NC_006582.parsed
..
..
A file listing my cleaned outputs from the analysis; all of these files are in the same directory. Each of them contains genes for a certain species, taken as combinations from the BLAST output file, i.e. each line has the format (one line from file 2)\t(another line from file 2). A line is present only if a gene in file 2 aligned with a gene in the file 1 species.
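A minimal sketch of reading one of these parsed files into a set of gene IDs, assuming each line holds tab-separated gene IDs as described above. The function names are hypothetical, and every field on a line is treated as a gene ID, since both columns may contain IDs from file 2:

```python
def genes_in_parsed_lines(lines):
    """Collect every gene ID that appears in the lines of one .parsed file.

    Assumption: each non-empty line holds tab-separated gene IDs
    (e.g. gene<TAB>gene); every field is treated as a gene ID.
    """
    genes = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for field in line.split("\t"):
            field = field.strip()
            if field:
                genes.add(field)
    return genes


def genes_in_parsed_file(filepath):
    """Open one .parsed file and return the set of gene IDs it contains."""
    with open(filepath) as handle:
        return genes_in_parsed_lines(handle)
```

A set makes the later "is this gene in this file?" check fast even with thousands of genes. Matching is exact string comparison, so the IDs must be written identically in both files (for example, a trailing `|` on one side would prevent a match).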
file 2
gi|56418536|ref|YP_145854.1
gi|56418537|ref|YP_145855.1
gi|56418538|ref|YP_145856.1
gi|56418539|ref|YP_145857.1
gi|56418540|ref|YP_145858.1
gi|56418541|ref|YP_145859.1
..
..
A file of genes from one of the species in the experiment; it has more than 4000 genes.
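Because file 2 defines the column order of the matrix, it helps to read it into a list that keeps the original order. A minimal sketch (hypothetical helper name), assuming one gene ID per line; blank lines and repeated IDs are skipped so that every column is unique:

```python
def read_gene_list(lines):
    """Return gene IDs in file order, skipping blank lines and repeats."""
    seen = set()
    genes = []
    for line in lines:
        gene = line.strip()
        if gene and gene not in seen:
            seen.add(gene)       # remember it so duplicates are ignored
            genes.append(gene)   # keep the first occurrence, in file order
    return genes
```

Usage: `with open('file2.txt') as f: genes = read_gene_list(f)`.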
I want to build a matrix in which the first column is file 1 and the first row is file 2.
Then I will open each file listed in file 1 and compare its contents with the list in file 2. If a gene matches, the corresponding cell in the matrix is filled with 1, otherwise with 0. That will give me a presence/absence matrix for my list in file 2 against the outputs in file 1.
I need help urgently, since this is the basis of my next step.
Thanks
NB: my script so far:
#!/usr/bin/env python
import os

path = "./xxxxxx"  # directory that holds the .parsed files

mybk = []    # array of file2 (list of resistant genes)
listbk = []  # array of file1 (parsed files from BLAST output)

with open('file2.txt', 'r') as mychecklist:
    for line in mychecklist:
        line = line.strip()
        if line:
            mybk.append(line)

with open('file1.txt', 'r') as mylist:
    for line in mylist:
        line = line.strip()
        if line:
            listbk.append(line)

for i in listbk:  # open each parsed file to read and analyze its content
    filename = os.path.join(path, i)
    with open(filename, 'r') as text:  # one file at a time, not the whole list
        for line in text:
            ### stuck... how do I record which genes from file2 occur in this file?
            pass
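One way the loop could be finished is sketched below: each parsed file is read on its own, the gene IDs it contains are collected into a set, and one row of 0/1 values is written per file to a tab-separated output file. This is a sketch under assumptions, not a definitive answer: it assumes lines in the parsed files are tab-separated gene IDs, that file 1 and file 2 hold one name per line, and the function names and output file are hypothetical.

```python
import csv
import os


def read_names(filepath):
    """Read one name per line, skipping blank lines and keeping file order."""
    with open(filepath) as handle:
        return [line.strip() for line in handle if line.strip()]


def write_presence_matrix(path, file1, file2, out_file):
    """Write a tab-separated 0/1 matrix: rows = parsed files, columns = genes."""
    parsed_files = read_names(file1)  # e.g. NC_000964.parsed
    genes = read_names(file2)         # e.g. gi|56418536|ref|YP_145854.1

    with open(out_file, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["file"] + genes)  # header row = genes from file 2
        for name in parsed_files:
            found = set()
            # Read each parsed file separately so genes from different
            # files are never mixed together.
            with open(os.path.join(path, name)) as text:
                for line in text:
                    # Assumption: every tab-separated field is a gene ID.
                    found.update(f.strip() for f in line.split("\t") if f.strip())
            writer.writerow([name] + [1 if gene in found else 0 for gene in genes])
```

Usage: `write_presence_matrix("./xxxxxx", "file1.txt", "file2.txt", "matrix.tsv")`. A tab-separated file opens directly in a spreadsheet or R, and with 4000+ genes it is easier to handle than printing the matrix to the screen.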
Can you put your code in a code block? That will make it easier to debug.