Write List of subject ID numbers to a new text file(s)
7.1 years ago

I have a function that reads the first line of each of several blastp result files. I use a for loop to iterate over the files and print the subject ID numbers taken from the first line of each file. How can I write each list to its own file?

    import os, glob, sys

    # Read column index [1] (the subject ID) from the first line of each
    # protein result file and collect the values in a list
    def readProteinBlastFiles(name):
        # Build the glob pattern for files in my directory: protein name + database wildcard
        path = '/Users/sueparks/' + name + 'L*'

        Files_P = []  # empty list for subject accession numbers
        for file in glob.glob(path):
            with open(file, 'r') as f:
                line = f.readline()  # read first line of file
                if len(line) > 0:
                    # field_0 = line.strip().split('\t')[0]
                    field_1 = line.strip().split('\t')[1]

                    # Files_P.append(field_0)  # Query ID
                    Files_P.append(field_1)  # Subject ID
        return Files_P

    # Removed all P_17 files and removed all files from L_CTV-05 database
    Bacillus_Proteins = ['P_1', 'P_2', 'P_3', 'P_4', 'P_5', 'P_6', 'P_7', 'P_8', 'P_9', 'P_10',
                         'P_11', 'P_12', 'P_13', 'P_14', 'P_15', 'P_16', 'P_18', 'P_19',
                         'P_20']

    for prot in Bacillus_Proteins:
        list_SubjectAccession_Numbers = readProteinBlastFiles(prot)
        print(prot)
        print("############################")
        print(list_SubjectAccession_Numbers)

Results in console:

P_19 
['NP_391972.1', 'EEQ68114.1', 'NP_391972.1', 'EEQ25921.1', 'NP_391972.1', 'EFD99688.1', 'NP_391972.1', 'EFB61660.1', 'NP_391972.1', 'EEQ25318.1', 'NP_391972.1', 'EEJ40542.1', 'NP_391972.1', 'EEW51848.1', 'NP_391972.1', 'ADZ08087.1', 'NP_391972.1', 'EEJ68837.1', 'NP_391972.1', 'EFJ68832.1', 'NP_391972.1', 'EFH30349.1', 'NP_391972.1', 'EFO69387.1', 'NP_391972.1', 'EEU28530.1', 'NP_391972.1', 'EFQ45573.1', 'NP_391972.1', 'EGG32092.1', 'NP_391972.1', 'WP_013086961.1', 'NP_391972.1', 'EGC80044.1']

P_20
['EEQ68452.2', 'EEQ26185.1', 'EFD99988.1', 'EFB62008.1', 'EEQ24617.1', 'EEJ40165.1', 'EEW51876.1', 'ADZ06473.1', 'EEJ68783.1', 'EFJ69255.1', 'EFH29970.1', 'EFO68531.1', 'EEU28950.1', 'EFQ46733.1', 'EGG32019.1', 'WP_005720329.1', 'EGC80018.1']
7.1 years ago
st.ph.n ★ 2.7k

As I understand it, changing this portion of the script will write the accession numbers for each 'prot' in Bacillus_Proteins to its own file:

    for prot in Bacillus_Proteins:
        with open(prot + '_results.txt', 'w') as out:
            list_SubjectAccession_Numbers = readProteinBlastFiles(prot)
            print(prot)
            print("############################")
            for i in list_SubjectAccession_Numbers:
                out.write(i + '\n')  # newline so each ID lands on its own line
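If it helps, the file-writing step can also be pulled out into a small helper so it can be checked on its own. This is only a sketch: the name `write_subject_ids` and the `out_dir` parameter are my own inventions for illustration, not anything from the original script, and it assumes you want one subject ID per line.

```python
import os


def write_subject_ids(prot, subject_ids, out_dir="."):
    """Write one subject ID per line to <out_dir>/<prot>_results.txt.

    Hypothetical helper (not part of the original script).
    Returns the path of the file that was written.
    """
    out_path = os.path.join(out_dir, prot + "_results.txt")
    with open(out_path, "w") as out:
        for subject_id in subject_ids:
            # Add a newline so the IDs do not run together on one line
            out.write(subject_id + "\n")
    return out_path


# Intended use inside the existing loop (readProteinBlastFiles comes from the question):
# for prot in Bacillus_Proteins:
#     write_subject_ids(prot, readProteinBlastFiles(prot))
```

Opening the file in 'w' mode overwrites any earlier `<prot>_results.txt`, so rerunning the script replaces the old output rather than appending to it.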