3.0 years ago
Tejas Mahesh Kale
I have many BLAST output files, each named after a genome, which look like this.
The first column of each file contains all the identified query UIDs. I want to make a presence-absence matrix in CSV format in which the columns are the BLAST output filenames and the rows are the UIDs. If a BLAST file contains a given UID, the cell should be marked 1; if not, it should be marked 0. For a few files this can be done manually, but for a large number of files I would like a Python script that runs through all the files and writes a CSV file as described. Please help me with this.
What have you tried so far? Provide some code from which we can start.
And in addition to anything you've tried, provide here (or post at a code snippet service such as https://gist.github.com and share the URL here) the text version of the lines in the picture you posted. Sharing a picture of many columns of a text file isn't a productive way to share example input.
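As a possible starting point, here is a minimal sketch of the presence-absence matrix described in the question. It rests on assumptions, since the input was only shared as a picture: the files are assumed to be tab-separated (as in BLAST tabular output) with the query UID in the first column, and to sit in one directory with a `.txt` extension. The function names, the `*.txt` pattern, and the `UID` header label are made up for illustration, so adjust them to your data.

```python
import csv
from pathlib import Path


def read_query_ids(blast_file):
    """Return the set of query UIDs found in the first column of one file."""
    ids = set()
    with open(blast_file) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comment lines (assumed to start with '#').
            if not line or line.startswith("#"):
                continue
            # Assumption: columns are tab-separated, UID is the first one.
            ids.add(line.split("\t")[0])
    return ids


def build_matrix(blast_dir, pattern="*.txt"):
    """Map each file name to its set of UIDs and collect every UID seen."""
    files = sorted(Path(blast_dir).glob(pattern))
    hits = {f.name: read_query_ids(f) for f in files}
    all_ids = sorted(set().union(*hits.values()))
    return hits, all_ids


def write_matrix(hits, all_ids, out_csv):
    """Write rows = UIDs, columns = file names, cells = 1 (present) or 0."""
    names = list(hits)
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["UID"] + names)
        for uid in all_ids:
            writer.writerow([uid] + [int(uid in hits[name]) for name in names])
```

To use it, call `hits, all_ids = build_matrix("blast_results")` followed by `write_matrix(hits, all_ids, "presence_absence.csv")`, where both paths are placeholders. Sets are used so that a UID appearing on several lines of the same file still counts once.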