Hi everyone, I am a biologist (not a programmer) and I am trying to write my own code to find the differences between my data files.
File1.txt: Orange, orange, apple, pear
File2.txt: pear, Pear, Kiwi
Output.txt: -Orange -Orange -apple -pear +pear +Pear +Kiwi
In this case, lowercase "pear" is the only fruit common to both files, so the output shows both +pear and -pear. This is not very helpful, because I want to use this code for really long gene lists. Is there some way to further filter the common fruit and display them, for example, without a "+" or "-" in output.txt? Going through what has + and - in a very big list full of duplicates is not practical.
This is my code:
> import difflib
>
> # Read each file as a list of lines without trailing newlines
> with open('/Users/.../file1.txt') as file_1:
>     file_1_text = file_1.read().splitlines()
> with open('/Users/.../file2.txt') as file_2:
>     file_2_text = file_2.read().splitlines()
>
> # Write the unified diff to output.txt and echo it to the screen
> with open('output.txt', 'w') as mfile:
>     for line in difflib.unified_diff(file_1_text, file_2_text,
>                                      fromfile='file1.txt',
>                                      tofile='/Users/.../file2.txt',
>                                      lineterm=''):
>         mfile.write("%s\n" % line)
>         print(line)
What you are trying to perform is commonly known as "set operations" in programming. This keyword should help you google what you need; at first glance, this tutorial seems quite appropriate.
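To make the set-operations idea concrete, here is a minimal sketch in Python using the fruit example from the question. The helper names `read_items` and `compare` are made up for illustration, and the sketch assumes the items are separated by commas or by new lines. Common items are listed without a prefix, while items unique to one file keep their "-" or "+". Because sets are used, duplicates within a file collapse to a single entry.

```python
import re


def read_items(text):
    """Split text on commas or newlines and return the set of non-empty items.

    Hypothetical helper: assumes items are comma- or newline-separated.
    """
    return {item.strip() for item in re.split(r"[,\n]", text) if item.strip()}


def compare(text_1, text_2):
    """Return report lines: common items first, then '-' and '+' items."""
    items_1 = read_items(text_1)
    items_2 = read_items(text_2)

    common = items_1 & items_2   # intersection: present in both files
    only_1 = items_1 - items_2   # difference: only in the first file
    only_2 = items_2 - items_1   # difference: only in the second file

    lines = sorted(common)
    lines += ["-" + item for item in sorted(only_1)]
    lines += ["+" + item for item in sorted(only_2)]
    return lines


# Example data taken from the question (in practice, read these from the files)
file_1_text = "Orange, orange, apple, pear"
file_2_text = "pear, Pear, Kiwi"

report = compare(file_1_text, file_2_text)
print("\n".join(report))
```

With real files, the two strings could come from `open(path).read()` and the report could be written to output.txt with `"\n".join(report)`. Note that the comparison is case-sensitive, so "pear" and "Pear" count as different items, just as in the original diff output.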
If using python is not a requirement, an easy approach to find the common genes between two files could be to first convert your files by replacing the commas with new lines, and then find the common elements between these two new files using grep.
It is not that I have to use Python, but it is preferable because once I fix it, people in my lab will use it too!
I guess people in your lab could use bash just as they would use python? I personally find bash and awk faster and simpler when it comes to straightforward file parsing problems, such as the current issue of finding common elements between two files.
Well, if your goal is not to learn Python, but to provide your lab with an easy way to intersect gene lists, then I would recommend a browser-based GUI approach.
Galaxy has a rudimentary intersection feature, but much nicer is Intervene (documentation), which can also create beautiful figures.