Try this. Printing the dataframe with tabulate respects the line breaks separating the strings, so you can view the alignment from Python. Excel also respects the line breaks in each entry when it opens the CSV file saved from the dataframe:
%pip install primer3-py==0.6.1
%pip install tabulate
import primer3
t_str_noms = ["test1","test2","test3","test4"]
d = {}
for t in t_str_noms:
    item_list = [i.split('\t')[1] for i in primer3.bindings.calcHeterodimer('TGACACCGCCAAGGTGAATTT', 'CCGCTCCGTGGTTGGTCCGGTGGCGAGCGG', output_structure = True).ascii_structure_lines]
    d[t] = '\n'.join(item_list)
import pandas as pd
df = pd.DataFrame.from_dict(d, orient='index')
#print(df.to_string())
from tabulate import tabulate
print(tabulate(df)) # use based on https://stackoverflow.com/a/49739927/8508004 and https://github.com/astanin/python-tabulate#multiline-cells
df.to_csv("test.csv")
Gives as text from printing in Python, the following:
----- --------------------------------
test1 TGA ------ A T AATTT
CACCG CCA GG G
GTGGC GGT CC C
GGCGAGCG CTGGTT G T GCC--
test2 TGA ------ A T AATTT
CACCG CCA GG G
GTGGC GGT CC C
GGCGAGCG CTGGTT G T GCC--
test3 TGA ------ A T AATTT
CACCG CCA GG G
GTGGC GGT CC C
GGCGAGCG CTGGTT G T GCC--
test4 TGA ------ A T AATTT
CACCG CCA GG G
GTGGC GGT CC C
GGCGAGCG CTGGTT G T GCC--
----- --------------------------------
And when that CSV file is opened in Excel and the font set to the Courier font and the second column border dragged to the right a little, it looks like:
What is going on behind the scenes here, in case you want to control it more directly?
Examining the .csv file produced from the Python code in your favorite text editor can help you gain insight:
Looking at the CSV file produced, you get a sense that the double quotes around the string are important.
As shown here, the trick is getting the string with line breaks saved in the .csv file with double quotes surrounding it. I found that through a link in an answer to 'Python: Add Line breaks into Excel cells while exporting the DataFrame'.
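To make the quoting point concrete without Pandas or primer3-py, here is a minimal sketch using only Python's standard csv module. The cell contents are made-up sample strings, not real primer3 output. It suggests that a field containing line breaks is written inside double quotes, and that a CSV reader then treats the whole quoted span as one cell:

```python
import csv
import io

# One cell holding several lines, joined the same way as the dimer
# structure lines above (these are sample strings, not primer3 output).
cell = "\n".join(["TGA  A", "CACCG", "GTGGC"])

buffer = io.StringIO()
writer = csv.writer(buffer)  # default quoting is csv.QUOTE_MINIMAL
writer.writerow(["test1", cell])
raw_text = buffer.getvalue()
print(raw_text)  # the multi-line field appears wrapped in double quotes

# Reading the text back: the quoted field comes back as a single cell.
rows = list(csv.reader(io.StringIO(raw_text)))
```

As far as I know, DataFrame.to_csv applies the same minimal-quoting default, which would explain why the file saved above opens in Excel with the line breaks kept inside one cell.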
A purely Pandas way using pandas.DataFrame.explode:
%pip install primer3-py==0.6.1
import primer3
t_str_noms = ["test1","test2","test3","test4"]
d = {}
for t in t_str_noms:
    d[t] = [i.split('\t')[1] for i in primer3.bindings.calcHeterodimer('TGACACCGCCAAGGTGAATTT', 'CCGCTCCGTGGTTGGTCCGGTGGCGAGCGG', output_structure = True).ascii_structure_lines]
import pandas as pd
df = pd.concat(map(pd.Series, [d]), axis=1)
df = df.explode(0) # based on https://stackoverflow.com/a/66732712/8508004
print(df.to_string())
df.to_csv("test_from_tabulate.csv")
That gives:
0
test1 TGA ------ A T AATTT
test1 CACCG CCA GG G
test1 GTGGC GGT CC C
test1 GGCGAGCG CTGGTT G T GCC--
test2 TGA ------ A T AATTT
test2 CACCG CCA GG G
test2 GTGGC GGT CC C
test2 GGCGAGCG CTGGTT G T GCC--
test3 TGA ------ A T AATTT
test3 CACCG CCA GG G
test3 GTGGC GGT CC C
test3 GGCGAGCG CTGGTT G T GCC--
test4 TGA ------ A T AATTT
test4 CACCG CCA GG G
test4 GTGGC GGT CC C
test4 GGCGAGCG CTGGTT G T GCC--
And when that CSV file is opened in Excel and the font set to Courier font, it looks like:
The pure Pandas one is based on what you had originally, since that made the dataframe actually hold a Python list in a single column. Pandas normally doesn't do that when you use the typical routes to making a dataframe from a dictionary, i.e., df = pd.DataFrame.from_dict(d, orient='index'). In the typical way, Pandas puts each list item in a separate column.
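A small sketch of that difference, using a made-up dictionary of short lists instead of primer3 output so it runs with Pandas alone. The names wide, nested, and long_form are just illustrative:

```python
import pandas as pd

# Toy stand-in for the dictionary of structure lines (sample values only).
d = {"test1": ["row a", "row b"], "test2": ["row c", "row d"]}

# Typical route: each list item lands in its own column.
wide = pd.DataFrame.from_dict(d, orient="index")

# Route used above: one column (named 0) whose cells each hold a whole list.
nested = pd.concat(map(pd.Series, [d]), axis=1)

# explode then gives each list item its own row, repeating the index label.
long_form = nested.explode(0)

print(wide.shape, nested.shape, long_form.shape)
```

Here wide has one column per list item, while nested keeps a single column of lists, which is what explode needs in order to produce the one-line-per-row layout.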
If anyone wants to test these code blocks, they'll work right in your browser, without installing anything on your computer or needing to log in or register, by using the remote temporary machines served via the MyBinder.org service:
Try running either code block in the Jupyter notebook file (or create a new one) that comes up in the temporary session after clicking on the 'launch binder' badge here.
NOTE: Anyone using this code with current versions of primer3-py will use primer3.bindings.calc_heterodimer( in place of primer3.bindings.calcHeterodimer(. This is because, following version 0.6.1, the developers deprecated the use of camel case for the function names in primer3-py, as noted at the bottom of this section of the documentation here.
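If the same script has to run for people on both the old and the newer versions, one possible approach is to look the function up by name at run time. This is only a sketch: pick_heterodimer_function is a hypothetical helper of mine, not part of primer3-py, and the two SimpleNamespace objects are stand-ins so the example runs without the library installed.

```python
from types import SimpleNamespace


def pick_heterodimer_function(bindings):
    """Return the snake_case function if the object has it, else the camelCase one.

    Hypothetical helper: `bindings` is expected to be primer3.bindings,
    but any object with one of the two attribute names works.
    """
    for name in ("calc_heterodimer", "calcHeterodimer"):
        func = getattr(bindings, name, None)
        if func is not None:
            return func
    raise AttributeError("no heterodimer function found on the given object")


# Stand-ins for an old and a new primer3.bindings module (not the real thing).
old_style = SimpleNamespace(calcHeterodimer=lambda a, b, **kw: "old")
new_style = SimpleNamespace(calc_heterodimer=lambda a, b, **kw: "new")

print(pick_heterodimer_function(old_style)("TGAC", "GTCA"))
print(pick_heterodimer_function(new_style)("TGAC", "GTCA"))
```

With the real library, the call would presumably look like pick_heterodimer_function(primer3.bindings)(seq1, seq2, output_structure=True). Checking the snake_case name first means the deprecated name is only used when nothing newer is available.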
You should include all the code that produces your toy example so that others can pick up from where you are. Then you're more likely to get back something close to what you had, and you'll understand it better.
Specifically, I cannot reproduce what you show, so I cannot even begin to show you how to get what I think you may be asking for.
Here's what I cobbled together based on your post:
(Ignore the first line if you aren't using Jupyter. I'm running that code in the temporary session that comes up after clicking on the 'launch binder' badge here.)
(By the way, the Python code block above was just my guess at how you may have gone from your earlier code block to the block where df = pd.concat(map(pd.Series, [d]), axis=1). I'm in no way endorsing what I cobbled together as a viable way to do those steps. I'm only attempting to reverse engineer what you may have done.)
What I get is close to yours but not quite:
That brings me to the main issue...
What are you actually expecting from your question?
Is "this format" what you show just above or what was further above? They seem to have the same elements in them. What exactly is different that you want?
Minor: You are using and posting deprecated code in the way you are using primer3.bindings.calcHeterodimer(). Please look into things when you see such warnings and use the updated syntax so that you have an easier time down the line. Eventually, in later versions of the module, the deprecated versions will cease to work and there won't be a warning. Specifically, in this case, camel case for the function names is a convention they are ceasing to support in future versions; see here, here, and "NOTE. camelCase methods are deprecated" at the bottom of this section of the documentation here.
Hi Wayne,
Thanks for the response. I have updated the code to be more representative of the toy example I have provided - apologies for any confusion.
In practice I will have a substantial list of primers to be used in a multiplex where I want to ensure no heterodimers will form.
The format I am referring to is in the first output shown. Here it is again:
As you stated, all the elements are retained in the dataframe. However, this representation is ideal for the less 'coding-minded' folks with whom I intend to share this information, which is why I want to keep the format I am showing above. For clarity, the format I desire, if possible, is the 'alignment' of the hybridized nucleotides as seen above. I've tried methods such as the example below, which did not work.
So, more clearly, I am trying to retain the formatting of the base alignment for the predicted dimers when the list is integrated into a dataframe and subsequently exported to a csv.
Finally, I am aware that I am using deprecated bindings. The broader group I am working with on other elements of the project are still on the previous version so we will have to update all of our code away from the camelCase methods in the future. Thanks for recognizing this.
Thanks again!
Starting to understand what you want and why.
My question now is: do you actually want the brackets? They seem to shift the lines that follow relative to the first. Shouldn't it be something like:
That way you can see the corresponding lines better?
How do you want the names, like 'test1', handled in this, or do you want no names? I'm trying to think about scaling since you say "a substantial list of primers". More importantly, it sounds like you want this retained in a CSV file as text in the end? Is that how it will go to the less coding-minded folks?
I'll try to even specify the version then since you are stuck with that.
Here's some of that, based on what I had built from what you had originally, since it made the dataframe actually hold a Python list in one column, which Pandas normally doesn't want to do, i.e., instead of the typical df = pd.DataFrame.from_dict(d, orient='index',):
That gives:
Glad I am making this more clear. The 'test1' in practice will be the 'primer name' to allow for tracking of other metrics (Tm, hairpin structure, etc.) and will be used as the index of the dataframe to aggregate all the data. The primer names and sequences are stored in a dictionary that I iterate through and perform all the possible heterodimer calculations for each combination. This dataframe will then be exported as a csv and serve as a report that can be shared with others.
Yes, elimination of the brackets is completely fine; they do indeed perturb the alignment!
I am using primer3-py version 0.6.1.