Hi All,
I edited the title of this post to include "not perl," because it was flagged as a duplicate of another post, which I did not find helpful.
Before I begin, I'd like to state that I am new to bioinformatics and may ask a stupid question. I have tried very hard to find a tool that would do this for me, but came up empty-handed.
I am looking for either Linux/Ubuntu 22.04.3 compatible software, or the ability to create or run a Python3 script. I also have Python2, but that should not be relevant.
TLDR: Using Python3 or Linux-compatible software, I want to take an input list of nucleotide strings containing degenerate bases (primer sequences; the list would likely be a tab-delimited .txt file) and output every unambiguous nucleotide string that each degenerate sequence can represent.
The "|" characters below stand in for tab delimiters, since BioStars won't let me show tabs the way I want to in a text string.
Input List Ex: Input list of primer sequences containing degenerate bases, degeneracy indicated by bold
PrimerFor-1A | ATGCATGCATGCATGCR
PrimerRev-1A | ATGCATGCATGCATGCY
PrimerFor-2A | ATGCATGCATGCATGRY
PrimerRev-2A | ATGCATGCATGCATBRY
Etc.
(Translation Table for Degenerate Bases B, R, and Y)
R = A or G
Y = C or T
B = C or G or T
Output Ex, bases which differ from the input are marked with bold and italics:
PrimerFor-1A-Output1 ATGCATGCATGCATGCA
PrimerFor-1A-Output2 ATGCATGCATGCATGCG
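The expansion shown in the examples above can be sketched in plain Python 3 with itertools.product. This is only a minimal sketch under a few assumptions: the input is two tab-separated columns (name, primer), the output names follow the "-OutputN" pattern from the example, and the lookup table uses the standard IUPAC DNA codes (the post itself only lists R, Y and B). The names IUPAC, expand and expand_table are made up for illustration.

```python
import csv
import io
from itertools import product

# Standard IUPAC DNA codes; the post only lists R, Y and B explicitly.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    # One string of allowed bases per position; an unknown letter raises KeyError.
    options = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*options)]


def expand_table(handle):
    """Yield (output_name, sequence) pairs from tab-delimited name/primer rows."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, primer = row[0].strip(), row[1].strip()
        for i, variant in enumerate(expand(primer), start=1):
            yield f"{name}-Output{i}", variant


if __name__ == "__main__":
    # In-memory stand-in for a tab-delimited primer file.
    example = "PrimerFor-1A\tATGCATGCATGCATGCR\nPrimerRev-2A\tATGCATGCATGCATBRY\n"
    for name, variant in expand_table(io.StringIO(example)):
        print(name, variant, sep="\t")
```

To run it on a real file, the io.StringIO object could be swapped for something like open("primers.txt", newline=""), where primers.txt is a placeholder file name.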
I have 22 primer sequences like the ones in the "example input" above, many of which contain 3 or 4 degenerate bases. Some of these degenerate bases stand for 3 possible nucleotides. If I enumerate all of the possible sequences by hand, I am afraid I will miss or forget some of them, so I want a method that guarantees a complete list.
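One way to guard against a missed combination is to work out how many sequences each primer should produce: the total is the product of the number of options at every position. The sketch below assumes the standard IUPAC DNA codes; N_OPTIONS and count_variants are hypothetical names used only for illustration.

```python
from math import prod  # available in Python 3.8+

# Number of nucleotides each standard IUPAC DNA code stands for.
N_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def count_variants(seq):
    """Return how many unambiguous sequences a degenerate primer encodes."""
    return prod(N_OPTIONS[base] for base in seq.upper())


# PrimerRev-2A from the example input: B (3) x R (2) x Y (2) = 12
print(count_variants("ATGCATGCATGCATBRY"))
```

A primer with 4 degenerate positions of 3 options each would therefore give 3 x 3 x 3 x 3 = 81 sequences, which is easy to lose track of by hand.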
I am looking for any help in either pointing me towards existing linux/ubuntu software to do this, or a beginner-friendly method involving python3 scripting or biopython. I have python3/biopython fully configured if there happens to be a script that already does this.
Any help would be greatly appreciated!
Best,
-M
duplicate of Degenerate Nucleotide Sequences
Hi, those solutions all use perl. I am asking for a solution in Python3 or linux based software.
wrong, my answer was C and will compile on your "supercomputer cluster"
Is perl not Ubuntu 22.04.3 compliant? Is there a specific reason you're avoiding perl solutions?
Perl scripting does not work on our lab's supercomputer cluster. Unfortunately, the explanation went over my head. They basically said I could use anything else, and I know Python3 and Biopython work.
Sorry for the lack of info. -Matt
You will need to tell us if this works: https://www.researchgate.net/publication/354451685_Generate_all_the_possible_combinations_of_a_Degenerate_DNARNA_sequence_in_FASTA_format_using_Python
I don't recall what the Biostars policy is on LLM recommendations, but this sort of situation seems like something that ChatGPT would be good at solving (with some test cases to verify the correctness of the solution).
Recommending LLM-created solutions is fine as long as one explicitly states that the code was generated using whatever LLM tool.
Good to know, thanks!