Question

Python3 or Ubuntu, Not Perl: Have Primers with Degenerate Bases, Need a Tool or Way to List All Possible Nucleotide Sequences

Asked 8 months ago by Matthew

Hi All,

Edited the post title to include "not Perl", because this post was flagged as a duplicate of another post whose answers I found unhelpful.

Before I begin, I'd like to say that I am new to bioinformatics and this may be a stupid question. I have tried very hard to find a tool that does this for me, but came up empty-handed.

I am looking for either Linux/Ubuntu 22.04.3-compatible software or a way to write or run a Python3 script. I also have Python2, but that should not be relevant.

TL;DR: Using Python3 or Linux-compatible software, I want to take an input list of primer sequences containing degenerate bases (most likely a tab-delimited .txt file) and output every possible nucleotide sequence that each primer can represent.

I use "|" wherever a separator is needed, because Biostars won't let me show a tab in a text string the way I want. Treat "|" as a tab delimiter.

Example input: a list of primer sequences containing degenerate bases (marked in bold in the original post; here R, Y, and B):

PrimerFor-1A | ATGCATGCATGCATGCR

PrimerRev-1A | ATGCATGCATGCATGCY

PrimerFor-2A | ATGCATGCATGCATGRY

PrimerRev-2A | ATGCATGCATGCATBRY

Etc.
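
A minimal Python3 sketch for reading such a list, assuming a plain-text file with one primer per line in the form name<TAB>sequence (the "|" above standing in for the tab) and exactly two columns per line. The function name `read_primers` is hypothetical; the example feeds it an in-memory string instead of a real file.

```python
import csv
import io

def read_primers(handle):
    """Read (name, sequence) pairs from a tab-delimited text handle.

    Assumes exactly two columns per line; blank lines are skipped and
    sequences are upper-cased so lowercase input also works.
    """
    primers = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        name, seq = row[0].strip(), row[1].strip().upper()
        primers.append((name, seq))
    return primers

# In-memory example standing in for a real file
example = "PrimerFor-1A\tATGCATGCATGCATGCR\nPrimerRev-2A\tATGCATGCATGCATBRY\n"
for name, seq in read_primers(io.StringIO(example)):
    print(name, seq)
```

With a real file (the name `primers.txt` is hypothetical) the call would be `with open("primers.txt", newline="") as fh: primers = read_primers(fh)`.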

(Translation Table for Degenerate Bases B, R, and Y)

R = A or G

Y = C or T

B = C or G or T
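
A table like this maps directly onto a Python dictionary, and listing every sequence is then the Cartesian product of the allowed bases at each position (`itertools.product`). A minimal sketch: R, Y, and B come from the post, and the other entries are the standard IUPAC ambiguity codes, added as an assumption so other primers also work. The name `expand` is hypothetical.

```python
from itertools import product

# R, Y and B are from the post; the rest are the standard IUPAC codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(seq):
    """Return every plain A/C/G/T sequence a degenerate sequence encodes."""
    # One string of allowed bases per position, e.g. "GR" -> ["G", "AG"]
    choices = [IUPAC[base] for base in seq.upper()]
    # product() picks one base from each position in every combination
    return ["".join(combo) for combo in product(*choices)]

print(len(expand("ATBRY")))  # 3 * 2 * 2 = 12
```

Any character not in the table (a typo, a space) raises a `KeyError`, which is arguably safer than silently skipping it.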

Example output (bases that differ from the input were marked in bold italics in the original post):

PrimerFor-1A-Output1 ATGCATGCATGCATGCA

PrimerFor-1A-Output2 ATGCATGCATGCATGCG
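
A sketch that produces output in this naming style (<primer name>-Output<N>, numbered from 1), assuming tab-separated output to match the input format. The generator name `expand_primers` is hypothetical, and the IUPAC table is the standard one, of which R, Y, and B are the codes used in the post.

```python
from itertools import product

# Standard IUPAC codes (R, Y and B are the ones used in the post)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primers(primers):
    """Yield (output_name, sequence) for every variant of every primer."""
    for name, seq in primers:
        choices = [IUPAC[base] for base in seq.upper()]
        for i, combo in enumerate(product(*choices), start=1):
            yield f"{name}-Output{i}", "".join(combo)

primers = [("PrimerFor-1A", "ATGCATGCATGCATGCR")]
for out_name, out_seq in expand_primers(primers):
    print(out_name, out_seq, sep="\t")
# PrimerFor-1A-Output1    ATGCATGCATGCATGCA
# PrimerFor-1A-Output2    ATGCATGCATGCATGCG
```

Saved as a script (any name, e.g. the hypothetical `expand_primers.py`), the printed lines can be redirected to a file with `python3 expand_primers.py > variants.txt`.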

I have 22 primer sequences like the ones in the example input above, and many of them contain 3 or 4 degenerate bases. Some of these degenerate bases stand for 3 possible nucleotides. When working out every possible sequence by hand, I am afraid I will make a mistake or miss some possibilities, so I want a method that guarantees the list is complete.
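
The number of sequences a primer expands to is the product of the number of options at each position, so it can be checked before generating anything: 4 degenerate bases with 3 options each give 3 × 3 × 3 × 3 = 81 sequences. A small sketch of that count, assuming the standard IUPAC table; `count_variants` is a hypothetical name and `math.prod` needs Python 3.8 or later.

```python
from math import prod

# Number of bases each standard IUPAC code stands for
N_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

def count_variants(seq):
    """Number of plain sequences a degenerate primer expands to."""
    return prod(N_OPTIONS[base] for base in seq.upper())

print(count_variants("ATGCATGCATGCATBRY"))  # 3 * 2 * 2 = 12
```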

I would appreciate any help, either pointing me towards existing Linux/Ubuntu software that does this, or a beginner-friendly method using Python3 scripting or Biopython. I have Python3 and Biopython fully configured in case a script that already does this exists.
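
Since Biopython is already installed, one option is to reuse the ambiguity table it ships, `Bio.Data.IUPACData.ambiguous_dna_values` (a dict such as `"R" -> "AG"`), rather than typing it out. A sketch under that assumption, with a hand-written fallback so it still runs where Biopython is missing; `expand` is a hypothetical name. Biopython's table lists the bases for N in a different order ("GATC"), so output order can differ from the fallback while the set of sequences is the same.

```python
from itertools import product

try:
    # Biopython's IUPAC ambiguity table, e.g. "R" -> "AG", "B" -> "CGT"
    from Bio.Data.IUPACData import ambiguous_dna_values
except ImportError:
    # Fallback if Biopython is not importable: standard IUPAC codes
    ambiguous_dna_values = {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }

def expand(seq):
    """All plain sequences encoded by a degenerate DNA sequence."""
    choices = (ambiguous_dna_values[base] for base in seq.upper())
    return ["".join(combo) for combo in product(*choices)]

for variant in expand("ATGCATGCATGCATGCY"):
    print(variant)
```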

Any help would be greatly appreciated!

Best,
-M

Tags: degenerate-bases, python, primers, biopython

Posted 8 months ago by Matthew; thread last updated 8 months ago by Dave Carlson

Duplicate of Degenerate Nucleotide Sequences

Reply by Pierre Lindenbaum, 8 months ago

Hi, those solutions all use Perl. I am asking for a solution in Python3 or Linux-based software.

Reply by Matthew, 8 months ago

"Hi, those solutions all use perl."

Wrong: my answer was in C and will compile on your "supercomputer cluster".

Reply by Pierre Lindenbaum, 8 months ago

Does Perl not run on Ubuntu 22.04.3? Is there a specific reason you're avoiding Perl solutions?

Reply by Ram, 8 months ago

Perl scripts do not work on our lab's supercomputer cluster; unfortunately, the explanation of why was over my head. They basically said I could use anything else, and I know Python3 and Biopython work.

Sorry for the lack of info. -Matt

Reply by Matthew, 8 months ago

You will need to tell us if this works: https://www.researchgate.net/publication/354451685_Generate_all_the_possible_combinations_of_a_Degenerate_DNARNA_sequence_in_FASTA_format_using_Python

Reply by GenoMax, 8 months ago

I don't recall what the Biostars policy is on LLM recommendations, but this sort of situation seems like something that ChatGPT would be good at solving (with some test cases to verify the correctness of the solution).

Reply by Dave Carlson, 8 months ago
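
Whatever produces the list (a hand-written script or an LLM), a few automatic checks help confirm it is complete. A sketch, assuming uppercase input and the standard IUPAC table: it compares against the example output from the question, checks the count equals the product of options per position, checks for duplicates, and checks that every variant matches the primer read as a regular expression (R becomes `[AG]`, and so on). The names `expand` and `check` are hypothetical.

```python
import re
from itertools import product
from math import prod

# Standard IUPAC codes (R, Y and B are the ones used in the post)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(seq):
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in seq))]

def check(seq):
    """Sanity-check the expansion of one uppercase degenerate primer."""
    variants = expand(seq)
    # Expected count: product of the options at each position
    assert len(variants) == prod(len(IUPAC[b]) for b in seq)
    # No duplicates
    assert len(set(variants)) == len(variants)
    # Each variant matches the primer as a regex, e.g. R -> [AG]
    pattern = re.compile("".join(f"[{IUPAC[b]}]" for b in seq))
    assert all(pattern.fullmatch(v) for v in variants)
    # Only plain A/C/G/T remain
    assert all(set(v) <= set("ACGT") for v in variants)
    return len(variants)

# Hand-checked case taken from the example output in the question
assert expand("ATGCATGCATGCATGCR") == ["ATGCATGCATGCATGCA", "ATGCATGCATGCATGCG"]

for primer in ["ATGCATGCATGCATGCR", "ATGCATGCATGCATGCY",
               "ATGCATGCATGCATGRY", "ATGCATGCATGCATBRY"]:
    print(primer, check(primer))
```

The count and regex checks reuse the same table as `expand`, so they test the enumeration rather than the table itself; the hand-checked case is what ties the result back to the question.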

Recommending LLM-created solutions is fine as long as one explicitly states that the code was generated using whatever LLM tool.