If I'm using the FASTA files from the link below, what Alphabet type should I use in Biopython? Would it be IUPAC.unambiguous_dna?
link to FASTA files: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/?C=S;O=A
This is a duplicate of your question to the Biopython mailing list: http://lists.open-bio.org/pipermail/biopython/2013-March/008415.html
My answer is here: http://lists.open-bio.org/pipermail/biopython/2013-March/008416.html
Essentially I would use generic_dna rather than unambiguous_dna for now. The IUPAC alphabet object has a white list of expected letters, but current versions of Biopython do not enforce this in the sequence objects. That may change.
from Bio.Alphabet import generic_dna
from Bio.Alphabet.IUPAC import unambiguous_dna
You don't actually need to specify an alphabet at all, but telling Biopython the sequence is DNA will prevent some user errors.
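If you want to see for yourself whether a file would fit the stricter white list, here is a minimal sketch in plain Python (no Biopython needed). It assumes the unambiguous DNA white list is the four uppercase letters G, A, T and C; the helper names count_letters and unexpected_letters and the tiny in-memory record are made up for illustration. I believe the UCSC chromosome files contain N for gaps and lowercase letters for soft-masked repeats, both of which would fall outside that list, but run the check on your own download rather than taking my word for it.

```python
import io
from collections import Counter

# Assumed white list: the letters of the old IUPAC.unambiguous_dna alphabet.
UNAMBIGUOUS_DNA = set("GATC")


def count_letters(handle):
    """Count every sequence character in a FASTA handle, skipping '>' header lines."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            continue
        counts.update(line.strip())
    return counts


def unexpected_letters(counts, allowed=UNAMBIGUOUS_DNA):
    """Return only the letters (with their counts) that are not in the white list."""
    return {letter: n for letter, n in counts.items() if letter not in allowed}


# Made-up record standing in for a real chromosome file.
example = io.StringIO(">chrTest\nNNNNacgtACGT\nACGTNN\n")
counts = count_letters(example)
print(unexpected_letters(counts))
```

For a real file, replace the StringIO object with open("chr21.fa") or whichever chromosome you downloaded (that file name is only an example). An empty result would mean the strict alphabet fits; anything else is a reason to stay with the generic one.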
And as of Biopython 1.78, you can't specify the alphabet at all: Bio.Alphabet was removed, so the imports above fail on that version and later.