Question

Reverse Complement Of A Sequence Raises AssertionError: Invalid Alphabet Found

12.7 years ago

Jelena_bioinf ▴ 40

I read a FASTA-formatted genome sequence and try to get its reverse complement:

genomeSeq = FastaIO.FastaIterator(genomeHandle, IUPACUnambiguousDNA).next()
genomeSeq.seq.reverse_complement()

but it doesn't work and I can't understand why:

File "/Users/charodeika/Dropbox/genesGelfand/scripts/genome/src/matrixCount/sigma.py", line 127, in <module>
    print genomeSeq.seq[:10].reverse_complement()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Seq.py", line 804, in reverse_complement
    return self.complement()[::-1]
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Seq.py", line 752, in complement
    base = Alphabet._get_base_alphabet(self.alphabet)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Alphabet/__init__.py", line 213, in _get_base_alphabet
    "Invalid alphabet found, %s" % repr(a)
AssertionError: Invalid alphabet found, <class 'Bio.Alphabet.IUPAC.IUPACUnambiguousDNA'>

And, of course, I would like to find out how to make it work!

biopython • 3.8k views

score 0 · Answer 1 · 2012-05-07

12.7 years ago

Niek De Klein ★ 2.6k

I think the problem is that genome sequences contain N's at positions where the nucleotide is uncertain. IUPACUnambiguousDNA only accepts A, C, G, and T. So you would want to change your script to ignore the N's (or any other non-DNA letters in there).

No, certainly not. It also fails for slices that contain no letters other than A, T, G, and C.

Reply • 12.7 years ago by Jelena_bioinf ▴ 40

Agreed - the alphabet letters are not relevant to this error.

Reply • 12.7 years ago by Peter 6.0k

score 0 · Answer 2 · 2012-05-07

12.7 years ago

Jelena_bioinf ▴ 40

Well, when I added

from Bio.Alphabet import IUPAC

and changed

genomeSeq = FastaIO.FastaIterator(genomeHandle, IUPACUnambiguousDNA).next()

to

genomeSeq = FastaIO.FastaIterator(genomeHandle, IUPAC.unambiguous_dna).next()

it started working nicely. However, I would still be happy to learn why.

Without seeing the complete code (the imports) I can't be 100% sure, but I believe your error is passing an alphabet class rather than an instance of the class (an alphabet object).
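
To illustrate the class-versus-instance distinction, here is a minimal sketch. It does not use Biopython: `Alphabet`, `UnambiguousDNA`, `unambiguous_dna`, `get_base_alphabet`, and `check` are hypothetical stand-ins, and it assumes the library validates the alphabet with an `isinstance`-style assertion, as the traceback suggests. It is written for Python 3.

```python
class Alphabet:
    """Hypothetical stand-in for an alphabet base class."""
    letters = None


class UnambiguousDNA(Alphabet):
    """Hypothetical stand-in for a concrete DNA alphabet class."""
    letters = "GATC"


# A ready-made instance, analogous in spirit to IUPAC.unambiguous_dna.
unambiguous_dna = UnambiguousDNA()


def get_base_alphabet(a):
    # Assumed check: isinstance() is true for objects created from the
    # class, but not for the class object itself.
    assert isinstance(a, Alphabet), "Invalid alphabet found, %r" % (a,)
    return a


def check(a):
    """Return 'ok' if the alphabet passes the check, else the error text."""
    try:
        get_base_alphabet(a)
        return "ok"
    except AssertionError as exc:
        return str(exc)


instance_result = check(unambiguous_dna)  # passing an instance
class_result = check(UnambiguousDNA)      # passing the class itself

print(instance_result)
print(class_result)
```

Under these assumptions the instance passes the check, while the class object triggers an "Invalid alphabet found" assertion, which would be consistent with the change from `IUPACUnambiguousDNA` to `IUPAC.unambiguous_dna` making the error go away.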

Reply • 12.7 years ago by Peter 6.0k