I have a feeling there's a minor confusion between ID and name in your example. You can print a SeqRecord to see its main attributes; here it is for a small test record:
from Bio import SeqIO

for s in SeqIO.parse('./test.fasta', 'fasta'):
    print(s)
prints
ID: Unchanged
Name: Unchanged
Description: Unchanged
Number of features: 0
Seq('ATGCTAGCTAGCTAGCTA', SingleLetterAlphabet())
Now, if I change the ID:
for s in SeqIO.parse('./test.fasta', 'fasta'):
    s.id = 'CHANGED'
    print(s)
it prints
ID: CHANGED
Name: Unchanged
Description: Unchanged
Number of features: 0
Seq('ATGCTAGCTAGCTAGCTA', SingleLetterAlphabet())
So, as you can see, the name and description stay the same, which is probably what is happening in your example. If I write this SeqRecord with SeqIO.write(s, 'out.fasta', 'fasta'), I get:
>CHANGED Unchanged
ATGCTAGCTAGCTAGCTA
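The old text survives because the FASTA writer builds the header from both the id and the description. Below is a simplified pure-Python sketch of that rule as I understand it from Biopython's FASTA writer; fasta_title is a hypothetical helper for illustration, not a Biopython function. In practice this means that if you want the header to read only >CHANGED, you should also set s.description = '' (or make the description start with the new id).

```python
def fasta_title(record_id: str, description: str) -> str:
    """Build a FASTA title line (without the '>') from an id and a description.

    Hypothetical helper: a simplified sketch of the rule Biopython's FASTA
    writer appears to apply, not the library's actual code.
    """
    # If the description already starts with the id, use it on its own.
    if description and description.split(None, 1)[0] == record_id:
        return description
    # Otherwise prepend the id to the (stale) description.
    if description:
        return f"{record_id} {description}"
    # No description at all: the header is just the id.
    return record_id


# Parsing ">Unchanged" gives id == description == "Unchanged".
print(">" + fasta_title("Unchanged", "Unchanged"))  # >Unchanged
# Changing only the id leaves the old description in the header.
print(">" + fasta_title("CHANGED", "Unchanged"))    # >CHANGED Unchanged
# Clearing the description as well gives a clean header.
print(">" + fasta_title("CHANGED", ""))             # >CHANGED
```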
This should also answer your second question: printing the record with print() shows its main attributes. You can also use Python's built-in dir() function:
print(dir(s))
prints
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__iter__', '__len__', '__module__', '__new__', '__nonzero__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_per_letter_annotations', '_seq', '_set_per_letter_annotations', '_set_seq', 'annotations', 'dbxrefs', 'description', 'features', 'format', 'id', 'letter_annotations', 'lower', 'name', 'reverse_complement', 'seq', 'upper']
Most of these are standard Python special methods; the rest are Biopython-specific attributes and methods.
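If the double-underscore entries get in the way, you can filter dir() on the leading underscore. On a real SeqRecord this should leave roughly annotations, dbxrefs, description, features, format, id, and so on. The Record class below is a hypothetical stand-in so the snippet runs without Biopython; public_attributes is likewise a made-up helper name.

```python
class Record:
    """Hypothetical toy stand-in for a SeqRecord, for illustration only."""

    def __init__(self):
        self.id = "CHANGED"
        self.name = "Unchanged"
        self.description = "Unchanged"
        self._private = None  # filtered out below

    def upper(self):
        return self


def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_attributes(Record()))
# ['description', 'id', 'name', 'upper']
```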
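The numbers in letter_annotations['phred_quality'] are plain integer Phred scores, not ASCII codes; the ASCII offset is only applied when the record is written out, which is how the same record can be written with different offsets. To get your phred+33 characters back, here's one example where the SeqRecord's quality string is all '#':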
>>> a.letter_annotations['phred_quality']
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Take the first number and add 33 for the offset:
>>> chr(a.letter_annotations['phred_quality'][0] + 33)
'#'
So you can, for example, do this:
>>> [chr(x + 33) for x in a.letter_annotations['phred_quality']]
['#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#', '#']
or even
>>> ''.join(chr(x + 33) for x in a.letter_annotations['phred_quality'])
'################################################################################'
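The same arithmetic as two small round-trip functions: chr(q + offset) encodes a Phred score, and ord(c) - offset decodes it. The names encode_quality and decode_quality are hypothetical and need no Biopython; with a real record you would pass a.letter_annotations['phred_quality'] as the scores.

```python
SANGER_OFFSET = 33  # "phred+33", as used by Sanger-style FASTQ


def encode_quality(scores, offset=SANGER_OFFSET):
    """Turn a list of integer Phred scores into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)


def decode_quality(quality, offset=SANGER_OFFSET):
    """Turn a FASTQ quality string back into integer Phred scores."""
    return [ord(c) - offset for c in quality]


scores = [2] * 10
qual = encode_quality(scores)
print(qual)                               # ##########
print(decode_quality(qual) == scores)     # True
print(encode_quality(scores, offset=64))  # BBBBBBBBBB
```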
Edit:
In case you're wondering: the quality is stored as plain Phred scores rather than as characters, so the offset only has to be chosen when writing (as above with the +33 offset). That is also why this works:
print(a.format('fastq-illumina'))
- prints the # quality as B (Phred + 64)
print(a.format('fastq-sanger'))
- prints the # quality as # (Phred + 33)
print(a.format('fastq-solexa'))
- prints the # quality as > (Solexa score, offset 64)
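Where the three characters come from: Sanger FASTQ uses Phred + 33, Illumina 1.3+ FASTQ uses Phred + 64, and Solexa FASTQ uses a different log-odds score, also with offset 64. Phred 2 converts to a Solexa score of about -2.33, which rounds to -2, giving chr(62) = '>'. This is a sketch, not Biopython's code: quality_char and solexa_from_phred are hypothetical names, and the floor of -5 for very low scores is my assumption about how the clamping works.

```python
import math

SOLEXA_OFFSET = 64


def solexa_from_phred(phred):
    """Convert a Phred score to a Solexa score.

    Assumption: scores are floored at -5 (Phred 0 has no finite Solexa value).
    """
    if phred <= 0:
        return -5.0
    return max(-5.0, 10 * math.log10(10 ** (phred / 10) - 1))


def quality_char(phred, variant):
    """Encode one Phred score as a quality character for a FASTQ variant."""
    if variant == "fastq-sanger":
        return chr(phred + 33)
    if variant == "fastq-illumina":
        return chr(phred + 64)
    if variant == "fastq-solexa":
        return chr(round(solexa_from_phred(phred)) + SOLEXA_OFFSET)
    raise ValueError(f"unknown variant: {variant}")


for variant in ("fastq-sanger", "fastq-illumina", "fastq-solexa"):
    print(variant, quality_char(2, variant))
# fastq-sanger #
# fastq-illumina B
# fastq-solexa >
```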
Just to clarify: I can use dir(x.letter_annotations), but that does not show me where, e.g., the phred+33 data is stored. Thanks.
Great, that solved my problem. Thanks.
BioStars is a Q&A site where the idea is that you mark good answers by accepting them (ticking them); this feeds into the rating system.
You can also comment on answers individually rather than adding another "answer" of your own.