7.9 years ago
rotem
I'm wondering whether Biopython or a similar package can read DNA sequences in the odd format that appears in US patents (and which the USPTO requires!). It is not hard to write a script that reads this format, but since Biopython's Bio.SeqIO already handles so many different formats, it would be great if it could deal with this one too.
Does anyone know the format I'm referring to, and whether any package already supports it? It resembles the DDBJ / EMBL formats but is not identical to them. It looks like:
<211> some number
<212> some entity
<213> organism
<400> serial number
10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 60
10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 120
etc.
Thanks! Rotem
The USPTO makes PatentIn available for sequence submissions. It seems to be Windows-only.
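For what it's worth, the kind of script mentioned in the question could look roughly like the sketch below. It is only a minimal sketch based on the layout shown in the question, not a Bio.SeqIO parser: it assumes every field sits on its own line as `<number> value`, that the sequence lines follow the `<400>` line, and that the trailing number on each sequence line is a running base count that can be dropped. The function name `parse_patent_sequences` and the sample record are made up for illustration, and only plain nucleotide sequences are handled.

```python
import re

# A field line looks like "<211> some value" (layout assumed from the question).
TAG_LINE = re.compile(r"^<(\d+)>\s*(.*)$")


def parse_patent_sequences(text):
    """Yield one dict per record: {'fields': {tag: value}, 'sequence': str}.

    Hypothetical helper, not part of Biopython.
    """
    fields, seq_parts, in_sequence = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match:
            if in_sequence:
                # A field line after the sequence block starts the next record.
                yield {"fields": fields, "sequence": "".join(seq_parts)}
                fields, seq_parts, in_sequence = {}, [], False
            tag, value = match.groups()
            fields[tag] = value
            # Assumption: sequence lines begin right after the <400> line.
            in_sequence = tag == "400"
        elif in_sequence:
            # Keep letters only: drops the spaces between 10 bp blocks
            # and the running count at the end of the line.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    if in_sequence:
        yield {"fields": fields, "sequence": "".join(seq_parts)}


# Made-up record in the layout from the question.
example = """\
<211> 25
<212> DNA
<213> Homo sapiens
<400> 1
acgtacgtac ggttaaccgg ttaac 25
"""

records = list(parse_patent_sequences(example))
for record in records:
    print(record["fields"], record["sequence"])
```

If Biopython is installed, each sequence string could then be wrapped in a Bio.SeqRecord.SeqRecord and written out with Bio.SeqIO in any format it supports.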