Processing DNA sequences in format as appears in US patents


8.6 years ago

rotem ▴ 10

I'm wondering whether Biopython or a similar package can read DNA sequences in the unusual format used in US patents (the USPTO requires it!). Writing a script that reads this format is not hard, but since Python's Bio.SeqIO already handles so many formats, it would be great if it could handle this one too.

Does anyone know the format I'm referring to and whether any packages already deal with it? This format resembles DDBJ / EMBL, but not exactly. It looks like:

<211> some number
<212> some entity
<213> organism
<400> serial number
10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 10bp_block   60
10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 10bp_block   120
etc.
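For reference, the numeric tags in this layout match the identifiers of WIPO Standard ST.25 (<210> SEQ ID NO, <211> length, <212> type such as DNA/RNA/PRT, <213> organism, <400> the sequence itself), and nucleotide lines carry up to six 10-base blocks followed by the running base count. A minimal stdlib-only sketch of a reader for that layout, under stated assumptions: `parse_st25` and `St25Record` are hypothetical names, only nucleotide entries are parsed (protein entries use three-letter codes with a separate numbering line, so their `sequence` stays empty), and all other tags are ignored.

```python
import re
from dataclasses import dataclass

# "<211> 25" -> tag "211", value "25" (numeric identifiers as in WIPO ST.25).
TAG_RE = re.compile(r"^<(\d{3})>\s*(.*)$")
# "acgtacgtac gtacgtacgt     20" -> bases, then the running base count.
SEQ_LINE_RE = re.compile(r"^([A-Za-z]+(?:\s+[A-Za-z]+)*)\s+(\d+)$")


@dataclass
class St25Record:  # hypothetical container, not a Biopython class
    seq_id: str = ""      # <210> SEQ ID NO
    length: int = 0       # <211> declared length
    mol_type: str = ""    # <212> DNA / RNA / PRT
    organism: str = ""    # <213> organism
    sequence: str = ""    # bases collected after <400>


def _finish(record):
    # Check the collected bases against the declared <211> length.
    if record.sequence and len(record.sequence) != record.length:
        raise ValueError(
            f"SEQ ID {record.seq_id}: <211> says {record.length}, "
            f"found {len(record.sequence)} bases")
    return record


def parse_st25(text):
    """Yield one St25Record per <210> entry (nucleotide lines only)."""
    record = None
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag_match = TAG_RE.match(line)
        if tag_match:
            tag, value = tag_match.group(1), tag_match.group(2).strip()
            if tag == "210":
                if record is not None:
                    yield _finish(record)
                record = St25Record(seq_id=value)
                in_sequence = False
            elif record is None:
                pass  # file header tags such as <110>, <120>, <160>
            elif tag == "211":
                record.length = int(value)
            elif tag == "212":
                record.mol_type = value
            elif tag == "213":
                record.organism = value
            elif tag == "400":
                in_sequence = True
            continue
        if record is None or not in_sequence:
            continue  # e.g. wrapped <223> free text before <400>
        seq_match = SEQ_LINE_RE.match(line)
        if seq_match:
            # Drop the spaces between 10-base blocks.
            record.sequence += "".join(seq_match.group(1).split())
            expected = int(seq_match.group(2))
            if len(record.sequence) != expected:
                raise ValueError(
                    f"SEQ ID {record.seq_id}: line ends at {expected}, "
                    f"but {len(record.sequence)} bases were read")
    if record is not None:
        yield _finish(record)
```

The trailing position number on each line doubles as a cheap integrity check, which is why the sketch validates it instead of discarding it. To reuse Bio.SeqIO writers, each record could be wrapped as `SeqRecord(Seq(rec.sequence), id=rec.seq_id, description=rec.organism)` and passed to `SeqIO.write(..., "fasta")`.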

Thanks! Rotem

USPTO dna sequence format biopython • 1.4k views

updated 8.6 years ago by GenoMax 152k • written 8.6 years ago by rotem ▴ 10


The USPTO makes PatentIn available for sequence submissions. It seems to be Windows-only.

8.6 years ago by GenoMax 152k
