I have looked through all of the posts tagged FASTA, but none seem to address this particular issue.
I have the following format for my headers:
>GeneDB|LinJ.35.4080 | organism=Leishmania_infantum | product=ATP-dependent RNA helicase, putative | location=LinJ.35:1596105-1598177(+) | length=690
I would like to learn how to retrieve the "LinJ.35.4080" field (let's call this proteinID) as well as the "product=ATP-dependent RNA helicase, putative" field (call this proteindescription) for every record in my file, and write them to a two-column text file (proteinID, proteindescription).
If possible, it would also be helpful to strip the "product=" prefix from the description.
I have searched through all of the online tutorials (that I could find) related to FASTA, Biopython, Python, etc., and thought (foolishly) my first attempt at this would go a bit smoother.
Thank you in advance for your help and time; this is something I will use often. I hope I haven't overlooked someone else's solution to this.
Agreed, but if you really want to use Python, you can use the string method .split: http://docs.python.org/library/string.html#string.split
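For what it's worth, here is a minimal sketch of the .split approach applied to the header format in the question. It assumes every header separates its fields with " | " and that the ID follows the first "|" of the first field. The names parse_header and headers_to_table and the file paths are made up for illustration; they are not from Biopython or any other library.

```python
def parse_header(header):
    """Return (protein_id, description) parsed from one FASTA header line."""
    # Drop the leading ">" and split on the " | " field separator (assumed format).
    fields = [f.strip() for f in header.lstrip(">").split(" | ")]
    # The first field looks like "GeneDB|LinJ.35.4080"; keep the part after the bar.
    protein_id = fields[0].split("|", 1)[-1]
    description = ""
    for field in fields[1:]:
        if field.startswith("product="):
            # Remove the "product=" prefix, keep the rest unchanged.
            description = field[len("product="):]
            break
    return protein_id, description


def headers_to_table(fasta_path, out_path):
    """Write one tab-separated line (ID, description) per FASTA record."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in fasta:
            if line.startswith(">"):  # sequence lines are skipped
                protein_id, description = parse_header(line.rstrip("\n"))
                out.write(f"{protein_id}\t{description}\n")
```

Calling headers_to_table("proteins.fasta", "proteins.tsv") (hypothetical file names) would then produce the two-column file. The output is tab-separated rather than comma-separated because the product descriptions themselves contain commas.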