Hi @gawbul
There seem to be a few minor things to improve in your code. For example, the done = 0
flag is never changed, so it is not needed. Also, when no documentation is found, the function returns nothing. Here is another option:
EDIT: There were major flaws in the previous code. Here is a complete script that should do what you want. It makes one major assumption, much like the code suggested by Istvan: the DESCRIPTION part has to be followed by a line beginning with an equal sign (=).
This is a pretty big assumption. The format itself makes it hard to find a generalization that would work on any Perl script written by a variety of people. The tricky bit is that, as there is no markup to signify the end of the DESCRIPTION section, it is hard to define exactly what constitutes its end. What if no further line begins with an equal sign? What if there is no blank line before the beginning of the code? And what about the possible presence of multiple paragraphs?
If you intend to use it only for that particular file you need, or plan to manually change the format of all Perl files you include in your package so that they satisfy this assumption, then it should not be a problem. I would consider adding a =head1 DESCRIPTION_END marker at the end of the documentation section, just to make it conform.
#!/usr/bin/python
import sys
import re

filename = sys.argv[1]
output = sys.argv[2]

def get_perl_info(filename):
    """Get the lines that follow '=head1 DESCRIPTION' in a Perl script,
    up to the next line beginning with '='
    """
    doc = []
    begun = False
    with open(filename) as handle:
        for l in (x.strip() for x in handle):
            # \b stops a '=head1 DESCRIPTION_END' marker from matching as the start
            if re.match(r"^=head1\s+DESCRIPTION\b", l):
                begun = True
            elif begun and re.match(r"^=", l):
                return doc
            elif begun:
                doc.append(l)
    # Reached when the heading is missing or no closing '=' line follows it
    return ["No documentation found in file: " + filename]

with open(output, "w") as f:
    for l in get_perl_info(filename):
        f.write(l + "\n")
You can run the script by first making it executable (Linux):
chmod +x get_perl_info.py
and then running it from the same directory:
./get_perl_info.py perl_code.pl extracted_doc.txt
NOTE: I also removed the second script since it could not extract the information correctly from the format that you specified.
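If you do go with an explicit end marker as suggested above, the extraction could stop only at that marker, so other lines beginning with an equal sign (and multiple paragraphs) inside the description are kept. This is only a rough sketch: the DESCRIPTION_END name is my own suggestion and not a standard POD directive, and extract_description is a made-up helper name.

```python
import re

# The start heading must stand alone, so the end marker cannot be mistaken for it.
START_MARKER = re.compile(r"^=head1\s+DESCRIPTION\s*$")
# Hypothetical end marker (an assumption of this sketch, not a standard POD directive).
END_MARKER = re.compile(r"^=head1\s+DESCRIPTION_END")


def extract_description(lines):
    """Return the lines between the DESCRIPTION heading and the end marker.

    Returns None when the heading is missing or the end marker is never
    reached, so the caller can decide how to report the problem.
    """
    doc = []
    begun = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not begun:
            if START_MARKER.match(line):
                begun = True
        elif END_MARKER.match(line):
            return doc
        else:
            doc.append(line)
    return None
```

It takes any iterable of lines, so you can pass it an open file handle directly. Returning None instead of an error message keeps the "not found" case easy to tell apart from real documentation text.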
Cheers
Thanks Neil. Yes, I would rather code completely in Python, but some things seem to be better implemented (strangely from a performance perspective) in Perl.
The scripts are stand-alone and simply drop into the plugin directory. They are called with arguments and then any output from them is parsed back to the toolkit.
I essentially want to make an easily expandable pipeline where I can run through an input configuration file that executes the plugin scripts one at a time, in order to integrate my research methodology, rather than having to run separate scripts left, right and center.
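To make the idea concrete, a runner along these lines might be enough to start with. It is only a sketch under assumptions: the configuration format (one plugin per line, script name followed by its arguments, with # for comments) and the names parse_config and run_pipeline are invented here for illustration and are not taken from the actual toolkit.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def parse_config(text):
    """Turn config text into a list of argument lists, skipping blanks and # comments."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            steps.append(shlex.split(line))
    return steps


def run_pipeline(config_text, plugin_dir):
    """Run each plugin in order and collect (script name, stdout) pairs."""
    results = []
    for script, *args in parse_config(config_text):
        path = Path(plugin_dir) / script
        # Python plugins run under the current interpreter; anything else
        # (e.g. a Perl script) is assumed to be executable with a shebang line.
        cmd = [sys.executable, str(path)] if path.suffix == ".py" else [str(path)]
        # check=True raises CalledProcessError and stops the pipeline on failure.
        done = subprocess.run(cmd + args, capture_output=True, text=True, check=True)
        results.append((script, done.stdout))
    return results
```

The captured stdout of each step is what would then be parsed back into the toolkit. Stopping at the first failing plugin seemed the safer default for a sequential pipeline, but that is a design choice you may want to relax.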