| Author: | Jacob Smullyan |
|---|---|
| Version: | 1.7 |
Welcome to the first article of Approximating Py, a column devoted to exploring different corners of the Python development universe, which is currently undergoing an expansion explosive enough to make any Big Bang theorist proud. I'm starting out in this article and its sequel by discussing a project that I know very well, since I have been active in developing it for two years: the SkunkWeb Web Application Server [1].
A remarkable number of Python web application servers and web development frameworks have sprung up in the last couple of years [2]. This healthy development attests to, among other things, the ease with which it is possible to write a complex application in Python, a language in which reinventing the wheel is sometimes no harder than refitting one. To some extent this may also be interpreted as a reaction against the long-standing behemoth in the Python web application server world, Zope, which, while a remarkable achievement in its own right, requires of developers who would use it an adjustment of mindset that not all developers wish to make; the specialization which makes Zope easy to use for some applications -- the thoroughly realized metaphor of object publishing, acquisition, web interfaces and the built-in system of permissions -- has the effect of making it hard, or at least awkward or idiosyncratic, to use for others. In several cases, it seems that developers felt that it would be simpler to write a custom-made application server that met their needs exactly than to force themselves and their projects to fit into the development paradigms championed by a pre-existing framework.
SkunkWeb is one such Python framework. It was developed principally by Drew Csillag in the Skunk Platform Engineering Group at Starmedia Network in New York in 1998 as an alternative to costly commercial solutions, and went through several development cycles there before a new refactored version, the 3.x series, was publicly released under the GPL in August 2001; as the bugs in that series have been pretty thoroughly worked out at the time of this writing (October 2002), SkunkWeb is mature, featureful, and stable.
SkunkWeb needed from the first to deal well with the massive loads of an extremely high-traffic portal site, and needed to scale on multi-processor hardware; therefore, it was determined that a multithreaded Python application was not an option, since Python's global interpreter lock prevents native Python processes from effectively utilizing multiple CPUs. SkunkWeb is a multiprocess preforking daemon with a fixed number of persistent child processes that service requests. In a typical production setup, it would work behind Apache, and a small number of SkunkWeb processes can service requests from a larger number of Apache processes that handle networking with the client; by using a fixed-size pool, one avoids over-frequent disconnections and reconnections from databases and the cost of creating and initializing Python processes when the pool (re-)grows. In this regard, SkunkWeb is different from most other Python web frameworks, which tend to be multithreaded.
The SkunkWeb philosophy is: avoid contention for resources; keep it simple, fast and flexible; and don't restrict the programmer. SkunkWeb is very easy to use, but if you want to do something very sophisticated with it, it doesn't get in your way, just like Python itself.
In order to install SkunkWeb, you need a Unix system (not Windows, which doesn't support the fork() call; Linux, FreeBSD, Solaris, Mac OS 10.2 are all confirmed to work), Python 2.1.1 or later (some recent non-essential code uses 2.2 constructs), and the Egenix mx-base extension package [3] (specifically for mx.DateTime). While for a production site you'd want to use Apache or another web server to handle HTTP communication with clients, for testing that isn't necessary; SkunkWeb has a built-in webserver that works well enough, and it is enabled by default.
To install SkunkWeb in its most basic configuration, download and untar the latest SkunkWeb tarball, cd into the top directory, and configure, build, and start it as follows:
./configure --without-mod_skunkweb --prefix=/home/yourname/skunk
make && make install
/home/yourname/skunk/bin/swmgr start
If you now open http://localhost:8080/ with your web browser, you should see the SkunkWeb start page. There will be links to demo applications; one of them, the image gallery demo, requires the Python Imaging Library [4] to work, and the other requires that your machine have direct internet access; if you don't have those prerequisites, you'll have the chance to get familiar with SkunkWeb's default server error page.
If you look inside the document root of a SkunkWeb installation (for a new unconfigured installation, the docroot directory underneath the directory in which you installed SkunkWeb) it is likely to contain, in addition to html and other familiar-looking files, others whose names end with various exotic extensions. Some of these documents are url-accessible, and some aren't, and only some of the former category will be considered by SkunkWeb to be executable SkunkWeb components, that is, files containing code that produces dynamic content; the rest will be treated as static data to be returned to the client (e.g., image files).
By default, top-level interpreted (that is, executable) documents end in the extensions:
- .html -- a top-level STML (Skunk Template Markup Language) document
- .py -- a top-level Python script
The following document types are interpreted, but are not url-accessible, and you will get a 404 error if you try to access them through the browser; they are only accessible, and executed, when called from other documents or code:
- .inc -- an STML include
- .pyinc -- a Python include
- .comp -- an STML regular component
- .pycomp -- a Python regular component
- .dcmp -- an STML data component
- .pydcmp -- a Python data component
- .msg -- a simple message catalog
- .cat -- a multilingual message catalog
More about STML and those includes and components and what they do in a moment (I won't be covering message catalogs, a tool for internationalized messages, here, but they are covered in the STML reference. Suffice it to say for now that they are little text-based databases containing strings that can be accessed by STML tags and a Python API). Any other documents are, by default, treated as static content, and are url-accessible. (But that is only the default behavior, and SkunkWeb, being a flexible beast, can be configured to change it in several ways.)
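The default rules just described can be summed up in a few lines of code. What follows is only a sketch, written in present-day Python 3 rather than the Python 2 of SkunkWeb itself; the classify function and its two tables are hypothetical names of my own, not part of SkunkWeb or its configuration format.

```python
import posixpath

# Extensions taken from the lists above. The table layout is only an
# illustration, not SkunkWeb's actual configuration format.
TOP_LEVEL = {".html": "STML document", ".py": "Python script"}
SUBSIDIARY = {
    ".inc": "STML include",
    ".pyinc": "Python include",
    ".comp": "STML regular component",
    ".pycomp": "Python regular component",
    ".dcmp": "STML data component",
    ".pydcmp": "Python data component",
    ".msg": "simple message catalog",
    ".cat": "multilingual message catalog",
}


def classify(path):
    """Return (description, HTTP status) for a request path under the defaults."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in TOP_LEVEL:
        return TOP_LEVEL[ext], 200   # interpreted and url-accessible
    if ext in SUBSIDIARY:
        return SUBSIDIARY[ext], 404  # reachable only from other documents or code
    return "static content", 200     # returned to the client as-is


for name in ("/index.html", "/comp/welcome.comp", "/img/logo.png"):
    print(name, classify(name))
```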
Take notice of two points. First, virtually everything that can be done in SkunkWeb with STML can be done in Python as well. Second, a distinction is drawn between url-accessible, or "top-level" documents, and the various types of subsidiary components that can be referenced from them. The main difference between them is that top-level executable documents are executed in a Python namespace with an object in it called CONNECTION, which serves the same roles typically played by Request and Response objects in frameworks like Java Servlets, Zope, ASP, etc; subsidiary documents may not have that object in their namespace, for reasons which will become clear.
It is perfectly possible to write SkunkWeb applications without using STML at all, but STML is very convenient if you exist in the real world and have to cope with the ugliness of html markup. It is designed for cases when some degree of mixture of presentation logic and markup (usually html, but not necessarily) is inevitable. By default, SkunkWeb interprets all files of type text/html as STML files. The definitive reference for STML is the STML Reference Guide [5], which covers the material in this section in much more detail.
An STML tag is delimited with the tokens <: and :>. The first word between them is the tag name; subsequent text within the tag is a list of tag attributes:
<:tagname [attr1=value1 attr2=value2 ...]:>
Very often you can leave out the attribute name, since the tag expects them in a particular order (much as you can with the arguments of a Python method). Any arbitrary Python expression (even a multi-line expression) can occur as an attribute value, by placing it within backticks. For instance, <:val:> is a tag to display the string value of a Python expression; to display the result of the Python expression 2 - 2, you would write <:val expr=`2 - 2`:> or simply <:val `2 - 2`:>. <:set:> sets a value, and <:call:> calls an arbitrary Python expression:
<:call `x=3`:>
is therefore equivalent to
<:set x `3`:>
You can import arbitrary Python modules into the template, without restriction, using the <:import:> tag. The following two STML statements are equivalent:
<:import PIL.Image as="P":>
and
<:call `import PIL.Image as P`:>
(In fact, because the component system of SkunkWeb encourages one to split up applications into specialized and reusable components, one seldom uses the <:call:> tag. Normally I at least prefer to do the programmatic heavy lifting in Python components, and leave STML for cases when objects are perhaps sifted through in order to present them, but not generally being created or heavily modified. Nonetheless, having this escape hatch is very convenient and sometimes necessary.)
STML also has block tags, of the form:
<:tagname [attrs]:> content <:/tagname:>
One useful block tag is <:spool:>, which collects the output of a particular block of STML within a template as a string and returns it as a variable rather than printing it to the response output. For instance:
<:set x "O mysterious Frankfurter!":>
<:spool mystring:>O mysterious Rose! <:val `x`:><:/spool:>
causes the variable mystring to be assigned the value:
"O mysterious Rose! O mysterious Frankfurter!"
The <:#:><:/#:> (or equivalently, <:comment:><:/comment:>) block tags are how you typically add comments to STML templates; they aren't included in the HTML output. (The contents of the comment tag must be well-formed STML; if you want to comment out a section that cannot be parsed as an STML block, you can use the special <:* ... *:> tag.)
STML provides the same control structures as Python itself: for, while, if, else, elif, break, continue, raise, try, except, and finally.
<:import sys:>
<:import mymodule interesting_number:>
<:set x `1`:>
<:while `x`:>
<:if `x==1`:>
<:break:>
<:/if:>
This will never appear.
<:/while:>
<:try:>
<:raise `ValueError`:>
<:except `ValueError`:>
<h3>Foolish exception raised and caught.</h3>
<:/try:>
<:for `xrange(sys.maxint)` i:>
<:if `interesting_number(i)`:>
An interesting number: <:val `i`:><br />
<:break:>
<:elif `i < 100`:>
<:continue:>
<:else:>
Not an interesting number, but big
enough to mention: <:val `i`:><br />
<:/if:>
<:/for:>
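For comparison, the for loop in the template above corresponds roughly to the following plain Python. This is a sketch in present-day Python 3; interesting_number here is a hypothetical stand-in for the function imported from mymodule (it simply singles out 103), and the loop bound is shortened from sys.maxint.

```python
def interesting_number(i):
    # Hypothetical stand-in for mymodule.interesting_number.
    return i == 103


lines = []
for i in range(10**6):  # the template iterates over xrange(sys.maxint)
    if interesting_number(i):
        lines.append("An interesting number: %d" % i)
        break
    elif i < 100:
        continue  # small and uninteresting: say nothing
    else:
        lines.append(
            "Not an interesting number, but big enough to mention: %d" % i
        )

print("\n".join(lines))
```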
It also has a <:halt:> tag, which stops the execution of the currently executing template:
<:if `animal=='zebra'`:>
<:halt:>
<:/if:>
<:call `drive_on()`:>
All the tags mentioned heretofore are usable in any STML file, whether a top-level document or a component. The <:args:> tag, however, can only be used if the namespace in which it is executed contains a CONNECTION object; this means, in most cases, a top-level document. <:args:> extracts arguments from the CONNECTION, converts them to Python values as you specify, and then places them into the local namespace. You can convert the values from the strings in the CGI submission with conversion functions, and supply default values. Say you request the following document with the query string moniker=Sylvester&animal=hedgehog&age=70:
<:args moniker animal="rhinoceros" weight=`(int, 4800)` age=`int`:>
Your beloved <:val `animal`:> <:val `moniker`:> weighs <:val `weight / 20`:> stone
<:if `age is not None`:>
and is <:val `age*12`:> months old.
<:else:>
and is reputed ageless.
<:/if:>
The output would be:
Your beloved hedgehog Sylvester weighs 240 stone and is 840 months old.
The STML Reference explains in some detail how the <:args:> tag is used. In brief, the arguments to the tag should be a list of argument names you expect may be submitted to the page; if you state an argument value, it can either be:
- if callable, a conversion function that will be used to marshal the value;
- if a tuple of length two, a pair (conversion_function, default_value); or
- a default value.
If you don't state a default value, the default default value is None; if you specify a conversion function and the conversion raises an exception, the exception is silently caught and the default value is returned. In Python, an equivalent function is CONNECTION.extract_args().
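The rules for argument specifications are easier to see in code. Here is a minimal sketch in present-day Python 3 that mimics them; the function below is a toy of my own and makes no claim about the real signature of CONNECTION.extract_args().

```python
from urllib.parse import parse_qs


def extract_args(query_string, **specs):
    """Toy model of the <:args:> rules; not SkunkWeb's extract_args signature."""
    submitted = {k: v[0] for k, v in parse_qs(query_string).items()}
    result = {}
    for name, spec in specs.items():
        convert, default = None, None
        if callable(spec):
            convert = spec                      # bare conversion function
        elif isinstance(spec, tuple) and len(spec) == 2:
            convert, default = spec             # (conversion_function, default_value)
        else:
            default = spec                      # plain default value (possibly None)
        value = submitted.get(name, default)
        if name in submitted and convert is not None:
            try:
                value = convert(submitted[name])
            except Exception:
                value = default                 # failed conversion falls back silently
        result[name] = value
    return result


args = extract_args(
    "moniker=Sylvester&animal=hedgehog&age=70",
    moniker=None, animal="rhinoceros", weight=(int, 4800), age=int,
)
print(args)
```

With the query string from the example, weight falls back to its default of 4800 and age is converted to the integer 70.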
The templating essentials covered in the previous section are all very nice, but having lots of tags in html pages quickly leads to spaghetti code and mires the application in the logic of its presentation. The heart of SkunkWeb is its solution to this problem: the component. SkunkWeb components come in several flavors, as mentioned: includes, regular components, and data components. All can be written in Python or in STML, and other templating solutions are also possible: a PSP implementation exists, and Cheetah templates can be used, with some restrictions. All these components are compiled to Python bytecode, and their compiled representations are stored in a configurably located directory called the compileCacheRoot (and, by default but depending on SkunkWeb's configuration, in memory).
As stated earlier, top-level documents are url-accessible resources like .html files or python files with the extension .py in the docroot. Such a document is interpreted by SkunkWeb in a namespace containing a useful object called CONNECTION, and the standard output of its execution is gathered and written as the body of the http response.
CONNECTION is an object that represents both the HTTP request and HTTP response, and is an instance of web.protocol.HttpConnection. Its most important attributes are:
The simplest kind of component is the include. Includes are executed in the namespace of the component that calls them; therefore, if they are called from a top-level document, they have access to all its variables, including CONNECTION. Includes are mostly useful when you want to share code that needs access to the CONNECTION between pages.
This is how you include an include component from STML:
<:include fname.inc:>
The include tag takes no other arguments. Let this be my_include.inc:
<:args animal='elephant' food='coconut':>
My <:val `animal`:> will now eat <:val `food`:>.
and let the following index.html be requested with no cgi arguments:
Hello.
<:include my_include.inc:>
Goodbye, <:val `animal`:>.
The output would be:
Hello. My elephant will now eat coconut. Goodbye, elephant.
Note that the include, by using the <:args:> tag, has altered the namespace of the component that called it.
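The effect is the same as executing a piece of code in somebody else's dictionary. As a rough analogy only (present-day Python 3, with two plain assignments standing in for what <:args:> does; this is not how SkunkWeb itself is implemented), compare running the same source in the caller's namespace and in a fresh one:

```python
# Two assignments stand in for the names that <:args:> places in the namespace.
INCLUDE_SOURCE = "animal = 'elephant'\nfood = 'coconut'\n"

# Run in the caller's namespace: the assignments leak back to the caller.
caller_ns = {}
exec(INCLUDE_SOURCE, caller_ns)
print("Goodbye, %s." % caller_ns["animal"])

# Run in a fresh namespace: this second caller is left untouched.
other_caller_ns = {}
exec(INCLUDE_SOURCE, {})
print("animal" in other_caller_ns)
```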
To include a component from Python:
import AE  # the package with the Component implementation
AE.Component.callComponent('/path/to/include.inc',
                           compType=AE.Component.DT_INCLUDE)
Regular components also generate output that is appended to the http response body, but unlike includes, are executed in their own namespace. You can pass name/value pairs into their namespace by passing in arguments in the component call. Most importantly, if you so choose, the output of components can be cached, so that subsequent calls to a component with a given set of arguments will not have to execute the component all over again. The caching can be controlled with a considerable degree of finesse.
Since a regular component is executed in its own namespace, it does not alter the namespace of the calling component, and does not have access to anything in the calling component's namespace that is not explicitly passed in. This is essential for component caching to work. The output of cached components is stored under a cache key that is made from the component path and a digest of the component's arguments; the assumption is made that cached components are either idempotent or that you want them to behave as if they were during the cache interval specified. By separating the namespace of regular components from their callers, you can achieve that idempotence cleanly.
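To make the idea of a cache key concrete, here is one way such a key could be put together. This is a sketch in present-day Python 3 under my own assumptions (the cache_key name, the choice of SHA-256, and the key layout are all illustrative), not a description of the key SkunkWeb actually computes.

```python
import hashlib


def cache_key(component_path, arg_dict):
    """Combine the component path with a digest of its arguments (illustrative)."""
    # Sort the items so that the order in which arguments are passed
    # does not change the digest.
    canonical = repr(sorted(arg_dict.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "%s:%s" % (component_path, digest)


k1 = cache_key("/comp/welcome.comp",
               {"name": "Hagrid", "message": "Get off that log, %(name)s!"})
k2 = cache_key("/comp/welcome.comp",
               {"message": "Get off that log, %(name)s!", "name": "Hagrid"})
k3 = cache_key("/comp/welcome.comp",
               {"name": "Fang", "message": "Get off that log, %(name)s!"})
print(k1 == k2, k1 == k3)
```

The same path and arguments always give the same key, while any change to the arguments gives a different one, which is why a component that reads only its own arguments can safely be cached.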
To see caching in action, let us deliberately create a component that is not idempotent, and that prints the time at which the component is executed. Let the following be a component called /comp/welcome.comp:
<:compargs name message="Welcome, %(name)s":>
<:cache duration="5h":>
<:val `message % {'name' : name}`:>
<:import time:>
<:val `time.strftime('%c', time.localtime())`:>
The <:compargs:> tag is a way of enforcing a component signature, and its use is optional. The <:cache:> tag here says that this component should be cached for five hours. Let this be index.html:
<:component /comp/welcome.comp
name="Hagrid"
message="Get off that log, %(name)s!"
cache="yes":>
or the equivalent index.py:
import AE
s=AE.Component.callComponent('/comp/welcome.comp',
                             argDict={'name' : 'Hagrid',
                                      'message' : 'Get off that log, %(name)s!'},
                             compType=AE.Component.DT_REGULAR,
                             cache=AE.Component.YES)
print s
The <:component:> tag passes the parameters name and message to the component, and also asks for the component to be retrieved from the cache if the cached output is available. The output will be, if you first execute it at 4 in the afternoon on January 5th, 2003:
Get off that log, Hagrid! Sun Jan 05 16:00:00 2003
and if you call it with the same arguments, including the cache argument, an hour later, you will get the same result, since the output will still be cached. There are other values of the cache argument that you can pass to the component tag: defer (which, once the cache has expired, causes the component to be executed and the cache renewed after the response has been sent to the client), old (which returns cached content even if the cache has expired), force (which forces the component to be executed and the cache renewed regardless of the contents of the cache), and no (the default, which means do not go to the cache and do not populate it). The <:cache:> tag, used inside components, is also highly configurable, as is documented in the STML Reference; and you can accomplish the same thing in Python by assigning a timestamp value (number of seconds since Jan. 1, 1970) to a variable called __expiration__ inside the component. (The Python module Date.TimeUtil contains two convenient functions, convertDuration and convertUntil, for computing that value.)
The output of data components is not appended to the http response; instead, data components return a Python object by raising a particular exception, ReturnValue. Like regular components, they are executed in their own namespace and accept component arguments, and are cacheable. This means that data components are essentially memoized function calls.
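Returning a value by raising an exception may look odd at first, so here is the mechanism in miniature. This is a sketch in present-day Python 3; the ReturnValue class and the run_data_component helper below are stand-ins of my own, not SkunkWeb's implementation.

```python
class ReturnValue(Exception):
    """Stand-in for SkunkWeb's ReturnValue; carries the returned object."""


def run_data_component(source, args=None):
    """Execute component source in its own namespace and catch the result."""
    namespace = dict(args or {}, ReturnValue=ReturnValue)
    try:
        exec(source, namespace)
    except ReturnValue as result:
        return result.args[0]
    return None  # the component never raised ReturnValue


SHOUT = "raise ReturnValue(word.upper())"

value = run_data_component(SHOUT, {"word": "skunk"})
print(value)
```

Because the component source has no return statement available at its top level, raising an exception is a convenient way to hand back an arbitrary object from wherever the code happens to be.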
Since data components return Python data, it is usually better to write them in Python, but that isn't necessary. Here is an STML data component:
<:#:>/comp/moveon.dcmp<:/#:>
<:cache duration="1h":>
<:spool tmp:>
<:url "http://www.moveon.org" text="donate to this" noescape="1":>
<:/spool:>
<:return `tmp`:>
The <:url:> tag, which we haven't seen before, is one of a handful of HTML helper tags available in STML; in this case it generates an html anchor tag with a link to the given url around the text argument. The <:return:> tag raises the ReturnValue exception with an argument value of the string tmp that was created by the <:spool:> tag. Here is a component, show_text.comp, that calls this data component and displays the data returned:
<:datacomp astring /comp/moveon.dcmp cache="defer":>
<:val `astring`:>
The <:datacomp:> tag calls the component specified in the second argument and assigns the return value to the variable named in the first argument.
Here is a Python data component:
# /comp/series.pydcmp
# returns random 12-tone series
import random
s=range(1,12)
random.shuffle(s)
raise ReturnValue, [0] + s
and a Python component calling it:
# /comp/show_series.pycomp
import AE
s=AE.Component.callComponent('/comp/series.pydcmp',
                             argDict={},
                             compType=AE.Component.DT_DATA)
# do something with s
By experimenting with these elements in combination, with the STML Reference close at hand, you should soon get an idea of how to write a simple SkunkWeb application. The next article in this series will delve into the guts of extending and configuring SkunkWeb, and show how to write a more involved application, with real-world features like authorization and database access.
| [1] | SkunkWeb's home page: http://skunkweb.sourceforge.net/ |
| [2] | Python Wiki Web Programming page: http://www.python.org/cgi-bin/moinmoin/WebProgramming |
| [3] | http://www.egenix.com/ |
| [4] | http://www.pythonware.com/ |
| [5] | STML Reference Guide: http://skunkweb.sourceforge.net/stmlrefer/ |
Jacob Smullyan is a classical pianist and software developer; his fingers rarely get a rest. He programs nowadays mainly in Python, and is the current maintainer of SkunkWeb, which he is lucky to use for his day job, developing the website for WNYC Public Radio in New York City. He can be contacted at smulloni@smullyan.org.