<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Configuring and Using the Twisted.Web Server</title>
</head>

<body>
<h1>Configuring and Using the Twisted.Web Server</h1>

<h2>Twisted Web Development</h2><a name="development" />

<p>Twisted Web serves Python objects that implement the IResource
interface.</p>

<img src="../img/twisted_1.gif" alt="Twisted Web process" />

<h3>Main Concepts</h3>

<ul>

<li><a href="#sites">Site Objects</a> are responsible for creating
<code>HTTPChannel</code> instances to parse the HTTP request, and begin the object lookup process. They contain the root Resource, the resource which represents the URL <code>/</code> on the site.</li>

<li><a href="#resources">Resource</a> objects represent a single URL segment. The <code class="API" base="twisted.web.resource">IResource</code> interface describes the methods a Resource object must implement in order to participate in the object publishing process.</li>

<li><a href="#trees">Resource trees</a> are arrangements of Resource objects into a tree. Starting at the root Resource object, the tree defines which URLs are valid.</li>

<li><a href="#rpys">.rpy scripts</a> are Python scripts which the twisted.web static file server will execute, much like CGI scripts. However, unlike CGI scripts, they must create a Resource object which will be rendered when the URL is visited.</li>

<li><a href="#rendering">Resource rendering</a> occurs when Twisted Web locates a leaf Resource object. A Resource can either return an HTML string or write to the request object.</li>

<li><a href="#sessions">Session</a> objects allow you to store information across multiple requests. Each individual browser using the system has a unique Session instance.</li>

</ul>

<h3>Site Objects</h3>
<a name="sites" />

<p>Site objects serve as the glue between a port to listen for HTTP requests on, and a root Resource object.</p>

<p>When using <code>mktap web --path /foo/bar/baz</code>, a Site object is created with a root Resource that serves files out of the given path.</p>

<p>You can also create a <code>Site</code> instance by hand, passing it a
<code>Resource</code> object which will serve as the root of the site:</p>

<pre class="python">
from twisted.web import server, resource
from twisted.internet import reactor

class Simple(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        return "&lt;html&gt;Hello, world!&lt;/html&gt;"

site = server.Site(Simple())
reactor.listenTCP(8080, site)
reactor.run()
</pre>

<h3>Resource objects</h3>
<a name="resources" />

<p><code>Resource</code> objects represent a single URL segment of a site. During URL parsing, <code>getChild</code> is called on the current <code>Resource</code> to produce the next <code>Resource</code> object.</p>

<p>When the leaf Resource is reached, either because there were no more URL segments or a Resource had isLeaf set to True, the leaf Resource is rendered by calling <code>render(request)</code>. See <q>Resource Rendering</q> below for more about this.</p>

<p>During the Resource location process, the URL segments which have already been processed and those which have not yet been processed are available in <code>request.prepath</code> and <code>request.postpath</code>.</p>

<p>A Resource can know where it is in the URL tree by looking at <code>request.prepath</code>, a list of URL segment strings.</p>

<p>A Resource can know which path segments will be processed after it by looking at <code>request.postpath</code>.</p>

<p>If the URL ends in a slash, for example <code>http://example.com/foo/bar/</code>, the final URL segment will be an empty string. Resources can thus know if they were requested with or without a final slash.</p>
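<p>The bookkeeping described above can be sketched in plain Python. This is a simplified model under stated assumptions, not Twisted's actual code: <code>ToyResource</code> and <code>locate</code> are hypothetical names, and the real lookup calls <code>getChild</code> with a request object rather than reading a plain dictionary.</p>

<pre class="python">
class ToyResource(object):
    """Hypothetical stand-in for a Resource; holds children by segment name."""
    isLeaf = False

    def __init__(self):
        self.children = {}

    def putChild(self, name, child):
        self.children[name] = child


def locate(root, path):
    """Walk the tree, moving segments from postpath to prepath one at a time."""
    # "/foo/bar/" splits into ['', 'foo', 'bar', '']; drop the leading ''.
    postpath = path.split("/")[1:]
    prepath = []
    resource = root
    # Stop when the segments run out or a leaf resource is reached.
    while postpath and not resource.isLeaf:
        name = postpath.pop(0)
        prepath.append(name)
        resource = resource.children[name]
    return resource, prepath, postpath


root = ToyResource()
foo = ToyResource()
bar = ToyResource()
bar.isLeaf = True
root.putChild("foo", foo)
foo.putChild("bar", bar)

found, prepath, postpath = locate(root, "/foo/bar/baz/")
print(prepath)   # segments already consumed
print(postpath)  # segments left for the leaf to inspect
</pre>

<p>In this sketch the leaf at <code>/foo/bar</code> ends up with a prepath of <code>['foo', 'bar']</code> and a postpath of <code>['baz', '']</code>; the trailing empty string reflects the final slash in the URL.</p>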

<p>Here is a simple Resource object:</p>

<pre class="python">
from twisted.web.resource import Resource

class Hello(Resource):
    def getChild(self, name, request):
        # An empty segment means the URL ended in a slash; serve this resource.
        if name == '':
            return self
        return Resource.getChild(self, name, request)

    def render_GET(self, request):
        return """&lt;html&gt;
  Hello, world! I am located at %r.
&lt;/html&gt;""" % (request.prepath,)

resource = Hello()
</pre>

<h3>Resource Trees</h3>
<a name="trees" />

<p>Resources can be arranged in trees using <code>putChild</code>. <code>putChild</code> puts a Resource instance into another Resource instance, making it available at the given path segment name:</p>

<pre class="python">
root = Hello()
root.putChild('fred', Hello())
root.putChild('bob', Hello())
</pre>

<p>If this root resource is served as the root of a Site instance, the following URLs will all be valid:</p>

<ul>
<li><code>http://example.com/</code></li>
<li><code>http://example.com/fred</code></li>
<li><code>http://example.com/bob</code></li>
<li><code>http://example.com/fred/</code></li>
<li><code>http://example.com/bob/</code></li>

</ul>

<h3>.rpy scripts</h3>
<a name="rpys" />

<p>Files with the extension <code>.rpy</code> are python scripts which, when placed in a directory served by Twisted Web, will be executed when visited through the web.</p>

<p>An <code>.rpy</code> script must define a variable, <code>resource</code>, which is the Resource object that will render the request.</p>

<p><code>.rpy</code> files are very convenient for rapid development and prototyping. Since they are executed on every web request, defining a Resource subclass in an <code>.rpy</code> makes changes to your class visible simply by refreshing the page:</p>

<pre class="python">
from twisted.web.resource import Resource

class MyResource(Resource):
    def render_GET(self, request):
        return "&lt;html&gt;Hello, world!&lt;/html&gt;"

resource = MyResource()
</pre>

<p>However, it is often a better idea to define Resource subclasses in Python modules. In order for changes in modules to be visible, you must either restart the Python process, or reload the module:</p>

<pre class="python">
import myresource

## Comment out this line when finished debugging
reload(myresource)

resource = myresource.MyResource()
</pre>

<p>Creating a Twisted Web server which serves a directory is easy:</p>

<pre class="shell">
% mktap web --path /Users/dsp/Sites
% twistd -nf web.tap
</pre>

<h3>Resource rendering</h3>
<a name="rendering" />

<p>Resource rendering occurs when Twisted Web locates a leaf Resource object to handle a web request. A Resource's <code>render</code> method may do various things to produce output which will be sent back to the browser:</p>

<ul>
<li>Return a string</li>
<li>Call <code>request.write("stuff")</code> as many times as desired, then call <code>request.finish()</code> and return <code>server.NOT_DONE_YET</code> (This is deceptive, since you are in fact done with the request, but is the correct way to do this)</li>

<li>Request a <code>Deferred</code>, return <code>server.NOT_DONE_YET</code>, and call <code>request.write("stuff")</code> and <code>request.finish()</code> later, in a callback on the <code>Deferred</code>.</li>
</ul>

<p>

The <code class="API" base="twisted.web.resource">Resource</code>
class, which is usually what one's Resource classes subclass, has a
convenient default implementation of <code
class="python">render</code>. It will call a method named <code
class="python">self.render_METHOD</code> where <q>METHOD</q> is
whatever HTTP method was used to request this resource. Examples:
render_GET, render_POST, render_HEAD, and so on. It is recommended
that you have your resource classes subclass <code class="API"
base="twisted.web.resource">Resource</code> and implement <code
class="python">render_METHOD</code> methods as opposed to <code
class="python">render</code> itself. Note that for certain resources,
<code class="python">render_POST = render_GET</code> may be
desirable in case one wants to process arguments passed to the
resource regardless of whether they were sent using GET
(<code>?foo=bar&amp;baz=quux</code>, and so forth) or POST.

</p>
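<p>The dispatch on the HTTP method name can be illustrated with a small stand-alone sketch. It is a model only: <code>ToyRequest</code>, <code>ToyResource</code> and <code>Form</code> are hypothetical names, and the real <code>Resource</code> class reports an unsupported method in its own way rather than raising <code>NotImplementedError</code>.</p>

<pre class="python">
class ToyRequest(object):
    """Hypothetical request carrying only the HTTP method name."""
    def __init__(self, method):
        self.method = method


class ToyResource(object):
    def render(self, request):
        # Look up render_GET, render_POST, ... by the request's method name.
        handler = getattr(self, "render_" + request.method, None)
        if handler is None:
            raise NotImplementedError("unsupported method: " + request.method)
        return handler(request)


class Form(ToyResource):
    def render_GET(self, request):
        return "handled " + request.method

    # Treat POST exactly like GET.
    render_POST = render_GET


form = Form()
print(form.render(ToyRequest("GET")))
print(form.render(ToyRequest("POST")))
</pre>

<p>Because <code>render_POST</code> is simply another name for <code>render_GET</code>, both requests reach the same method body.</p>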

<h3>Session</h3>
<a name="sessions" />

<p>HTTP is a stateless protocol; every request-response is treated as an individual unit, distinguishable from any other request only by the URL requested. With the advent of Cookies in the mid nineties, dynamic web servers gained the ability to distinguish between requests coming from different <em>browser sessions</em> by sending a Cookie to a browser. The browser then sends this cookie whenever it makes a request to a web server, allowing the server to track which requests come from which browser session.</p>

<p>Twisted Web provides an abstraction of this browser-tracking behavior called the <em>Session object</em>. Calling <code>request.getSession()</code> checks to see if a session cookie has been set; if not, it creates a unique session id, creates a Session object, stores it in the Site, and returns it. If a session object already exists, the same session object is returned. In this way, you can store data specific to the session in the session object.</p>

<img src="../img/twisted_2.gif" />
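<p>The lookup-or-create behaviour described for <code>request.getSession()</code> can be sketched roughly as follows. Everything here is an assumption made for illustration: <code>ToySite</code>, the <code>session_id</code> cookie name, and the use of a plain dictionary as the session are hypothetical and do not mirror Twisted's real classes.</p>

<pre class="python">
import uuid


class ToySite(object):
    """Hypothetical stand-in for a Site; keeps sessions keyed by session id."""
    def __init__(self):
        self.sessions = {}

    def getSession(self, cookies):
        """Return the session named by the cookie, creating one if needed."""
        uid = cookies.get("session_id")
        if uid not in self.sessions:
            # No cookie yet, or an unknown id: start a fresh session.
            uid = uuid.uuid4().hex
            self.sessions[uid] = {}
            cookies["session_id"] = uid
        return self.sessions[uid]


site = ToySite()
browser_cookies = {}  # one browser's cookie jar
first = site.getSession(browser_cookies)
first["visits"] = 1
second = site.getSession(browser_cookies)
print(second is first)
</pre>

<p>The second call finds the cookie set by the first and hands back the same object, so data stored on it is still there.</p>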

<h2>Advanced Configuration</h2>

<p>Non-trivial configurations of Twisted Web are achieved with Python
configuration files. Such a file is a Python snippet which builds up a
variable called <code>application</code>. Usually, a <code class="API">twisted.application.internet.TCPServer</code> instance will
be used to make the application listen on a TCP port (80, in case direct
web serving is desired), with the listener being a
<code class="API">twisted.web.server.Site</code>. The resulting file can then
be run with <code class="shell">twistd -y</code>. Alternatively, a reactor
object can be used directly to make a runnable script.</p>

<p>The <code>Site</code> will wrap a <code>Resource</code> object -- the
root.</p>

<pre class="python">
from twisted.application import internet, service
from twisted.web import static, server

root = static.File("/var/www/htdocs")
application = service.Application('web')
site = server.Site(root)
sc = service.IServiceCollection(application)
i = internet.TCPServer(80, site)
i.setServiceParent(sc)
</pre>

<p>Most advanced configurations will be in the form of tweaking the
root resource object.</p>

<h3>Adding Children</h3>

<p>Usually, the root's children will be based on the filesystem's contents.
It is possible to override the filesystem with explicit <code>putChild</code>
calls.</p>

<p>Here are two examples. The first one adds a <code>/doc</code> child
to serve the documentation of the installed packages, while the second
one adds a <code>cgi-bin</code> directory for CGI scripts.</p>

<pre class="python">
from twisted.internet import reactor
from twisted.web import static, server

root = static.File("/var/www/htdocs")
root.putChild("doc", static.File("/usr/share/doc"))
reactor.listenTCP(80, server.Site(root))
reactor.run()
</pre>

<pre class="python">
from twisted.internet import reactor
from twisted.web import static, server, twcgi

root = static.File("/var/www/htdocs")
root.putChild("cgi-bin", twcgi.CGIDirectory("/var/www/cgi-bin"))
reactor.listenTCP(80, server.Site(root))
reactor.run()
</pre>

<h3>Modifying File Resources</h3>

<p><code>File</code> resources, be they root object or children thereof,
have two important attributes that often need to be modified:
<code>indexNames</code> and <code>processors</code>. <code>indexNames</code>
determines which files are treated as <q>index files</q> -- served
up when a directory is rendered. <code>processors</code> determines how
certain file extensions are treated.</p>

<p>Here is an example for both, creating a site where all <code>.rpy</code>
extensions are Resource Scripts, and which renders directories by
searching for an <code>index.rpy</code> file.</p>

<pre class="python">
from twisted.application import internet, service
from twisted.web import static, server, script

root = static.File("/var/www/htdocs")
root.indexNames=['index.rpy']
root.processors = {'.rpy': script.ResourceScript}
application = service.Application('web')
sc = service.IServiceCollection(application)
site = server.Site(root)
i = internet.TCPServer(80, site)
i.setServiceParent(sc)
</pre>

<p><code>File</code> objects also have a method called <code>ignoreExt</code>.
This method can be used to give extension-less URLs to users, so that
implementation is hidden. Here is an example:</p>

<pre class="python">
from twisted.application import internet, service
from twisted.web import static, server, script

root = static.File("/var/www/htdocs")
root.ignoreExt(".rpy")
root.processors = {'.rpy': script.ResourceScript}
application = service.Application('web')
sc = service.IServiceCollection(application)
site = server.Site(root)
i = internet.TCPServer(80, site)
i.setServiceParent(sc)
</pre>

<p>Now, a URL such as <code>/foo</code> might be served from a Resource
Script called <code>foo.rpy</code>, if no file by the name of <code>foo</code>
exists.</p>
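<p>The fallback described above can be modelled in a few lines. This is only a sketch of the idea, assuming an exact match takes priority over an ignored extension; <code>resolve</code> is a hypothetical helper, not part of <code>twisted.web.static</code>.</p>

<pre class="python">
def resolve(name, existing_files, ignored_exts):
    """Pick the file to serve for one URL segment, or None if nothing matches."""
    # An exact match always wins.
    if name in existing_files:
        return name
    # Otherwise try each ignored extension in turn.
    for ext in ignored_exts:
        if name + ext in existing_files:
            return name + ext
    return None


files = ["foo.rpy", "bar", "bar.rpy"]
print(resolve("foo", files, [".rpy"]))
print(resolve("bar", files, [".rpy"]))
</pre>

<p>Here <code>foo</code> falls back to <code>foo.rpy</code>, while <code>bar</code> is served from the file of that exact name.</p>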

<h3>Virtual Hosts</h3>

<p>Virtual hosting is done via a special resource that should be
used as the root resource -- <code>NameVirtualHost</code>.
<code>NameVirtualHost</code> has an attribute named <code>default</code>,
which holds the default website. If a different root for some other
name is desired, the <code>addHost</code> method should be called.</p>

<pre class="python">
from twisted.application import internet, service
from twisted.web import static, server, vhost, script

root = vhost.NameVirtualHost()

# Add a default -- htdocs
root.default=static.File("/var/www/htdocs")

# Add a simple virtual host -- foo.com
root.addHost("foo.com", static.File("/var/www/foo"))

# Add a simple virtual host -- bar.com
root.addHost("bar.com", static.File("/var/www/bar"))

# The "baz" people want to use Resource Scripts in their web site
baz = static.File("/var/www/baz")
baz.processors = {'.rpy': script.ResourceScript}
baz.ignoreExt('.rpy')
root.addHost('baz', baz)

application = service.Application('web')
sc = service.IServiceCollection(application)
site = server.Site(root)
i = internet.TCPServer(80, site)
i.setServiceParent(sc)
</pre>
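<p>Conceptually, name-based virtual hosting is a lookup on the request's <code>Host</code> header with a fallback to the default site. The sketch below illustrates that idea only; <code>pick_root</code> is a hypothetical function, and plain strings stand in for the root resources.</p>

<pre class="python">
def pick_root(hosts, default, host_header):
    """Choose the root resource for a request from its Host header."""
    if host_header is None:
        return default
    # Ignore case and any ":port" suffix when matching the name.
    name = host_header.lower().split(":", 1)[0]
    return hosts.get(name, default)


hosts = {"foo.com": "foo root", "bar.com": "bar root"}
print(pick_root(hosts, "default root", "FOO.com:8080"))
print(pick_root(hosts, "default root", "unknown.example"))
</pre>

<p>A request for a name that was never registered falls through to the default root.</p>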

<h3>Advanced Techniques</h3>

<p>Since the configuration is a Python snippet, it is possible to
use the full power of Python. Here are some simple examples:</p>

<pre class="python">
# No need for configuration of virtual hosts -- just make sure
# a directory /var/vhosts/&lt;vhost name&gt; exists:
import os
from twisted.web import vhost, static, server
from twisted.application import internet, service

root = vhost.NameVirtualHost()
root.default = static.File("/var/www/htdocs")
for dir in os.listdir("/var/vhosts"):
    root.addHost(dir, static.File(os.path.join("/var/vhosts", dir)))

application = service.Application('web')
sc = service.IServiceCollection(application)
site = server.Site(root)
i = internet.TCPServer(80, site)
i.setServiceParent(sc)
</pre>

<pre class="python">
# Determine ports we listen on based on a file with numbers:
from twisted.web import vhost, static, server
from twisted.application import internet, service

root = static.File("/var/www/htdocs")

site = server.Site(root)
application = service.Application('web')
serviceCollection = service.IServiceCollection(application)

for num in map(int, open("/etc/web/ports").read().split()):
    internet.TCPServer(num, site).setServiceParent(serviceCollection)
</pre>


<h2>Installing a pre-configured server</h2>

<p>In many cases, you'll end up repeating common usage patterns of twisted.web. In those cases you'll probably want to use Twisted's pre-configured web server setup.</p>

<p> To install the Twisted.Web server, you'll need to have
installed Twisted.</p>

<p> Pre-configured Twisted servers, like the web server, do not have configuration files.  Instead, you instantiate the server and store it into a 'Pickle' file, <code>web.tap</code>.  This file will then be loaded by the Twisted Daemon.   </p>

<pre class="shell">
% mktap web --path /path/to/web/content
</pre>

<p> If you just want to serve content from your own home directory, the following will do:  </p>

<pre class="shell">
% mktap web --path ~/public_html/
</pre>

<p> Some other configuration options are available as well:  </p>

<ul>
  <li> <code>--port</code>: Specify the port for the web 
       server to listen on.  This defaults to 8080.  </li>
  <li> <code>--logfile</code>: Specify the path to the
       log file. </li>
</ul>

<p> The full set of options that are available can be seen with:  </p>

<pre class="shell">
% mktap web --help
</pre>


<h2>Using Twisted.Web</h2>

<h3>Stopping and Starting the Server</h3>

<p> Once you've created your <code>web.tap</code> file and done any configuration, you can start the server:  </p>

<pre class="shell">
% twistd -f web.tap
</pre>

<p> You can stop the server at any time by going back to the directory you started it in and running the command: </p>

<pre class="shell">
% kill `cat twistd.pid`
</pre>

<h3>Serving Flat HTML</h3>

<p> Twisted.Web serves flat HTML files just as it does any other flat file.  </p>

<a name="ResourceScripts" />
<h3>Resource Scripts</h3>

<p> A Resource script is a Python file ending with the extension <code>.rpy</code>, which is required to create an instance of a (subclass of a) <code class="API">twisted.web.resource.Resource</code>. </p>

<p> Resource scripts have 3 special variables: </p>

<ul>
  <li> <code class="py-src-identifier">__file__</code>: The name of the .rpy file, including the full path.  This variable is automatically defined and present within the namespace.  </li>
  <li> <code class="py-src-identifier">registry</code>: An object of class <code class="API" base="twisted.web">static.Registry</code>. It can be used to access and set persistent data keyed by a class.</li>
  <li> <code class="py-src-identifier">resource</code>: The variable which must be defined by the script and set to the resource instance that will be used to render the page. </li>
</ul>

<p> A very simple Resource Script might look like:  </p>

<pre class="python">
from twisted.web import resource
class MyGreatResource(resource.Resource):
    def render_GET(self, request):
        return "&lt;html&gt;foo&lt;/html&gt;"

resource = MyGreatResource()
</pre>

<p> A slightly more complicated resource script, which accesses some
persistent data, might look like:</p>

<pre class="python">
from twisted.web import resource
from SillyWeb import Counter

counter = registry.getComponent(Counter)
if not counter:
   registry.setComponent(Counter, Counter())
counter = registry.getComponent(Counter)

class MyResource(resource.Resource):
    def render_GET(self, request):
        counter.increment()
        return "you are visitor %d" % counter.getValue()

resource = MyResource()
</pre>

<p> This is assuming you have the <code>SillyWeb.Counter</code> module,
implemented something like the following:</p>

<pre class="python">
class Counter:

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def getValue(self):
        return self.value
</pre>
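<p>To see why the counter survives from one request to the next, it may help to model the registry as a dictionary keyed by class. This is a sketch under that assumption: <code>ToyRegistry</code> and <code>run_script</code> are hypothetical, and <code>run_script</code> merely imitates what the body of the <code>.rpy</code> file does each time it is executed.</p>

<pre class="python">
class ToyRegistry(object):
    """Hypothetical registry: stores one component instance per class."""
    def __init__(self):
        self._components = {}

    def getComponent(self, cls):
        return self._components.get(cls)

    def setComponent(self, cls, instance):
        self._components[cls] = instance


class Counter(object):
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def getValue(self):
        return self.value


def run_script(registry):
    """Imitate one execution of the resource script for one request."""
    counter = registry.getComponent(Counter)
    if counter is None:
        # First request: create the counter and remember it.
        counter = Counter()
        registry.setComponent(Counter, counter)
    counter.increment()
    return "you are visitor %d" % counter.getValue()


registry = ToyRegistry()
print(run_script(registry))
print(run_script(registry))
</pre>

<p>The script body runs twice, but the registry outlives both runs, so the second visitor sees a higher count.</p>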

<h3>Web UIs</h3>

<p>
The 
<a href="http://www.divmod.org/Home/Projects/Nevow/">Nevow</a> framework, available as
part of the <a href="http://www.divmod.org/Quotient/">Quotient</a> project,
is an advanced system for giving Web UIs to your application. Nevow uses Twisted Web but is
not itself part of Twisted.
</p>

<a name="SpreadableWebServers" />
<h3>Spreadable Web Servers</h3>

<p> One of the most interesting applications of Twisted.Web is the distributed webserver; multiple servers can all answer requests on the same port, using the <code class="API">twisted.spread</code> package for <q>spreadable</q> computing.  In two different directories, run the commands:  </p>

<pre class="shell">
% mktap web --user
% mktap web --personal [other options, if you desire]
</pre>

<p> Both of these create a <code>web.tap</code>; you need to run both at the same time.  Once you have, go to <code>http://localhost:8080/your_username.twistd/</code> -- you will see the front page from the server you created with the <code>--personal</code> option.  What's happening here is that the request you've sent is being relayed from the central (User) server to your own (Personal) server, over a PB connection.  This technique can be highly useful for small <q>community</q> sites; using the code that makes this demo work, you can connect one HTTP port to multiple resources running with different permissions on the same machine, on different local machines, or even over the internet to a remote site.  </p>


<h3>Serving PHP/Perl/CGI</h3>

<p>Everything related to CGI is located in the
<code>twisted.web.twcgi</code> module, and it's here you'll find the classes that you
need to subclass in order to support the language of your (or somebody else's)
taste. You'll also need to create your own kind of resource if you are using a
non-Unix operating system (such as Windows), or if the default resources have
the wrong pathnames to the interpreters.</p>

<p>The following snippet is a .rpy that serves Perl files. Look at <code>twisted.web.twcgi</code>
for more examples regarding twisted.web and CGI.</p>

<pre class="python">
from twisted.web import static, twcgi

class PerlScript(twcgi.FilteredScript):
    filter = '/usr/bin/perl' # Points to the perl parser

resource = static.File("/perlsite") # Points to the perl website
resource.processors = {".pl": PerlScript} # Files that end with .pl will be
                                          # processed by PerlScript
resource.indexNames = ['index.pl']
</pre>

<h3>Using VHostMonster</h3>

<p>It is common to use one server (for example, Apache) on a site with multiple
names which then uses a reverse proxy (in Apache, via <code>mod_proxy</code>) to reach different
internal web servers, possibly on different machines. However, naive 
configuration causes miscommunication: the internal server firmly believes it
is running on <q>internal-name:port</q>, and will generate URLs to that effect,
which will be completely wrong when received by the client.</p>

<p>While Apache has the ProxyPassReverse directive, it is really a hack
and is nowhere near comprehensive enough. Instead, the recommended practice
in case the internal web server is Twisted.Web is to use VHostMonster.</p>

<p>From the Twisted side, using VHostMonster is easy: just drop a file named
(for example) <code>vhost.rpy</code> containing the following:</p>

<pre class="python">
from twisted.web import vhost
resource = vhost.VHostMonsterResource()
</pre>

<p>Of course, an equivalent <code>.trp</code> can also be used. Make sure
the web server is configured with the correct processors for the 
<code>rpy</code> or <code>trp</code> extensions (the web server 
<code>mktap web --path</code> generates by default is so configured).</p>

<p>From the Apache side, instead of using the following ProxyPass directive:</p>

<pre>
&lt;VirtualHost ip-addr&gt;
ProxyPass / http://localhost:8538/
ServerName example.com
&lt;/VirtualHost&gt;
</pre>

<p>Use the following directive:</p>

<pre>
&lt;VirtualHost ip-addr&gt;
ProxyPass / http://localhost:8538/vhost.rpy/http/example.com:80/
ServerName example.com
&lt;/VirtualHost&gt;
</pre>
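<p>The extra path segments after <code>vhost.rpy</code> tell the internal server which scheme, host name and port the client actually used. A rough sketch of how such segments could be decoded follows; <code>parse_vhost_path</code> is a hypothetical helper, and the default of port 80 when none is given is an assumption of this sketch.</p>

<pre class="python">
def parse_vhost_path(segments):
    """Split ['http', 'example.com:80', ...] into (secure, host, port, rest)."""
    scheme, hostport = segments[0], segments[1]
    secure = (scheme == "https")
    if ":" in hostport:
        host, port = hostport.split(":", 1)
        port = int(port)
    else:
        # Assumed default when the proxy URL names no port.
        host, port = hostport, 80
    return secure, host, port, segments[2:]


# Segments that follow /vhost.rpy in the proxied URL.
print(parse_vhost_path(["http", "example.com:80", "docs", "index.html"]))
</pre>

<p>The remaining segments are then looked up as if the client had requested them from <code>example.com</code> directly.</p>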

<p>Here is an example for Twisted.Web's reverse proxy:</p>

<pre class="python">
from twisted.application import internet, service
from twisted.web import proxy, server, vhost
vhostName = 'example.com'
reverseProxy = proxy.ReverseProxyResource('internal', 8538,
                                          '/vhost.rpy/http/'+vhostName+'/')
root = vhost.NameVirtualHost()
root.addHost(vhostName, reverseProxy)
site = server.Site(root)
application = service.Application('web-proxy')
sc = service.IServiceCollection(application)
i = internet.TCPServer(80, site)
i.setServiceParent(sc)
</pre>

<h2>Rewriting URLs</h2>

<p>Sometimes it is convenient to modify the content of the 
<code class="API" base="twisted.web.server">Request</code> object
before passing it on. Because this is most often used to rewrite
the URL, the similarity to Apache's <code>mod_rewrite</code> has
inspired the <code class="API">twisted.web.rewrite</code> module. To use
this module, wrap a resource with a
<code class="API">twisted.web.rewrite.RewriterResource</code>, which
then has rewrite rules. Rewrite rules are functions which accept a request
object and possibly modify it. After all rewrite rules run, the child
resolution chain continues as if the wrapped resource, rather than
the <code class="API" base="twisted.web.rewrite">RewriterResource</code>,
was the child.</p>

<p>Here is an example, using the only rule currently supplied by Twisted
itself:</p>

<pre class="python">
from twisted.web import rewrite
default_root = rewrite.RewriterResource(default, rewrite.tildeToUsers)
</pre>

<p>This causes the URL <code>/~foo/bar.html</code> to be treated like
<code>/users/foo/bar.html</code>. If this is done after setting the
<code>users</code> child of <code>default</code> to a
<code class="API" base="twisted.web">distrib.UserDirectory</code>,
it gives a configuration similar to the classical configuration of
web servers, common since the first NCSA servers.</p>
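<p>A rewrite rule of this kind only needs to adjust the path segments that have not been processed yet. The following sketch shows the idea on a plain list; <code>tilde_to_users</code> is a hypothetical function written for illustration, whereas a real rule receives the request object and would modify its path information.</p>

<pre class="python">
def tilde_to_users(postpath):
    """Rewrite a leading '~name' segment into 'users', 'name' in place."""
    if postpath and postpath[0][:1] == "~":
        # Replace the first segment with two: 'users' and the bare name.
        postpath[:1] = ["users", postpath[0][1:]]
    return postpath


print(tilde_to_users(["~foo", "bar.html"]))
print(tilde_to_users(["index.html"]))
</pre>

<p>Paths that do not start with a tilde pass through unchanged.</p>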

<h2>Knowing When We're Not Wanted</h2>

<p>Sometimes it is useful to know when the other side has broken the connection.
Here is an example which does that:</p>

<pre class="python">
from twisted.web.resource import Resource
from twisted.web import server
from twisted.internet import reactor
from twisted.python.util import println


class ExampleResource(Resource):

    def render_GET(self, request):
        request.write("hello world")
        d = request.notifyFinish()
        d.addCallback(lambda _: println("finished normally"))
        d.addErrback(println, "error")
        reactor.callLater(10, request.finish)
        return server.NOT_DONE_YET

resource = ExampleResource()
</pre>

<p>This will allow us to run statistics on the log-file to see how many users
are frustrated after merely 10 seconds.</p>

<h2>As-Is Serving</h2>

<p>Sometimes, you want to be able to send headers and status directly. While
you can do this with a
<code base="twisted.web.script" class="API">ResourceScript</code>, an easier
way is to use <code base="twisted.web.static" class="API">ASISProcessor</code>.
Use it by, for example, adding it as a processor for the <code>.asis</code>
extension. Here is a sample file:</p>

<pre>
HTTP/1.0 200 OK
Content-Type: text/html

Hello world
</pre>

</body>
</html>