The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is rapidly becoming the standard web protocol for making metadata, primarily bibliographic information, available online for programmatic access. I’ve long meant to write something that would allow me to pull information down from remote repositories into my local bibliographic database automatically (it would save an awful lot of typing).

I’ve mentioned the oaipmh package provided by infrae.com before. However, the documentation they provide has got rather out of date and, though I’ve made a few attempts, I’ve never quite been able to get it to work. After a bit more effort recently with the newer v2.0+ of the package, I’ve managed to get something basic working, which you can find at http://www.rufuspollock.org/code/oaipmh/demo.py.

I should note that my main interest, at least at present, is in the client side of OAI-PMH, not the server side, so the code is oriented in that direction. As I mentioned above, my aim is to automatically pull down article metadata into my local bibliographic system from sites such as RePEc (repec oai url).
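
To give a feel for what the client side involves, here is a minimal sketch using only the Python 3 standard library. It is not the demo.py linked above and does not use the oaipmh package. The function names, the example.org base URL and the sample record are all made up for illustration; the `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces are the ones defined by OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_data):
    """Return a list of (identifier, title, creators) tuples from a response."""
    root = ET.fromstring(xml_data)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        creators = [el.text for el in record.iterfind(".//dc:creator", NS)]
        records.append((identifier, title, creators))
    return records


def harvest(base_url):
    """Fetch the first page of records from a live repository (needs network).

    Large repositories split their answer into pages linked by a
    resumptionToken; this sketch does not follow those.
    """
    with urlopen(list_records_url(base_url)) as response:
        return parse_records(response.read())


# A hand-written sample response (made-up record) so the sketch runs offline.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:paper-1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("http://example.org/oai"))
print(parse_records(SAMPLE))
```

From tuples like these it is a short step to whatever format a local bibliographic database expects.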

WSGI Middleware

September 28th, 2006

In a previous tutorial we wrote a basic ‘Hello World’ application in WSGI. At the end of it you might, rightly, have been wondering what the point of WSGI is: after all, you could have written that ‘Hello World’ app using plain CGI (or anything else for that matter). In this tutorial we start answering that question by taking a look at WSGI middleware and writing a simple piece of middleware ourselves.

A Simple Example

Here is a simple piece of middleware that adds authentication based on the remote address of the client (this tutorial and its code are available in raw form at http://www.rufuspollock.org/code/wsgi/):


from wsgiref.simple_server import make_server, demo_app

class AuthenticationMiddleware:
    """A modified version of an original example at:
    http://isapi-wsgi.python-hosting.com/wiki/WSGI-Gateway-or-Glue
    """

    def __init__(self, app, allowed_addresses):
        """
        @param app: the WSGI app that comes after us
        @param allowed_addresses: list of remote addresses from which to allow
                                  access
        """
        self.app = app
        self.allowed_addresses = allowed_addresses

    def __call__(self, environ, start_response):
        """The standard WSGI interface"""
        addr = environ.get('REMOTE_ADDR','UNKNOWN') 

        if addr in self.allowed_addresses: # pass through to the next app
            return self.app(environ, start_response)
        else: # put up a response denied
            start_response(
                '403 Forbidden', [('Content-type', 'text/html')])
            return ['You are forbidden to view this resource']

addresses = [ '127.0.0.1' ]
simple_app_with_auth = AuthenticationMiddleware(demo_app, addresses)

if __name__ == '__main__': 

    httpd = make_server('', 8000, simple_app_with_auth)
    print "Serving HTTP on port 8000..."

    # Respond to requests until process is killed
    httpd.serve_forever()
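
Because middleware is just a callable, an address check like the one above can be exercised without starting a server at all. The following is a sketch under a few assumptions: it is written for Python 3 (where WSGI response bodies are bytes, hence the b'...' literals), the check is re-expressed as a closure so the block is self-contained, and `hello_app`, `allow_only` and `request_from` are names invented for the illustration. `setup_testing_defaults` is the wsgiref helper that fills in the required environ keys.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Stand-in for the wrapped application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def allow_only(app, allowed_addresses):
    """Closure-based equivalent of an address-checking middleware."""
    def middleware(environ, start_response):
        if environ.get("REMOTE_ADDR", "UNKNOWN") in allowed_addresses:
            # Allowed: hand the request on untouched.
            return app(environ, start_response)
        # Denied: answer ourselves and never call the wrapped app.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"You are forbidden to view this resource"]
    return middleware


def request_from(app, remote_addr):
    """Call a WSGI app once and return (status, body) for a fake request."""
    environ = {"REMOTE_ADDR": remote_addr}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


guarded = allow_only(hello_app, ["127.0.0.1"])
print(request_from(guarded, "127.0.0.1"))  # allowed address
print(request_from(guarded, "10.0.0.5"))   # any other address
```

The first call reaches the wrapped application; the second is answered by the middleware itself, which is exactly the ‘plays both sides’ behaviour discussed below.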

The Basic Idea

As explained in [pep-333] the basic idea of middleware is of something that ‘plays both sides’:

Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s). Such “middleware” components can perform such functions as:

  • Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
  • Allowing multiple applications or frameworks to run side-by-side in the same process
  • Load balancing and remote processing, by forwarding requests and responses over a network
  • Perform content postprocessing, such as applying XSL stylesheets

A diagram helps:

             WSGI SERVER

               V   A
               V   A
               |   |
               |   |
      +---------------------+
      |        |   |        |
      |   +-------------+   |
      |   |    V   A    |   |
      |   |   +-----+   |   |
      |   |   | APP |   |   |
      |   |   +-----+   |   |
      |   | MIDDLEWARE1 |   |
      |   +-------------+   |
      |     MIDDLEWARE2     |
      +---------------------+

   The WSGI Application + Middleware 'Onion'

Basically, middleware wraps an underlying WSGI application and then presents itself as the new WSGI application to external callers. In Python code the above would look like:

core_app = SomeWsgiApplication()
# remember the middleware is itself a wsgi application
wrapped_once = Middleware1(core_app)
# wrap the new wsgi application!
wrapped_twice = Middleware2(wrapped_once)

# alternatively we could do it all in one
wrapped = Middleware2(Middleware1(core_app))
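
To make the layering concrete, here is a small runnable sketch (Python 3, so the body is bytes). `Tag` is a hypothetical do-nothing middleware that only records its name in the environ under a made-up key, `demo.trail`, and `run` is a helper invented here to call the stack without a server. It suggests that the outermost layer sees the request first.

```python
def core_app(environ, start_response):
    """Innermost application: reports which layers the request passed through."""
    trail = environ.get("demo.trail", []) + ["app"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [" -> ".join(trail).encode("utf-8")]


class Tag:
    """Hypothetical middleware: notes its name in environ, then delegates."""

    def __init__(self, app, name):
        self.app = app
        self.name = name

    def __call__(self, environ, start_response):
        environ.setdefault("demo.trail", []).append(self.name)
        return self.app(environ, start_response)


def run(app):
    """Call a WSGI app with an empty environ and return the joined body."""
    return b"".join(app({}, lambda status, headers, exc_info=None: None))


# Same shape as the onion: middleware2 wraps middleware1 wraps the app.
wrapped = Tag(Tag(core_app, "middleware1"), "middleware2")
print(run(wrapped))
```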

Remarks

Middleware is useful because it dramatically increases the possibilities for using standard web application plumbing — any piece of middleware can now be plugged together very easily with either other middleware or an application.

Middleware is usually one of three types:

  • pre-processors
  • post-processors
  • those that do both (rare)

Examples of pre-processors are:

  • Authenticators (including session management)
  • Dispatchers including proxies and controllers

Examples of post-processors:

In general, pre-processors are a little simpler because they don’t have to deal with the ‘chunking’ aspect of WSGI (a WSGI application returns an iterable rather than just a single buffer so as to allow ‘chunking’ of output; this is useful, for example, when streaming large files. See the ‘Buffering and Streaming’ section in PEP 333 for more information).
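
As an illustration of what dealing with chunking means, here is a sketch of a post-processor that transforms the output one chunk at a time instead of joining it into a single buffer. It is written for Python 3 (bodies are bytes); `UppercaseMiddleware` and `streaming_app` are hypothetical names. Upper-casing was chosen because it leaves the length of ASCII output unchanged, so any Content-Length header stays valid; a transformation that changes the length would also have to adjust the headers, which this sketch does not attempt.

```python
class UppercaseMiddleware:
    """Hypothetical post-processor: upper-cases each chunk as it streams by."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        chunks = self.app(environ, start_response)
        return self._transform(chunks)

    @staticmethod
    def _transform(chunks):
        try:
            for chunk in chunks:
                # One chunk in, one chunk out: nothing is buffered.
                yield chunk.upper()
        finally:
            # PEP 333 asks callers to invoke close() on the iterable if it has one.
            close = getattr(chunks, "close", None)
            if close is not None:
                close()


def streaming_app(environ, start_response):
    """An application that produces its body in two chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"first chunk\n"
    yield b"second chunk\n"


app = UppercaseMiddleware(streaming_app)
for chunk in app({}, lambda status, headers, exc_info=None: None):
    print(chunk)
```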

‘Hello World’ with WSGI

August 31st, 2006

I’ve been seeing a lot of talk about WSGI (Web Server Gateway Interface) and its benefits over the last six months or so and I’ve been meaning to take a look — not least because of the potential to use wsgi middleware to make a nice front-controller for KForge.

First Stop

A quick google takes me to: http://www.wsgi.org/wsgi. I’m looking to just write the proverbial ‘hello world’ app at this stage. Most of the references are a bit too high level (or complex) for me (though this one is an exception). So here I’m going to detail my experiences of familiarizing myself with WSGI by writing the classic ‘hello world’ app (if you are looking to do something more sophisticated with WSGI, check out a toolkit such as paste, or pylons, the framework built on top of paste).

Hello World

1. Install wsgiref

wsgiref is the WSGI reference implementation and is part of the standard library as of Python 2.5. If you are running a Python version earlier than 2.5 you will want to do:

  $ sudo easy_install wsgiref

2. Get a web server

We’ll use the wsgiref simple server as detailed in the docs (if you want to use a ‘proper’ webserver see the section below on making your wsgi app available via fastcgi). Create a python module, simpletest.py say, and insert:

  from wsgiref.simple_server import make_server, demo_app

  httpd = make_server('', 8000, demo_app)
  print "Serving HTTP on port 8000..."

  # Respond to requests until process is killed
  httpd.serve_forever()

  # Alternative: serve one request, then exit
  ##httpd.handle_request()

3. Run it

Start the server:

  $ python simpletest.py

Then visit http://localhost:8000/

Bingo! We’ve got our first working wsgi app (demo_app should output ‘Hello world!’ followed by a list of variable values).

4. Make our own Hello World app

We haven’t yet written anything ourselves — we’re just using the demo_app bundled with wsgiref. So change simpletest.py to be:

  def simple_app(environ, start_response):
      """Simplest possible application object""" 
      status = '200 OK'
      response_headers = [('Content-type','text/plain')]
      start_response(status, response_headers)
      return ['My Own Hello World!\n']

  from wsgiref.simple_server import make_server, demo_app

  httpd = make_server('', 8000, simple_app)
  print "Serving HTTP on port 8000..."

  # Respond to requests until process is killed
  httpd.serve_forever()

Run this and visit http://localhost:8000/ and you should see a blank page containing ‘My Own Hello World!’.
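
It may help to see the other half of the conversation. The sketch below plays the server’s part by hand: it builds an environ dictionary, supplies a start_response callable, calls the application and joins whatever comes back. It is written for Python 3, where the response body must be bytes (hence the b'...' literal), and `call_app` is a helper invented for this illustration, not part of wsgiref.

```python
def simple_app(environ, start_response):
    """Simplest possible application object"""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"My Own Hello World!\n"]


def call_app(app, path="/"):
    """Do by hand what a WSGI server does for each request."""
    # A real server fills environ with many more CGI-style keys.
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    seen = {}

    def start_response(status, response_headers, exc_info=None):
        # The application reports status and headers through this callable.
        seen["status"] = status
        seen["headers"] = response_headers

    # The return value is an iterable of body chunks.
    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body


print(call_app(simple_app))
```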

5. Using a Class

Finally, for completeness, here’s the same application but done as a class:

  class SimpleApp:
      """Produce the same output, but using a class
      """
      def __init__(self, environ, start_response):
          self.environ = environ
          self.start = start_response

      def __iter__(self):
          status = '200 OK'
          response_headers = [('Content-type','text/plain')]
          self.start(status, response_headers)
          yield 'My Own Hello world!\n'

  from wsgiref.simple_server import make_server, demo_app

  # httpd = make_server('', 8000, simple_app)
  # the same but using a class
  httpd = make_server('', 8000, SimpleApp)

  print "Serving HTTP on port 8000..."

  # Respond to requests until process is killed
  httpd.serve_forever()
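
Note that in the version above the class itself is the application: the server calls SimpleApp(environ, start_response), gets back an instance, and iterates over it. Another common arrangement, sketched here for Python 3 with a hypothetical `Greeter` class, is to make instances callable, so that configuration happens once in the constructor and the instance is what gets handed to the server.

```python
class Greeter:
    """Instance-based variant: configure once, then use the instance as the app."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, environ, start_response):
        """The standard WSGI interface, called once per request."""
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [self.greeting.encode("utf-8") + b"\n"]


# The instance, not the class, is the WSGI application here.
app = Greeter("My Own Hello World!")
```

With this shape the server line would read make_server('', 8000, app), and it is also the shape the middleware tutorial uses.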

Serving a WSGI App via FastCGI

This section explains how to serve your WSGI app via FastCGI (other methods using scgi or even cgi take an almost identical approach).

1. Install a fastcgi interface to wsgi:

Use flup which provides a fastcgi and scgi interface to wsgi:

  $ sudo easy_install flup

2. Install a simple standalone fastcgi implementation:

  1. Download http://www.saddi.com/software/py-lib/py-lib/fcgi.py
  2. Install this somewhere you can import it as import fcgi

3. Attach your wsgi application to this fcgi server

Create a python file (server.fcgi) and paste in the following:

  #!/usr/bin/env python
  from myapplication import app # Assume app is your WSGI application object
  from fcgi import WSGIServer
  WSGIServer(app).run()

Now you can just point your webserver at this file (make sure you’ve configured it to handle .fcgi files using fastcgi) and your app is available via fastcgi.
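
For comparison, here is roughly what the plain CGI route looks like, using the CGIHandler class that ships with wsgiref. This is a sketch for Python 3: the inline `app` is a placeholder for your own application, and the GATEWAY_INTERFACE check is my own precaution (web servers set that variable when running a script as CGI) so that importing the file elsewhere has no side effects.

```python
#!/usr/bin/env python3
import os
from wsgiref.handlers import CGIHandler


def app(environ, start_response):
    """Placeholder application; a real script would import its own."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from CGI\n"]


# Only act as a CGI script when a web server has actually launched us as one.
if os.environ.get("GATEWAY_INTERFACE", "").startswith("CGI"):
    CGIHandler().run(app)
```

As with the .fcgi file, the web server needs to be configured to execute the script; the application object itself is unchanged.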

References