Accessing open access repositories using the python oaipmh package
October 6th, 2006
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is growing rapidly as the standard web protocol for making metadata, primarily bibliographic information, available online for programmatic access. I’ve long meant to write something that would allow me to pull information down from remote repositories into my local bibliographic database automatically (it would save an awful lot of typing).
I’ve mentioned the oaipmh package provided by infrae.com before; however, the documentation they provide has got rather out of date, and though I’ve made a few attempts I’ve never quite been able to get it to work. After a bit more effort recently with the newer v2.0+ of the package, I’ve managed to get something basic working, which you can find at http://www.rufuspollock.org/code/oaipmh/demo.py.
I should note that my main interest, at least at present, is in the client side of oaipmh rather than the server side, so the code is oriented in that direction. As I mentioned above, my aim is to automatically pull down article metadata into my local bibliographic system from sites such as repec (repec oai url).
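To give a feel for what a client has to do, here is a minimal sketch using only the Python 3 standard library. It is not the oaipmh package's API and not the demo.py linked above: the helper names (list_records_url, parse_titles), the example.org base URL and the hand-written sample response are all assumptions made for illustration. It builds a ListRecords request URL and pulls identifiers and Dublin Core titles out of a response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
OAI_NS = {
    'oai': 'http://www.openarchives.org/OAI/2.0/',
    'dc': 'http://purl.org/dc/elements/1.1/',
}


def list_records_url(base_url, metadata_prefix='oai_dc'):
    """Hypothetical helper: build the URL for a ListRecords request."""
    query = urlencode({'verb': 'ListRecords',
                       'metadataPrefix': metadata_prefix})
    return base_url + '?' + query


def parse_titles(xml_text):
    """Hypothetical helper: return (identifier, title) for each record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind('.//oai:record', OAI_NS):
        identifier = record.findtext('oai:header/oai:identifier',
                                     namespaces=OAI_NS)
        title = record.findtext('.//dc:title', namespaces=OAI_NS)
        results.append((identifier, title))
    return results


# Hand-written sample response, for illustration only (not real data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url('http://example.org/oai')
records = parse_titles(SAMPLE)
print(url)
print(records)
```

A real harvester would fetch the URL over HTTP and follow resumption tokens for long result sets, which is the kind of plumbing a dedicated package takes care of.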
WSGI Middleware
September 28th, 2006
In a previous tutorial we wrote a basic ‘Hello World’ application in WSGI. At the end of it you might, rightly, have been wondering what the point of WSGI is; after all, you could have written that ‘Hello World’ app using plain CGI (or anything else for that matter). In this tutorial we are going to start answering that question by taking a look at WSGI middleware and writing a simple piece of middleware ourselves.
A Simple Example
Here is a simple piece of middleware that adds authentication based on the remote address of the client (this tutorial and its code are available in raw form at http://www.rufuspollock.org/code/wsgi/):
from wsgiref.simple_server import make_server, demo_app

class AuthenticationMiddleware:
    """A modified version of an original example at:
    http://isapi-wsgi.python-hosting.com/wiki/WSGI-Gateway-or-Glue
    """

    def __init__(self, app, allowed_addresses):
        """
        @param app: the WSGI app that comes after us
        @param allowed_addresses: list of remote addresses from which to allow
            access
        """
        self.app = app
        self.allowed_addresses = allowed_addresses

    def __call__(self, environ, start_response):
        """The standard WSGI interface"""
        addr = environ.get('REMOTE_ADDR', 'UNKNOWN')
        if addr in self.allowed_addresses:  # pass through to the next app
            return self.app(environ, start_response)
        else:  # put up a response denied
            start_response(
                '403 Forbidden', [('Content-type', 'text/html')])
            return ['You are forbidden to view this resource']

addresses = ['127.0.0.1']
simple_app_with_auth = AuthenticationMiddleware(demo_app, addresses)

if __name__ == '__main__':
    httpd = make_server('', 8000, simple_app_with_auth)
    print "Serving HTTP on port 8000..."
    # Respond to requests until process is killed
    httpd.serve_forever()
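If you want to check the behaviour without starting a server or opening a browser, one option is to call the wrapped application directly with a hand-built environ. The sketch below is a compact Python 3 restatement of the middleware (response bodies must be bytes there, unlike in the listing above); hello_app and call_app are hypothetical names introduced only for this illustration.

```python
from wsgiref.util import setup_testing_defaults


class AuthenticationMiddleware:
    """Compact Python 3 restatement of the middleware above."""

    def __init__(self, app, allowed_addresses):
        self.app = app
        self.allowed_addresses = allowed_addresses

    def __call__(self, environ, start_response):
        addr = environ.get('REMOTE_ADDR', 'UNKNOWN')
        if addr in self.allowed_addresses:
            return self.app(environ, start_response)
        start_response('403 Forbidden', [('Content-type', 'text/html')])
        return [b'You are forbidden to view this resource']


def hello_app(environ, start_response):
    """Hypothetical stand-in for demo_app."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!\n']


def call_app(app, remote_addr):
    """Hypothetical helper: call a WSGI app once, return (status, body)."""
    environ = {'REMOTE_ADDR': remote_addr}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body


wrapped = AuthenticationMiddleware(hello_app, ['127.0.0.1'])
allowed = call_app(wrapped, '127.0.0.1')
denied = call_app(wrapped, '10.0.0.1')
print(allowed)
print(denied)
```

The local address passes straight through to the inner app, while any other address gets the 403 response without the inner app ever being called.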
The Basic Idea
As explained in [pep-333] the basic idea of middleware is of something that ‘plays both sides’:
Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s). Such “middleware” components can perform such functions as:
- Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
- Allowing multiple applications or frameworks to run side-by-side in the same process
- Load balancing and remote processing, by forwarding requests and responses over a network
- Perform content postprocessing, such as applying XSL stylesheets
A diagram helps:
      WSGI SERVER
         V   A
         V   A
         |   |
         |   |
+---------------------+
|        |   |        |
|   +-------------+   |
|   |    V   A    |   |
|   |   +-----+   |   |
|   |   | APP |   |   |
|   |   +-----+   |   |
|   | MIDDLEWARE1 |   |
|   +-------------+   |
|     MIDDLEWARE2     |
+---------------------+

The WSGI Application + Middleware 'Onion'
Basically, middleware wraps an underlying wsgi application and then presents itself as the new wsgi application to external callers. In python code the above would look like:
core_app = SomeWsgiApplication()
# remember the middleware is itself a wsgi application
wrapped_once = Middleware1(core_app)
# wrap the new wsgi application!
wrapped_twice = Middleware2(wrapped_once)
# alternatively we could do it all in one
wrapped = Middleware2(Middleware1(core_app))
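To see the onion in action, here is a small runnable sketch in Python 3. TracingMiddleware and the calls list are hypothetical, introduced only to record the order in which each layer is entered and left; they are not part of wsgiref or of any real middleware library.

```python
calls = []  # records the order in which the layers run


def core_app(environ, start_response):
    calls.append('app')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'core\n']


class TracingMiddleware:
    """Hypothetical middleware: notes its name, then delegates inwards."""

    def __init__(self, app, name):
        self.app = app
        self.name = name

    def __call__(self, environ, start_response):
        calls.append(self.name + ' in')
        result = self.app(environ, start_response)
        calls.append(self.name + ' out')
        return result


# Same shape as Middleware2(Middleware1(core_app)) above.
wrapped = TracingMiddleware(
    TracingMiddleware(core_app, 'middleware1'), 'middleware2')


def start_response(status, headers, exc_info=None):
    pass  # a real server would record the status and headers here


body = b''.join(wrapped({}, start_response))
print(calls)
```

The outermost wrapper sees the request first and the response last, which is why the order of wrapping matters when, say, authentication has to run before dispatch.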
Remarks
Middleware is useful because it dramatically increases the possibilities for using standard web application plumbing — any piece of middleware can now be plugged together very easily with either other middleware or an application.
Middleware is usually one of three types:
- pre-processors
- post-processors
- those that do both (rare)
Examples of pre-processors are:
- Authenticators (including session management)
- Dispatchers including proxies and controllers
Examples of post-processors:
- Applying an XSL style sheet
- Tidying html or providing safe xhtml
In general, pre-processors are a little simpler because they don’t have to deal with the ‘chunking’ aspect of WSGI. A WSGI application returns an iterable rather than just a single buffer so as to allow ‘chunking’ of output, which is useful, for example, when streaming large files (see the ‘Buffering and Streaming’ section in PEP 333 for more information).
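As an illustration of what dealing with chunks can look like, here is a minimal post-processor sketch in Python 3. UppercaseMiddleware and streaming_app are hypothetical names made up for this example. The middleware transforms each chunk as it passes through rather than buffering the whole body, and it calls close() on the wrapped iterable if one is provided. Upper-casing ASCII bytes leaves the length unchanged; a post-processor that changes the body length would also need to adjust any Content-Length header, which this sketch does not attempt.

```python
class UppercaseMiddleware:
    """Hypothetical post-processor: upper-cases each chunk of the body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Get the wrapped app's iterable; chunks are transformed lazily
        # as the server iterates over what we return.
        result = self.app(environ, start_response)
        return self._transform(result)

    @staticmethod
    def _transform(result):
        try:
            for chunk in result:
                yield chunk.upper()
        finally:
            # Pass close() through to the wrapped iterable if it has one.
            close = getattr(result, 'close', None)
            if close is not None:
                close()


def streaming_app(environ, start_response):
    """Hypothetical app that produces its body in two chunks."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'hello '
    yield b'world\n'


def start_response(status, headers, exc_info=None):
    pass  # a real server would record the status and headers here


chunks = list(UppercaseMiddleware(streaming_app)({}, start_response))
print(chunks)
```

Note that the output still arrives as two separate chunks, so a server streaming this response never has to hold the whole body in memory.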
‘Hello World’ with WSGI
August 31st, 2006
I’ve been seeing a lot of talk about WSGI (Web Server Gateway Interface) and its benefits over the last six months or so and I’ve been meaning to take a look — not least because of the potential to use wsgi middleware to make a nice front-controller for KForge.
First Stop
A quick google takes me to http://www.wsgi.org/wsgi. I’m looking to write just the proverbial ‘hello world’ app at this stage, and most of the references are a bit too high level (or complex) for me (though this one is an exception). So here I’m going to detail my experiences of familiarizing myself with wsgi by writing the classic ‘hello world’ app (if you are looking to do something more sophisticated with wsgi, check out a toolkit such as paste, or pylons, the framework built on top of paste).
Hello World
1. Install wsgiref
wsgiref is the wsgi reference implementation that is now part of the python 2.5 standard library. If you are running a python version earlier than 2.5 you will want to do:
$ sudo easy_install wsgiref
2. Get a web server
We’ll use the wsgiref simple server as detailed in the docs (if you want to use a ‘proper’ webserver see the section below on making your wsgi app available via fastcgi). Create a python module, simpletest.py say, and insert:
from wsgiref.simple_server import make_server, demo_app
httpd = make_server('', 8000, demo_app)
print "Serving HTTP on port 8000..."
# Respond to requests until process is killed
httpd.serve_forever()
# Alternative: serve one request, then exit
##httpd.handle_request()
3. Run it
Start the server:
$ python simpletest.py
Then visit http://localhost:8000/
Bingo! We’ve got our first working wsgi app (demo_app should output ‘Hello world!’ followed by a list of variable values).
4. Make our own Hello World app
We haven’t yet written anything ourselves — we’re just using the demo_app bundled with wsgiref. So change simpletest.py to be:
def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['My Own Hello World!\n']

from wsgiref.simple_server import make_server, demo_app
httpd = make_server('', 8000, simple_app)
print "Serving HTTP on port 8000..."
# Respond to requests until process is killed
httpd.serve_forever()
Run this and visit http://localhost:8000/ and you should see a blank page containing ‘My Own Hello World!’.
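It can help to see what the server is doing on our behalf: it builds an environ dictionary, supplies a start_response callable, calls the app and sends whatever the returned iterable yields. The sketch below plays the server's part by hand in Python 3, where the body has to be bytes rather than a plain string as in the listing above. The received dictionary is just a hypothetical place to keep what start_response was given.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """Python 3 version of the app above (note the bytes body)."""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'My Own Hello World!\n']


# Play the part of the server: build an environ and a start_response.
environ = {}
setup_testing_defaults(environ)  # fills in the required WSGI keys
received = {}


def start_response(status, headers, exc_info=None):
    received['status'] = status
    received['headers'] = headers


body = b''.join(simple_app(environ, start_response))
print(received['status'], received['headers'], body)
```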
5. Using a Class
Finally for completeness here’s the same application but done as a class:
class SimpleApp:
    """Produce the same output, but using a class
    """

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type','text/plain')]
        self.start(status, response_headers)
        yield 'My Own Hello world!\n'

from wsgiref.simple_server import make_server, demo_app
# httpd = make_server('', 8000, simple_app)
# the same but using a class
httpd = make_server('', 8000, SimpleApp)
print "Serving HTTP on port 8000..."
# Respond to requests until process is killed
httpd.serve_forever()
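The class version works because ‘calling’ the class simply creates an instance, and that instance is the iterable the server then loops over. One consequence is that start_response is not called until iteration begins. The Python 3 sketch below makes that visible; the events list is a hypothetical recording device added for the illustration.

```python
class SimpleApp:
    """Python 3 version of the class above (note the bytes body)."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b'My Own Hello world!\n'


events = []  # statuses passed to start_response, in order


def start_response(status, headers, exc_info=None):
    events.append(status)


result = SimpleApp({}, start_response)  # the 'call': just builds an instance
before = list(events)                   # nothing has been started yet
body = b''.join(result)                 # iterating triggers start_response
after = list(events)
print(before, after, body)
```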
Serving a WSGI App via FastCGI
This section explains how to serve your WSGI app via FastCGI (other methods using scgi or even cgi take an almost identical approach).
1. Install a fastcgi interface to wsgi:
Use flup which provides a fastcgi and scgi interface to wsgi:
$ sudo easy_install flup
2. Install a simple standalone fastcgi implementation:
- Download http://www.saddi.com/software/py-lib/py-lib/fcgi.py
- Install this somewhere on your python path, so that import fcgi works
3. Attach your wsgi application to this fcgi server
Create a python file (server.fcgi) and paste in the following:
#!/usr/bin/env python
from myapplication import app # Assume app is your WSGI application object
from fcgi import WSGIServer
WSGIServer(app).run()
Now you can just point your webserver at this file (make sure you’ve configured it to handle .fcgi files using fastcgi) and your app is available via fastcgi.
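For comparison, the plain CGI route mentioned at the start of this section needs nothing beyond the standard library. The sketch below (Python 3) uses wsgiref.handlers.BaseCGIHandler with in-memory streams so that it can be run anywhere; in an actual CGI script you would more likely call wsgiref.handlers.CGIHandler().run(app), which reads the process environment and writes to standard output. The app function here is a hypothetical stand-in for your own WSGI application.

```python
import io
from wsgiref.handlers import BaseCGIHandler
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical stand-in for your WSGI application object."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'My Own Hello World!\n']


# Fake the environment and streams a web server would hand to a CGI script.
environ = {}
setup_testing_defaults(environ)
stdout = io.BytesIO()

handler = BaseCGIHandler(io.BytesIO(), stdout, io.StringIO(), environ)
handler.run(app)

output = stdout.getvalue()
print(output.decode('iso-8859-1'))
```

The handler writes a CGI-style ‘Status:’ line followed by the headers and the body, which is what a web server expects to read back from a CGI script.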
