A new version (v1.2) of my python script for converting markdown to latex is now done. markdown2latex (renamed from mkdn2latex) has been extensively refactored to become a proper python-markdown extension. This means it can be used seamlessly alongside plain markdown conversion, as well as independently, either as a module or, in its classic form, from the command line.

In addition, for ease of installation it has been turned into a proper python package and registered on PyPI, so you can just do:

$ easy_install markdown2latex

Alternatively you can still get it straight from the repository at:

http://knowledgeforge.net/okftext/svn/trunk/python/markdown2latex/

Distributed version control systems (DVCSs) have now matured enough that I’ve been planning to switch from subversion for quite a while, at least for my own personal repositories where there are no coordination issues. Having chosen mercurial (hg) as my DVCS of choice, the next step was to actually convert. While there is quite a bit of documentation on this topic available online, I didn’t always find that it had the necessary info. Combined with the several ‘snags’ I hit along the way, I thought it worth documenting my experience in case it proves useful to others.

I’d waited until hg 0.9.5 was available on my distro precisely because I wanted to use the hg convert functionality. (Alternatives such as tailor looked to have difficulties. It turned out I could have used hgsvn without problem, but my original impression had been that it was oriented towards integrating hg and svn rather than straight conversion.)

Before I document the steps, it is worth getting clear on one very important aspect of how hg works:

There is no distinction between a working copy and a repository.

In particular, each repository is a working copy and vice versa. The actual repo is stored inside the working copy, at its root, in a .hg directory. To ‘checkout’ (svn terminology) you simply ‘clone’ an existing repository. If you just want a limited set of changes, e.g. those since you last ‘updated’ (svn terminology), you can do a ‘pull’. In fact you could even just make a plain copy of that .hg directory and send it to someone, though obviously this might not work so well if you are moving between two OSs with different filesystems.

Anyway, the main point to take from this is that the result of an hg convert will simply be a new directory with all the files (the working copy) plus a .hg directory in that directory (the repo).

To convert, all you do is:

$ hg convert <svn-repo-or-co> <some-new-directory>

The devil however is in the detail:

  1. svn-repo-or-co can be the URI of a subversion repo or the path of an svn checkout. If it is a checkout, hg convert will just work out the source repo and pull from there.
    1. Note, however, that hg convert will not move across the working copy files themselves. The obvious solution is to do the convert and then just move the .hg directory across into your svn checkout and delete all the .svn directories (or vice versa).
  2. some-new-directory: this is where the new hg repo/working copy will end up.
  3. After doing hg convert, rather surprisingly, all of the files in the new hg repository will be listed as ‘?’ (not tracked) when you do an hg status. To solve this, just do an hg update.
  4. To speed up conversions it is often worth getting a local copy of the subversion repo (to save pulling lots of stuff over the network). To do this, either use svnsync or just dump the remote repo and load it into a local one (if converting from a working copy you’ll then just need to do an svn switch --relocate).
  5. My repository did not have a branches/tags/trunk layout (instead it has multiple subprojects …). This led to weird errors involving files and directories at the root of the repository, which looked like: ‘hg convert abort: path contains illegal component’. I solved this by using the --filemap option to hg convert and putting explicit renames of the form: rename /root-path-1 root-path-1 in that file.
  6. What do you do for all the other working copies once you have converted the repo/your working copy? This is now simple:
    1. Clone your hg repo to each of the machines with a working copy.
      • For this purpose you will probably want to make your original hg repo available over the Internet using either the ssh or http protocol (for details see the mercurial docs).
    2. Copy over the svn working files into that new hg repo.
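
The fiddly parts of the steps above can be scripted. This is a minimal sketch. The function names are hypothetical. The list of root-level paths for the filemap is something you supply yourself. It assumes hg's filemap "rename <from> <to>" directive, which matches the workaround in step 5. strip_svn_metadata deletes directories, so try it on a copy first.

```python
import os
import shutil


def make_filemap(top_level_paths):
    """Return filemap text with one rename directive per root-level path.

    Follows the workaround in step 5: each path at the repository root is
    renamed from "/name" to "name". The list of names is supplied by you.
    """
    return "".join("rename /%s %s\n" % (name, name) for name in top_level_paths)


def adopt_hg_repo(converted_dir, svn_checkout):
    """Move the .hg directory created by hg convert into an existing svn checkout."""
    shutil.move(os.path.join(converted_dir, ".hg"), os.path.join(svn_checkout, ".hg"))


def strip_svn_metadata(root):
    """Delete every .svn directory below root and return how many were removed."""
    removed = 0
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".svn" in dirnames:
            shutil.rmtree(os.path.join(dirpath, ".svn"))
            dirnames.remove(".svn")  # do not descend into the directory just deleted
            removed += 1
    return removed
```

Typical use:

1. Write the output of make_filemap to a file.
2. Run hg convert --filemap <that-file> <svn-repo-or-co> <some-new-directory>.
3. Call adopt_hg_repo and then strip_svn_metadata on your checkout.
4. Check the result with hg status.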

I record briefly here my experience resolving this issue in case it helps others. As background: I use svk to allow local commits and replay for some of the subversion repos I use. Over the last week I’d started encountering problems when trying to svk sync one of these, receiving the following error message:

Bad URL passed to RA layer: Malformed URL for repository

The solution to this is the following patch provided by Peter Werner to the svk-devel list a few days ago:

--- SVN-Mirror-0.73.orig/lib/SVN/Mirror/Ra.pm 2007-03-19 23:59:12.000000000 +0100
+++ SVN-Mirror-0.73/lib/SVN/Mirror/Ra.pm  2007-10-07 08:37:36.000000000 +0200
@@ -168,6 +168,9 @@
     $self->{config} ||= SVN::Core::config_get_config(undef, $self->{pool});
     $self->{auth} ||= $self->_new_auth;

+    # escape URI (% is already escaped)
+    $arg{'url'} =~ s/([^-_:.%\/a-zA-Z0-9])/sprintf("%%%02X", ord($1))/eg if defined $arg{'url'};
+
     SVN::Ra->new( url => $self->{rsource},
      auth => $self->{auth},
      config => $self->{config},
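
The key line of the patch is the Perl substitution. Here is a Python transliteration of it, which makes it easier to see which characters end up escaped. The function name is hypothetical. Encoding the URL as UTF-8 before escaping is an assumption about how the Perl string is held.

```python
import re

# Bytes the patch leaves alone: '-', '_', ':', '.', '%', '/', ASCII letters and digits.
# Everything else (spaces included) is replaced by a %XX escape.
_UNSAFE = re.compile(rb"[^-_:.%/a-zA-Z0-9]")


def escape_svn_url(url):
    """Percent-escape a URL in the same way as the SVN::Mirror patch.

    '%' is in the safe set, so sequences that are already escaped pass through
    unchanged. Encoding to UTF-8 first (an assumption) gives each non-ASCII
    byte its own escape.
    """
    raw = url.encode("utf-8")
    escaped = _UNSAFE.sub(lambda m: b"%%%02X" % m.group(0)[0], raw)
    return escaped.decode("ascii")
```

For example, a path containing "my dir" comes out as "my%20dir". That is why a directory name with a space stopped triggering the malformed URL error.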

In addition to the solution, below I report the process by which I found it. I do this because it provides an interesting case study of the way that open source communities work, and particularly of how ‘user-driven bug-fixing’ happens.

  1. Searching on the web turned up a variety of earlier reports [1][2][3] of this issue, which seemed to be related to having spaces in svn URLs (see [1.1] and [2] in particular). This seemed a plausible source of the error, as it had appeared after someone added a directory with a space in its name to the repository (a very rare occurrence).
  2. This issue did not seem to occur for all users, and CLK (the maintainer) suggested upgrading SVN::Mirror to 0.73 [2.1].
  3. This I did, but the bug was still there (as other users had noted [2.2]). However, the source now seemed to be pinpointed as being in the SVN::Mirror perl module. Unfortunately I’m not a perl hacker …
  4. Finally, a hand search of the svk lists turned up a post from less than a week ago [4]. It was obviously too recent for Google to have picked up yet, as I had earlier done a specific search for the error message over the svk lists … In addition to reporting the problem, this mail provided a two-line patch to a specific perl module. I applied this patch, tried svk sync and hey presto! the bug was gone.

The issue thus progressed from an unconfirmed one whose aetiology was unclear [1], to a confirmed one whose cause was fairly well known [2] (though not its source in the code). Solutions were suggested and tested by users [2.1, 2.2]. The issue then remained unresolved for several more months, until the fix was eventually provided by an independent user on the list [4].

It is also especially noteworthy that much of this tracking down was only possible because the software involved was open, enabling users to poke around to see what was wrong. For example, the bug was tied to spaces in the underlying repository URL because the original reporter of the issue hand-modified an svn source file to make the error message more verbose [1.1]. This is clearly only possible if the code is open.

An (ongoing) summary of my experience with some of the utilities available for plotting from a python perspective.

Last updated: 2008-03-06

Ploticus

  • (+) Fast, powerful, mature, well-documented
  • (-) Not python based

C-based rather than python-based, but fast and powerful. There is a (fairly crude) set of python bindings available here: http://www.srcc.lsu.edu/~davids/ploticus_module.html. Alternatively, one can just call the ploticus command from a python script.
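
Calling ploticus from python can be as simple as building an argument list and handing it to subprocess. This is a minimal sketch, and the function names are hypothetical. I am assuming the executable is called pl and takes a format flag such as -png plus -o for the output file; check both against your local ploticus docs.

```python
import shutil
import subprocess


def ploticus_command(script_path, output_path, fmt="png", executable="pl"):
    """Build the argument list for one ploticus run.

    ASSUMPTION: ploticus is installed as `pl`, selects the output format with a
    flag like -png and the output file with -o. Adjust to your installation.
    """
    return [executable, "-" + fmt, script_path, "-o", output_path]


def run_ploticus(script_path, output_path, fmt="png", executable="pl"):
    """Run ploticus on a script file; raise if it is missing or exits non-zero."""
    if shutil.which(executable) is None:
        raise FileNotFoundError("ploticus executable %r not found on PATH" % executable)
    subprocess.run(ploticus_command(script_path, output_path, fmt, executable), check=True)
```

The arguments are passed as a list rather than a shell string. This avoids quoting problems with file names that contain spaces.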

Matplotlib

  • (+) Fairly powerful, mature, well-documented, nice pure python API
  • (-) A little slow; requires a backend to be installed (so installation on a server is a problem)
  • Could support object-orientation better

PyChart

http://home.gna.org/pychart/

  • (+) Pure python, quite simple to use, good documentation
  • (-) Not quite as nice looking or as powerful as e.g. ploticus

Biggles

http://biggles.sourceforge.net/

  • last updated: 2004-03-08
  • looks fine but does not seem to be actively developed any longer

Example

See: http://home.gna.org/pychart/examples/index.html. This is the bar/line example from there:

bar/line chart

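from pychart import *

# Let pychart read its options (e.g. the output format) from the command line.
theme.get_options()

# Each tuple is an x value followed by two y series: (x, y1, y2).
data = [(10, 20, 30), (20, 65, 33),
        (30, 55, 30), (40, 45, 51),
        (50, 25, 27), (60, 75, 30)]

# Plot area with a grid line every 10 y units; the y axis starts at 0 and
# its upper bound (None) is worked out from the data.
ar = area.T(size=(150, 120),
            y_grid_interval=10,
            x_axis=axis.X(label="X label", label_offset=(0, -7)),
            y_axis=axis.Y(label="Y label"),
            legend=legend.T(), y_range=(0, None))

# Bars use column 1 (the default ycol); the line uses column 2.
ar.add_plot(bar_plot.T(label="foo", data=data),
            line_plot.T(label="bar", data=data, ycol=2))
ar.draw()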

Versioned Domain Models

March 22nd, 2007

For over two years I’ve been thinking about how to have a versioned domain model, similar to the way we have versioned filesystems (e.g. subversion). Over the last few months whatever bits of free time I’ve had have gone into developing a prototype built on top of sqlobject. I’ve now got a rough and ready (but fully functional) library:

http://project.knowledgeforge.net/ckan/svn/vdm/branches/sqlobj/

How it is used is best shown by the tests:

http://project.knowledgeforge.net/ckan/svn/vdm/branches/sqlobj/vdm/dm_test.py

Why be tied to SQLObject? Obviously being so directly tied to SQLObject is not such a great thing. I intentionally chose to build on it, though, because so many people will already be writing their domain models using SQLObject.
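
To make the analogy with subversion concrete, here is a toy in-memory sketch of the core idea. Each commit bumps a global revision number, and objects can be read as of any earlier revision. This is not vdm's API. The class and method names are made up, and there is no persistence and no SQLObject.

```python
import bisect


class VersionedStore:
    """Toy versioned object store, loosely modelled on svn revisions."""

    def __init__(self):
        self.revision = 0
        self._history = {}  # object id -> list of (revision, state dict), oldest first

    def commit(self, changes):
        """Record {object_id: new_state} as one new revision and return its number."""
        self.revision += 1
        for obj_id, state in changes.items():
            self._history.setdefault(obj_id, []).append((self.revision, dict(state)))
        return self.revision

    def get(self, obj_id, revision=None):
        """Return the object's state as of `revision` (default: latest), or None."""
        if revision is None:
            revision = self.revision
        history = self._history.get(obj_id, [])
        idx = bisect.bisect_right([rev for rev, _ in history], revision)
        if idx == 0:
            return None  # the object did not exist yet at that revision
        return dict(history[idx - 1][1])
```

For example, after two commits to the same object, get(obj_id, revision=1) still returns the first state. That is the "checkout an old revision" behaviour a versioned domain model is after.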

I’ve updated mkdn2latex, the python script which converts markdown to latex (see also the original release announcement). Changes include:

  • Support for markdown code blocks, and html pre/code blocks generally, using latex verbatim
  • Verified compatibility with markdown 1.6
  • A few minor bugfixes
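
Mapping code blocks to verbatim boils down to something like the following sketch. It is not necessarily how mkdn2latex does it. It assumes code blocks arrive as the plain <pre><code> HTML that python-markdown emits, with no attributes on the tags. The function name is hypothetical.

```python
import html
import re

# <pre><code>...</code></pre> (markdown code blocks) or a bare <pre>...</pre>.
_PRE_BLOCK = re.compile(r"<pre>(?:<code>)?(.*?)(?:</code>)?</pre>", re.DOTALL)


def pre_blocks_to_verbatim(html_text):
    """Replace HTML pre/code blocks with LaTeX verbatim environments."""
    def to_verbatim(match):
        body = html.unescape(match.group(1))  # &lt; -> <, &amp; -> &, ...
        return "\\begin{verbatim}\n" + body.rstrip("\n") + "\n\\end{verbatim}"
    return _PRE_BLOCK.sub(to_verbatim, html_text)
```

Entities must be unescaped because verbatim prints its contents literally. Otherwise "a &lt; b" would appear as-is in the PDF.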

Thinking about Annotation

January 17th, 2007

Annotation means adding comments/notes/etc. to an underlying resource. For the present I’ll focus on the situation where the underlying resource is textual (as opposed to an image, a piece of film or some data). Various things to consider when implementing an annotation/comment system:

  1. Addressing and atomisation: Are annotations specific to particular parts of the resource? If so, how do we store this address? Relatedly, how is the resource ‘atomised’, and how do we address these atoms or ranges of atoms? For example, do we address by word, by character, by paragraph or by section? Do we wish to store ranges rather than a single address? Do we wish to allow a given annotation to be associated with multiple ranges/atoms?

  2. Permissions: Are there restrictions on the creation (deletion, updating, etc.) of annotations?

  3. Will the underlying resource change, and if so, are annotations intended to be robust to those changes?

Let’s concentrate on the first issue for the time being, as it is the most immediately important. Furthermore, defining the ‘atoms’ of the resource sharply narrows the implementation options.
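
The addressing questions translate fairly directly into a data structure. This is a minimal sketch with hypothetical class and field names. An annotation records what its atoms are (characters, words, paragraphs …) and holds a list of half-open ranges over them. A single address is just a range of length one, and multiple ranges are allowed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Range:
    """Half-open span [start, end) of atom indices."""
    start: int
    end: int

    def __post_init__(self):
        if not 0 <= self.start <= self.end:
            raise ValueError("need 0 <= start <= end, got %r" % ((self.start, self.end),))


@dataclass
class Annotation:
    resource: str  # identifier of the annotated text (hypothetical)
    unit: str      # what one atom is: "char", "word", "paragraph", ...
    ranges: List[Range] = field(default_factory=list)  # one or more ranges
    body: str = ""

    def covers(self, atom):
        """True if the given atom index falls inside any of this annotation's ranges."""
        return any(r.start <= atom < r.end for r in self.ranges)
```

Storing the unit alongside the ranges keeps the choice of atomisation explicit. Annotations made at paragraph granularity are then not confused with character offsets.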

The Simple Case: Mod a Blog

If one is happy to have fairly large atoms (pages, or even sections of some piece of text), then implementing an annotation system can be reduced to grabbing your favourite CMS or blogging software and feeding the text in, in appropriate chunks. This is often satisfactory and is a simple, low-tech solution that will pretty much work out of the box. A classic example of this approach is http://www.pepysdiary.com/. It works so well because the subject matter (Samuel Pepys’s diary) has a very obvious atomisation, namely the daily diary entries, which are perfectly suited to blog software (in this case Movable Type).

You can even start doing a bit of modding, for example to present recent annotations (http://www.pepysdiary.com/recent/) or to present the text plus annotations all in one piece. (Commentonpower seems to fall neatly into this category, with most commentable atoms of the right size for ‘blog’ entries. I wonder why they didn’t just implement it as a plugin for wordpress; perhaps it was such a simple app that it was easier to ‘roll their own’.)

Getting More Atomic

This approach starts to run into problems once you want any of the following:

  • atoms smaller than a comfortable size for individual html pages/blog entries;
  • to let people comment on chunks too large for an individual page;
  • to let people comment on ranges.

The main challenge at this point is to find some way to extract the addressing information from the client doing the annotation. Confining ourselves to the web, the challenge becomes how to structure the interface and the text so that one can determine range start and end points. This is a non-trivial matter. Possible options include:

  • Javascript: in theory the selection/range objects should help us out here. Unfortunately, cross-browser support is patchy (firefox as usual is excellent and IE pretty bad). If one does not need to be precise enough to get ranges, javascript could also be used to extract e.g. element ids.
  • Copy and paste of the quote to annotate, with some backend algorithm to determine the actual range. Nice and simple, but it is not clear that one can ‘invert’ (i.e. find a unique range from a given selection) unless the selection is large.
  • If addressing fairly large atoms (e.g. a paragraph or larger), one could just insert a unique piece of user interface equipment (e.g. a button or link) with each atom. Note, however, that this prevents support for ranges.
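
The ‘inversion’ problem in the copy-and-paste option is easy to see in code. This is a minimal sketch of the simplest possible backend algorithm; the function name is hypothetical. It accepts a pasted quote only if it occurs exactly once in the text and returns its character offsets. It deliberately does no whitespace normalisation, which a real system would probably need.

```python
def locate_quote(text, quote):
    """Return (start, end) character offsets of quote in text if it occurs exactly once.

    Returns None when the quote is empty, absent or ambiguous (it occurs more
    than once), i.e. when the selection cannot be inverted to a unique range.
    """
    if not quote:
        return None
    start = text.find(quote)
    if start == -1:
        return None
    if text.find(quote, start + 1) != -1:
        return None  # a second (possibly overlapping) occurrence: ambiguous
    return start, start + len(quote)
```

For example, in "to be or not to be" the selection "to be" is ambiguous, while "or not" maps uniquely to offsets (6, 12). Short selections are exactly the ones that tend to repeat.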

Separating Data and Presentation

Whatever one chooses to do, it does seem sensible to clearly separate data and presentation. This is particularly important when there is so much uncertainty over the user interface. In particular, it would be good to clearly specify the annotation format and to implement a programmatic interface to it that is independent of the standard (human) user interface. That way it is easy to switch interfaces (or have multiple ones). Given that annotations are essentially just comments, it would seem sensible to try to reuse an existing format such as Atom (or RSS) for the machine interface to the comment store. [marginalia] already had such a format based on Atom. I’ve recently reimplemented a stripped-down version of this format in python for the annotation store backend, in preparation for adding annotation support to the openshakespeare web interface, see:

http://project.knowledgeforge.net/shakespeare/svn/annotater/trunk/

Of course, as discussed above, this isn’t quite as simple as it looks, because your user interface can constrain what you can and can’t store. Using a blog approach you can’t store ranges, and from what I have read getting reliable character offsets is problematic. Nevertheless it seems the best place to start.
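
To illustrate what an Atom-based annotation record might look like, here is a sketch using only the standard library. It is not marginalia’s actual format or the annotater code. The function is hypothetical, and the "range" attribute on the link is an invented extension (Atom itself has no notion of ranges). The annotated page goes in a rel="related" link, and updated should be an RFC 3339 timestamp.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def annotation_entry(entry_id, title, author, updated, target_url, start, end, content):
    """Serialise one annotation as an Atom <entry> string.

    The character range goes in a hypothetical "range" attribute on the link.
    """
    def q(tag):
        return "{%s}%s" % (ATOM_NS, tag)

    entry = ET.Element(q("entry"))
    ET.SubElement(entry, q("id")).text = entry_id
    ET.SubElement(entry, q("title")).text = title
    author_el = ET.SubElement(entry, q("author"))
    ET.SubElement(author_el, q("name")).text = author
    ET.SubElement(entry, q("updated")).text = updated
    ET.SubElement(entry, q("link"), rel="related", href=target_url,
                  range="%d-%d" % (start, end))
    ET.SubElement(entry, q("content"), type="text").text = content
    return ET.tostring(entry, encoding="unicode")
```

Because the record is plain Atom plus one extra attribute, any feed reader can still display the comments, even if it ignores the range.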

Adding Mathematics to Markdown

January 8th, 2007

Following my release of the markdown to latex script, I’ve had a few enquiries from people asking about integrating mathematics with markdown generally (e.g. for web output as well as for output to latex). I’d already been using mathematics in markdown, and then processing to html, before I wrote the mkdn2latex script. In a world where one didn’t need to produce nice pdfs for conferences and journals, it would be my preferred format. Anyway, here’s a summary of the ways in which you can add mathematics support to basic markdown:

Mathematics in Markdown Howto

There are two possible options for pure web output with mathematics using markdown:

  1. Add asciimathml/latexmathml support to the html files into which the markdown output will be inserted. These are javascript files that convert latex-like mathematics to mathml on the fly, see 1 and 2. I recommend latexmathml, as it is closer to latex.

  2. Convert to latex and then convert to html using latex2html or similar.

For pure html work I’ve used approach (1) up until now. This requires no change to your markdown processor, only that you link to the right asciimathml/latexmathml javascript in the resulting html document (you can see an example in this simple wrapper around the basic markdown script).
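
A wrapper of the kind just mentioned only needs to put the converted body into a page that loads the maths script. This is a minimal sketch, assuming you already have the html produced by markdown. The function name and the script path js/LaTeXMathML.js are hypothetical; point the path at your own copy.

```python
from html import escape

# Hypothetical location of the (modified) latexmathml script.
DEFAULT_MATH_JS = "js/LaTeXMathML.js"


def wrap_with_math_js(body_html, title, math_js=DEFAULT_MATH_JS):
    """Wrap html produced by markdown in a full page that loads a maths script.

    The markdown conversion itself is untouched; the only addition is the
    <script> element, which converts the delimited maths in the browser.
    """
    return (
        "<html>\n<head>\n"
        "<title>%s</title>\n"
        '<script type="text/javascript" src="%s"></script>\n'
        "</head>\n<body>\n%s\n</body>\n</html>\n"
    ) % (escape(title), escape(math_js, quote=True), body_html)
```

The body is inserted unescaped because it is already html. The title and script path are escaped because they are plain strings.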

In both cases you will want to insert math sections into your source markdown file. My convention is that any maths, whether inside a paragraph or not, should be enclosed in double dollars as in: \$\$ …. \$\$. (The \ should not be there, but because the latexmathml script is being used on this blog, we need to escape one of the $ so that the text actually displays rather than being rendered as mathematics.) This is slightly different from the standard asciimathml/latexmathml conventions, which just use a single $. I’ve made the necessary (very minor) modifications to asciimathml and latexmathml, and you can find them at:

http://knowledgeforge.net/okftext/svn/trunk/js/

(look in the src subdirectories)

To summarize:

  1. Create your markdown documents as normal.

  2. To add mathematics, just add it as for latex but using double dollars as delimiters. (If you plan to use the javascript approach, read up on those scripts to see what parts of latex they support.) For example, this would be fine (again ignore the backslashes):

     A simple markdown file, $$x$$, with some mathematics:

     \$\$ x^{2} + y^{2} = z^{2} \$\$

     A new paragraph after a block of mathematics ...

  3. Then:

    1. EITHER convert markdown to html as usual, but then insert a link to the modified latexmathml.js in your html documents (or, if using the original latexmathml, just convert $$ to $ everywhere)
    2. OR convert markdown to latex using my script and then use latex2html
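
Because the same $$ delimiter is used both inside and outside paragraphs, a converter to latex has to decide between inline and display maths. Here is a sketch of one way to do that; it is not necessarily what mkdn2latex does, and the function name is hypothetical. A paragraph containing nothing but a maths span becomes display maths, and anything else becomes inline maths. It assumes paragraphs are separated by blank lines and that the source has no escaping backslashes.

```python
import re

# $$ ... $$ span whose content may not itself contain "$$".
_MATH = re.compile(r"\$\$((?:(?!\$\$).)+)\$\$", re.DOTALL)


def math_to_latex(markdown_text):
    """Turn $$...$$ spans into LaTeX maths.

    A paragraph consisting of nothing but one maths span becomes display maths
    (\\[ ... \\]); any other span becomes inline maths ($ ... $).
    Paragraph breaks are normalised to a single blank line.
    """
    out = []
    for para in re.split(r"\n\s*\n", markdown_text):
        whole = _MATH.fullmatch(para.strip())
        if whole:
            out.append("\\[ %s \\]" % whole.group(1).strip())
        else:
            out.append(_MATH.sub(lambda m: "$%s$" % m.group(1).strip(), para))
    return "\n\n".join(out)
```

Forbidding "$$" inside a span matters. Without it, a paragraph like "$$a$$ and $$b$$" would be mistaken for a single display block.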

Web-Based Annotation

December 19th, 2006

We intend to add annotation/commentary support to the open shakespeare web demo, either in this release or the next. As a first step I’ve been looking to see what (open-source) web-based annotation systems are already out there. Below is a list of what I’ve been able to find so far (if you know of more, please post a comment). After examining several of these in some detail, the one we’re going to try out properly is marginalia. If you’re interested, our current efforts to do this, including writing a python wsgi annotation service backend, can be found here in the subversion repository.

  1. stet: javascript annotation system used for the gpl v3 comments system

  2. commentary: javascript-based wsgi middleware developed by Ian Bicking

    • http://pythonpaste.org/commentary/
    • Rather hacked together (apparently he coded it in a week). I had problems getting it working locally, and there is no documentation to help in adapting it. It seems to be unmaintained (the demo site is currently down), which is perhaps not surprising given how many other projects Ian has on the go.
    • One nice feature is that you don’t seem to have to mess with the underlying web pages you want to add comments to (though this only works if you are sitting on top of another wsgi application)
  3. marginalia: javascript library and spec for adding web annotation to pages

  4. annotea: W3C project based on RDF

    • http://www.w3.org/2001/Annotea/
    • Been around a long time and now seems to be inactive
    • Server and client support rather lacking. No simple interface based on, e.g., javascript — you have to write a special client yourself — which is a major drawback
    • That said the protocol is well-documented and so writing a client (or a server) shouldn’t be that hard (other than having to mess around with rdf in javascript …)
    • The Schema seems reasonable
    • xpointer based which according to the marginalia site is a problem

UPDATE (2008-06): a new version is available (v1.2): http://www.rufuspollock.org/2008/06/23/markdown2latex-mkdn2latex-12/

Over the last year I’ve written quite a few papers using markdown plus asciimathml. This is great for web publication (and editing) and gives me lots of styling freedom via css. However, it doesn’t produce output as nice as latex, especially in paginated form, and latex mathematics support is also currently better than that of asciimathml or latexmathml.

Unable to find any python code that would do what I wanted, I played around for a couple of hours with the python-markdown script until I got something functional. A few weeks of use have allowed me to iron out the bugs and make several improvements, so I feel the script is now ready for public release. Hope people find it useful.

Download

Get it from: http://project.knowledgeforge.net/okftext/svn/trunk/python/mkdn2latex.py

(You can also check it out using subversion from the same url if you want.)

For the script to function you will also need to install the python-markdown module v1.5 (make sure you install it under the name markdown.py).

Usage

The following will print the latex output to the console (standard out):

 $ mkdn2latex.py path-to-markdown-file.mkd

To convert a markdown file straight to a latex output file do:

 $ mkdn2latex.py path-to-markdown-file.mkd > path-to-output-file.ltx

NB: As provided, the script expects mathematics in your markdown file to be delimited with ‘$\$’ (this should be dollar dollar; the slash is there to stop this being rendered as maths in the blog), as opposed to the standard asciimathml delimiters of ‘`’ or ‘$’.