Distributed version control systems (DVCSs) have now matured to the point that I’ve been planning to switch from subversion for quite a while, at least for my own personal repositories where there are no coordination issues. Having chosen mercurial (hg) as my DVCS of choice, the next step was to actually convert. While there is quite a bit of documentation on this topic available online, I didn’t always find it had the necessary info. Combined with several ‘snags’ I hit along the way, I thought it worth documenting my experience in case it proves useful to others.

I’d waited until hg 0.9.5 was available on my distro precisely because I wanted to use the hg convert functionality. Alternatives such as tailor looked to have difficulties, and though it turned out I could have used hgsvn without problems, my original impression had been that it was oriented towards integration of hg and svn rather than straight conversion.

Before I document the steps it is important to be clear about one very important thing about how hg works:

There is no distinction between a working copy and a repository.

In particular, each repository is a working copy and vice versa. The actual repo is stored inside the working copy, at its root, in a .hg directory. When you ‘checkout’ (svn terminology) you do so simply by ‘cloning’ an existing repository (or, if you just want a limited set of changes, e.g. those since you last ‘updated’ (svn terminology), you can do a ‘pull’). In fact you could even just make a plain copy of that .hg directory and send it to someone, though obviously this might not work so well if you are moving between 2 OSs with different filesystems.

Anyway, the main point to take from this is that the result of an hg convert will simply be a new directory with all the files (the working copy) plus a .hg directory in that directory (the repo).

To convert all you do is::

$ hg convert <svn-repo-or-co> <some-new-directory>

The devil however is in the detail:

  1. svn-repo-or-co can be the uri of a subversion repo or the path of a svn checkout. Given a checkout, hg convert will just work out the source repo and pull from there.
    1. Note however that hg convert will not move across the working copy files themselves. The obvious solution is to do the convert and then just move the .hg directory across into your svn checkout and delete all the .svn directories (or vice-versa).
  2. some-new-directory: this is where the new hg repo/working copy will end up.
  3. After doing hg convert, rather surprisingly, all of the files in the new hg repository will be listed as ‘?’ (not tracked) when you do a hg status. To solve this just do a hg update.
  4. To speed up conversions it is often worth getting a local copy of the subversion repo (to save pulling lots of stuff over the network connection). To do this either use svnsync or just dump the remote repo and load it into a local one (if converting from a working copy you’ll then just need to do a svn switch --relocate).
  5. My repository did not have a branches/tags/trunk layout (instead it has multiple subprojects …). This led to weird errors involving files and directories at the root of the repository which looked like: ‘hg convert abort: path contains illegal component’. I solved this by using the --filemap option to hg convert and putting explicit renames of the form: /root-path-1 root-path-1 in that file.
  6. What do you do for all the other working copies once you have converted the repo/your working copy? This is now simple:
    1. Clone your hg repo to each of the machines with a working copy.
      • For this purpose you will probably want to make your original hg repo available over the Internet using either ssh or http protocols (for details see mercurial docs).
    2. Copy over the svn working files into that new hg repo.
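
The ‘delete all the .svn directories’ step mentioned above is easy to script. What follows is only a minimal sketch, not part of hg or svn: the function name remove_svn_dirs is made up for illustration, and it assumes you really do want every .svn directory below the given root gone, so try it on a copy first.

```python
import os
import shutil


def remove_svn_dirs(root):
    """Delete every .svn directory under root and return the paths removed."""
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".svn" in dirnames:
            target = os.path.join(dirpath, ".svn")
            shutil.rmtree(target)
            # Stop os.walk from trying to descend into the directory we just deleted.
            dirnames.remove(".svn")
            removed.append(target)
    return removed


if __name__ == "__main__":
    import sys

    # Usage: python remove_svn_dirs.py path-to-old-svn-checkout
    for path in remove_svn_dirs(sys.argv[1]):
        print("removed", path)
```

The .hg directory is left alone, so this can be run after moving .hg into the old checkout.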

I recently took delivery of a Novatech X40r system (Novatech are one of the few suppliers who allow me to get a machine without Windows). The most recent version of Ubuntu (Gutsy) installed without any issues, though I couldn’t quite seem to get the display resolution to match the screen resolution. Next step was to plug in my external monitor: nothing happened. This post quickly details how I got this fixed. Most of it is derived from an excellent post in the ubuntu forums [1] and this freedesktop bug post [2]. It is important to note that this may be specific to the graphics card being used: Intel GMA X3100 integrated graphics.

[1] http://ubuntuforums.org/showpost.php?p=4003194&postcount=584
[2] http://bugs.freedesktop.org/show_bug.cgi?id=12229

Instructions

For the most part just follow the excellent instructions in [1]. Details of where these needed to be modified can be found below:

STEP 1: my output from xrandr -q in step 1 was


$ xrandr -q
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 1280 x 1280
VGA disconnected (normal left inverted right)
LVDS connected 1280x800+0+0 (normal left inverted right) 304mm x 190mm
   1280x800       59.9*+   60.0  
   1280x768       60.0  
   1024x768       60.0  
   800x600        60.3  
   640x480        59.9  
TV connected 1024x768+0+0 (normal left inverted right) 0mm x 0mm
   1024x768       30.0* 
   800x600        30.0  
   848x480        30.0  
   640x480        30.0  

As one can see there is this spurious TV entry. For the time being ignore this and proceed through the next steps.

STEP 3: In step 3 nothing happened immediately and on manual activation I received an error:


$ xrandr --output VGA --auto
xrandr: cannot find crtc for output VGA

My xrandr -q output was:


$ xrandr -q
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 1280 x 1280
VGA connected (normal left inverted right)
   1280x1024      59.9  
   1024x768       59.9  
   800x600        59.9     56.2  
   640x480        60.0  
LVDS connected 1280x800+0+0 (normal left inverted right) 304mm x 190mm
   1280x800       59.9*+   60.0  
   1280x768       60.0  
   1024x768       60.0  
   800x600        60.3  
   640x480        59.9  
TV connected 1024x768+0+0 (normal left inverted right) 0mm x 0mm
   1024x768       30.0* 
   800x600        30.0  
   848x480        30.0  
   640x480        30.0  

As one can see the new monitor is detected. After some Googling I came across [2]. This suggested there might be some conflict between the spurious TV entry and new monitor (essentially it appears the auto-detection code on some newish chipsets generates false-positives for the existence of a TV-out and this conflicts with activating additional monitors). I therefore did:

 $ xrandr --output TV --off

Having done this activation of the new monitor worked:

 $ xrandr --output VGA --auto

Even better the incorrect match of the display resolution to the screen resolution on the laptop went away suggesting that the existence of the TV item was also affecting the LVDS display.
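
If you want to check for this situation from a script, something like the following may help. It is a rough sketch only: it parses xrandr -q output of the shape shown above, and the heuristic of treating a connected output with a reported physical size of 0mm x 0mm as suspicious is my own assumption (a real TV output may report the same), so treat the result as a hint about what to try turning off. The function names are made up for illustration.

```python
import re

# Matches output header lines such as "LVDS connected 1280x800+0+0 (...) 304mm x 190mm".
OUTPUT_RE = re.compile(r"^(\S+) (connected|disconnected)(?: (\d+x\d+\+\d+\+\d+))?")
ZERO_SIZE_RE = re.compile(r"\s0mm x 0mm\s*$")

# Trimmed copy of the xrandr -q output from this post.
SAMPLE = """\
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 1280 x 1280
VGA connected (normal left inverted right)
   1280x1024      59.9
LVDS connected 1280x800+0+0 (normal left inverted right) 304mm x 190mm
   1280x800       59.9*+   60.0
TV connected 1024x768+0+0 (normal left inverted right) 0mm x 0mm
   1024x768       30.0*
"""


def parse_xrandr(text):
    """Return {output name: {'connected', 'active', 'zero_size'}} from xrandr -q text."""
    outputs = {}
    for line in text.splitlines():
        m = OUTPUT_RE.match(line)
        if m:
            name, state, geometry = m.groups()
            outputs[name] = {
                "connected": state == "connected",
                # An output with a geometry (e.g. 1280x800+0+0) is currently driven.
                "active": geometry is not None,
                "zero_size": bool(ZERO_SIZE_RE.search(line)),
            }
    return outputs


def spurious_outputs(text):
    """Heuristic (assumption): connected outputs reporting a 0mm x 0mm size."""
    return [n for n, o in parse_xrandr(text).items() if o["connected"] and o["zero_size"]]


if __name__ == "__main__":
    for name in spurious_outputs(SAMPLE):
        print("candidate to switch off: xrandr --output %s --off" % name)
```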

I briefly record my experience resolving this issue in case it helps others. As background, I use svk to allow local commit and replay for some of the subversion repos I use. Over the last week I’d started encountering problems when trying to svk sync one of these, receiving the following error message:

Bad URL passed to RA layer: Malformed URL for repository

The solution to this is the following patch provided by Peter Werner to the svk-devel list a few days ago:

--- SVN-Mirror-0.73.orig/lib/SVN/Mirror/Ra.pm 2007-03-19 23:59:12.000000000 +0100
+++ SVN-Mirror-0.73/lib/SVN/Mirror/Ra.pm  2007-10-07 08:37:36.000000000 +0200
@@ -168,6 +168,9 @@
     $self->{config} ||= SVN::Core::config_get_config(undef, $self->{pool});
     $self->{auth} ||= $self->_new_auth;

+    # escape URI (% is already escaped)
+    $arg{'url'} =~ s/([^-_:.%\/a-zA-Z0-9])/sprintf("%%%02X", ord($1))/eg if defined $arg{'url'};
+
     SVN::Ra->new( url => $self->{rsource},
      auth => $self->{auth},
      config => $self->{config},
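
For those who (like me) don’t read perl fluently, here is what the substitution in the patch appears to do, sketched in Python. This is only an illustration of the regular expression, not code from svk or SVN::Mirror; the function name escape_url is made up, and I am assuming plain ASCII input such as a url containing spaces.

```python
import re

# Same character class as the patch: anything that is NOT one of
# - _ : . % / or an ASCII letter or digit gets percent-escaped.
UNSAFE = re.compile(r"[^-_:.%/a-zA-Z0-9]")


def escape_url(url):
    """Percent-escape unsafe characters; '%' is left alone so existing escapes survive."""
    return UNSAFE.sub(lambda m: "%%%02X" % ord(m.group(0)), url)


if __name__ == "__main__":
    print(escape_url("http://example.org/repo/my dir/file.txt"))
```

So a space becomes %20, while a url that is already escaped passes through unchanged.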

In addition to the solution, below I report the process by which I discovered it. I do this as it provides an interesting case study of the way that open source communities work, and particularly how ‘user-driven bug-fixing’ happens.

  1. Searching on the web turned up a variety of earlier reports [1][2][3] of this issue, which seemed to be related to having spaces in svn url names (see [1.1] and [2] in particular). This seemed plausible as a source of the error as it occurred after someone had added a directory with spaces in it to the repository (a very rare occurrence).
  2. This issue did not seem to occur for all users and CLK (the maintainer) suggested upgrading SVN::Mirror to 0.73. [2.1]
  3. This I did but the bug was still there (as other users had noted [2.2]); however, the source now seemed to be pinpointed as the SVN::Mirror perl module. Unfortunately I’m not a perl hacker …
  4. Finally a hand search of the svk lists turned up a post from less than a week ago [4] (obviously too recent for Google to have picked up yet as I had earlier done a specific search for the error name over the svk lists …). In addition to reporting the problem this mail provided a 2 liner patch to a specific perl module. I applied this patch, tried svk sync and hey presto! the bug was gone.

The issue progressed from an unconfirmed one whose aetiology was unclear [1] to a confirmed one whose cause was fairly well known [2] (though not its source in code). Solutions were suggested and tested by users [2.1, 2.2], but the issue remained unresolved for several more months, with the fix eventually provided to the list by an independent user [4].

It is also especially noteworthy that much of this tracking down was only possible because the software involved was open enabling users to poke around to see what was wrong. For example, tying the bug to spaces in the underlying repository url resulted from the original reporter of the issue hand-modifying a svn source file so as to make the error message more verbose [1.1] — something which is clearly only possible if the code is open.

Counting Words in a Latex File

August 24th, 2007

Much of this was inspired by this blog post. Having tested on my own set of files I would suggest that these methods could be ranked in order of accuracy as:

  1. TexCount.pl
  2. untex + wc
  3. wc
  4. pdf file

wc

$ wc -w file.tex

This is very simple but is pretty inaccurate since wc has no awareness of tex commands or mathematics (which results in overcounting) and does not expand things like bibliographies (which results in undercounting). Overall the result is likely to be a substantial overcount.

Look at the resulting pdf file.

$ pdftotext file.pdf - | grep -E '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w

More sophisticated but in my experience results in grossly overestimated wordcounts due to inability to deal with mathematics and issues with pdftotext (lots of words get broken up that shouldn’t be).

TexCount.pl

Get it from: http://folk.uio.no/einarro/Comp/texwordcount.html

This seemed to be pretty good.

untex + wc

$ untex file.tex | wc -w

Again this is likely to overcount because of mathematics and its fairly limited removal of tex commands (though it may undercount due to the omission of citation/bibliography material).
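
To make the ‘strip the tex, then count’ idea concrete, here is a very rough sketch in Python. It is not how untex or TexCount.pl work internally (I haven’t read their source); it just drops comments, $-delimited mathematics and command names and counts what is left, so it shares the weaknesses described above (environments like equation are not handled, bibliographies are not expanded). The function name is made up.

```python
import re


def rough_tex_wordcount(tex):
    """Crude word count for a LaTeX string: strip comments, $ maths and commands."""
    # Comments: an unescaped % up to the end of the line.
    tex = re.sub(r"(?<!\\)%.*", "", tex)
    # Display maths first ($$...$$), then inline maths ($...$).
    tex = re.sub(r"\$\$.*?\$\$", " ", tex, flags=re.S)
    tex = re.sub(r"\$.*?\$", " ", tex, flags=re.S)
    # Command names such as \section or \emph; their brace arguments stay as text.
    tex = re.sub(r"\\[a-zA-Z]+\*?", " ", tex)
    # Remaining grouping characters.
    tex = re.sub(r"[{}\[\]]", " ", tex)
    return len(re.findall(r"[A-Za-z0-9']+", tex))


if __name__ == "__main__":
    import sys

    # Usage: python rough_tex_wordcount.py file.tex
    with open(sys.argv[1]) as f:
        print(rough_tex_wordcount(f.read()))
```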

This is a simple hack to enable you to start OpenOffice and, more importantly, open documents with it from the command line. I’ve got the standard X port of OpenOffice 2.0 installed, so if you have something different you may need to change the path to soffice given below (to find soffice on your machine try from the command line $ locate soffice):

First let’s make the script that starts openoffice available in a convenient way e.g. by symlinking into ~/bin or /usr/bin:

 $ cd ~/bin
 $ ln -s /Applications/OpenOffice.org\ 2.0.app/Contents/openoffice.org/program/soffice ./

Now you can do stuff like:

 $ soffice -help

You’ll see there are different switches which allow you to start a text document, a spreadsheet etc. One annoyance to note is that if you get soffice to load a file by doing:

$ soffice [options] ${filename}

The application it will use (writer, calc, math …) will depend solely on the extension of the filename and will ignore any options you give it. So e.g. if you do:

$ soffice -writer some.csv

Then this will load in calc even though the -writer option was given. For more details (on this very old bug) see:

http://www.openoffice.org/servlets/ReadMsg?list=allbugs&msgNo=94354

Fortunately this isn’t too much of a problem since the extension mapping is pretty reasonable.

import readline
readline.write_history_file('my_history.py')

Suppose you have a repository {start-repo} and want to merge it into {old-repo} at {some-dir-name}:

# if you just wanted trunk, replace the dump line below with this
# svnadmin dump {start-repo} | svndumpfilter include trunk > my-dump
svnadmin dump {start-repo} > my-dump
# copy file to main server and change to web-user
svnadmin load --parent-dir {some-dir-name} {old-repo} < my-dump

Just like this guy when trying to do $ port selfupdate I’d get errors like:

Selfupdate failed: couldn't open
".../var/db/dports/sources/rsync.rsync.opendarwin.org_dpupdate1/base/dp_version":
no such file or directory

The problem is that my macports version is very old and, after an rsync, dp_version is now in base/config rather than just base. Furthermore, because the rsync happens before dp_version is checked, putting in a symlink or just copying the file over won’t work as it gets deleted again before it is checked. The solution I found was to edit /Library/Tcl/darwinports1.0/darwinports.tcl and find this bit of code:

# get new darwinports version and write the old version back
set fd [open [file join $dp_base_path dp_version] r]
gets $fd dp_version_new
close $fd
ui_msg "New DarwinPorts base version $dp_version_new"

Then change the first line after the comment so that it reads (i.e. insert ‘config’):

# get new darwinports version and write the old version back
set fd [open [file join $dp_base_path config dp_version] r]
gets $fd dp_version_new
close $fd
ui_msg "New DarwinPorts base version $dp_version_new"

And hey presto! selfupdate now works.
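
If you’d rather not edit the file by hand, the same one-line change can be scripted. This is just a sketch of the edit described above: it assumes the line in your darwinports.tcl matches the one quoted in this post exactly, the function name is made up, and you should keep a backup of the original file.

```python
OLD = "set fd [open [file join $dp_base_path dp_version] r]"
NEW = "set fd [open [file join $dp_base_path config dp_version] r]"


def patch_darwinports(source):
    """Return the Tcl source with 'config' inserted into the dp_version path."""
    if NEW in source:
        return source  # already patched, nothing to do
    if OLD not in source:
        raise ValueError("expected line not found; this file differs from the one in the post")
    return source.replace(OLD, NEW, 1)


if __name__ == "__main__":
    path = "/Library/Tcl/darwinports1.0/darwinports.tcl"
    with open(path) as f:
        original = f.read()
    # Write the patched copy next to the original rather than overwriting it.
    with open(path + ".patched", "w") as f:
        f.write(patch_darwinports(original))
```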

UPDATE (2008-06): a new version is available (v1.2): http://www.rufuspollock.org/2008/06/23/markdown2latex-mkdn2latex-12/

Over the last year I’ve written quite a few papers using markdown plus asciimathml. While this is great for web publication (and editing) and gives me lots of styling freedom via css, it doesn’t produce output that’s as nice as that produced by latex, especially in paginated form (latex mathematics support is also currently better than that obtained from asciimathml or latexmathml).

Unable to find any python code that would do what I wanted, I played around for a couple of hours with the python-markdown script until I got something functional. After a few weeks of use, which has allowed me to iron out the bugs and make several improvements, I feel the script is now ready for public release. Hope people find it useful.

Download

Get it from: http://project.knowledgeforge.net/okftext/svn/trunk/python/mkdn2latex.py

(You can also check it out using subversion from the same url if you want.)

For the script to function you will also need to install the python-markdown module v1.5 (make sure you install it under the name markdown.py).

Usage

The following will print the latex output to the console (standard out):

 $ mkdn2latex.py path-to-markdown-file.mkd

To convert a markdown file straight to a latex output file do:

 $ mkdn2latex.py path-to-markdown-file.mkd > path-to-output-file.ltx

NB: As provided the script expects mathematics in your markdown file to be delimited with ‘$\$’ (this should be dollar dollar — the slash is there to stop this being rendered as maths in the blog) as opposed to the standard asciimathml delimiters of ‘`’ or ‘$’.

Having looked around for a while without success for something that would spit out csv files as ascii tables I decided to hack something together. The result is a small python script [csv2ascii.py][]. It is currently fairly crude, for example it just truncates cell text which is too long, but I hope I’ll have some more time to improve it soon.

Example

Suppose you had the following in a file called example.csv:

"YEAR","PH","RPH","RPH_1","LN_RPH","LN_RPH_1","HH","LN_HH"
1971,7.8523,43.9168,42.9594,3.7822,3.7602,16185,9.691843   
1972,10.5047,55.1134,43.9168370988587,4.0093,3.7822,16397,9.704855

Running:

 $ ./csv2ascii.py example.csv

Would result in:

+------+------+------+------+------+------+------+------+
| YEAR |  PH  | RPH  |RPH_1 |LN_RPH|LN_RPH|  HH  |LN_HH |
+------+------+------+------+------+------+------+------+
| 1971 |7.8523|43.916|42.959|3.7822|3.7602|16185 |9.6918|
+------+------+------+------+------+------+------+------+
| 1972 |10.504|55.113|43.916|4.0093|3.7822|16397 |9.7048|
+------+------+------+------+------+------+------+------+
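
For the curious, the basic idea fits in a few lines of Python. The following is a minimal sketch of the approach rather than the csv2ascii.py script itself: the function name and the fixed cell width of 6 are assumptions chosen to match the example above, and over-long cells are simply truncated.

```python
import csv
import io


def csv_to_ascii(text, width=6):
    """Render csv text as an ascii table with fixed-width, centred, truncated cells."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    ncols = max(len(row) for row in rows)
    rule = "+" + "+".join("-" * width for _ in range(ncols)) + "+"
    lines = [rule]
    for row in rows:
        # Truncate each cell to the column width, then centre it.
        cells = [cell.strip()[:width].center(width) for cell in row]
        # Pad short rows with empty cells so every line has the same shape.
        cells += [" " * width] * (ncols - len(cells))
        lines.append("|" + "|".join(cells) + "|")
        lines.append(rule)
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    # Usage: python csv_to_ascii.py example.csv
    with open(sys.argv[1], newline="") as f:
        print(csv_to_ascii(f.read()))
```

A fixed width keeps the code short; sizing each column from its longest cell would avoid the truncation at the cost of wider tables.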