The lead article of Prospect Magazine’s February issue is a piece by James Crabtree and Tom Chatfield entitled “Mashing the State”. It’s an in-depth look at the recent launch of data.gov.uk and its place in the wider context of government policy in relation to information, as well as information’s relation to governance (that “mashing” of the state …).

Where Does My Money Go gets a mention as does the “Cambridge” paper on pricing models at trading funds.

This Wednesday (27th of January) at 1pm I’m giving one of Cambridge University Library’s regular lunch-time talks on Openness and Libraries. Attendance is free and anyone can come along!

Update (28th Jan): talk is done and slides are now up.

Blurb

Over the past few years, open licensing (http://www.opendefinition.org/) has facilitated the explosive growth of a ‘knowledge commons’. To give a few prominent examples: Open Access journals, Open Educational Resources and Open Data in scientific research have all been enabled by licenses which permit material to be freely re-used and re-distributed. This outpouring of support for openness has led to an incredible rise in community-led development and innovative uses.

Bibliographic records are a key part of our shared cultural heritage and essential to anyone working with cultural materials (books, music, films etc). Opening up those records for access and re-use would offer a variety of benefits.

First, it would allow libraries to share records more efficiently and improve quality more rapidly through better, easier feedback. Second, easier access to catalogue data would spur development of the multifarious services, technologies and research that use that data, including, for example, search engines, book or music websites, researchers working on information production, journalists writing on orphan works, as well as many other areas we cannot even imagine in advance.

With a growing number of Government agencies and public institutions making data open, is it now time for the library community to do likewise?

Apparently, on the 11th of December 2009, Argentina extended copyright term in recordings from 50 to 70 years (see e.g. here, here and here).

Instead of the real reasons for extension — propping up the profits of a handful of multinational record labels and their shareholders (at the expense of everyone else) — the usual disingenuous justifications were once again being trotted out by music industry representatives.

First up was (all quotes are from the Billboard article):

The investment argument

“I would like to thank all those who supported this new law which will benefit the music community in Argentina,” tango master Leopoldo Federico, president of AADI, said in a statement. “It will improve incentives to invest in future recordings and also helps older performers who had faced losing their rights just when they need them the most.”

John Kennedy, chairman and chief executive of IFPI, also welcomed the legislation. “I am delighted that Argentina has strengthened the rights of performers and producers by extending the term of protection,” he said in a statement. “Argentina has a strong musical heritage and this reform means that producers will have a greater incentive to invest in the next generation of local talent.”

But wait a moment: “producers” are already getting 50 years of monopoly protection. How much extra incentive are those 20 extra years going to provide?

Let’s do some simple calculations.

First off remember this is about incentives, which means it is about expected payoffs at the point of investment, i.e. when the recording is created. As such we should be dealing with “present value” figures, i.e. total revenue in “today’s terms”.

To work out the effect of an extension we then need a) an idea of what future sales look like relative to today (the cultural decay rate) and b) a way of putting future revenue in today’s terms (the discount rate). The industry’s own analysis (commissioned for the Gowers review in the UK) used a nominal discount rate of 12.3% (pre-tax) and cultural decay rates of 3-20% (in nominal terms it appears). Let’s be generous and take the lowest possible cultural decay rate of 3%. Combined with the 12.3% discount rate this means that, on average, revenue in today’s terms is dropping at a substantial 15.3% a year!

Running this through a bit of basic maths (and I mean really basic — code inline below) we find that the 20 year extension will deliver a tiny 0.08% increase in revenues. Even halving the nominal discount rate to a very low figure like 6% only pushes up the revenue gain to just over 1% (1.1%). For those who like things visually here’s a picture:

revenue_impact
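
For readers who prefer a formula to a script, the calculation can be written out as below. This is a sketch under stated assumptions rather than the exact method used: revenue in today's terms is taken to shrink by a constant factor d each year, years 0 to T-1 count as the existing term and years T to T+E-1 as the extension. The symbols d, T, E, \delta (cultural decay rate) and r (discount rate) are introduced here for illustration. The script in the colophon sums one extra year in each series, which changes the result only marginally.

```latex
d = \frac{1}{1 + \delta + r}, \qquad
\frac{\mathrm{PV}(\text{extension})}{\mathrm{PV}(\text{existing term})}
  = \frac{\sum_{t=T}^{T+E-1} d^{\,t}}{\sum_{t=0}^{T-1} d^{\,t}}
  = \frac{d^{\,T}\,\bigl(1 - d^{\,E}\bigr)}{1 - d^{\,T}}

% \delta = 0.03,\ r = 0.123,\ T = 50,\ E = 20:
%   d \approx 0.867,\ d^{50} \approx 0.00081,\ d^{20} \approx 0.058
%   ratio \approx 0.00076, i.e. roughly 0.08\%
% \delta = 0.03,\ r = 0.06:
%   d \approx 0.917,\ d^{50} \approx 0.0134,\ d^{20} \approx 0.178
%   ratio \approx 0.0112, i.e. roughly 1.1\%
```

The factor d^T is what does the damage: after 50 years of compounding at 15.3% a year, almost nothing is left to extend.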

Aside: Of course there will be a lot of variation from the average — note that the relevant variation is not between hits and duds (as these may experience exactly the same decay!) but between records which go on selling at a reasonably steady rate and those which fade away fairly quickly. However, an “investor”, such as a record label, tends to “invest” in a whole “portfolio” of records precisely in order to reduce this “risky” variability (and in any case greater risk implies a higher discount rate assuming the investor is risk averse). As such the average revenue increase is precisely what an “investor” will use when making decisions such as how many recordings to fund.

Next up was:

The pension for performers argument

“I would like to thank all those who supported this new law which will benefit the music community in Argentina,” tango master Leopoldo Federico, president of AADI, said in a statement. “It will improve incentives to invest in future recordings and also helps older performers who had faced losing their rights just when they need them the most.”

But life expectancy in Argentina is 75 years — and is probably shorter for most performers who are old today. So, unless a performer is especially prolific in their teens, 50 years of copyright monopoly is already enough to cover them in their old(er) age.

And anyway, haven’t performers heard about pensions or saving for the future? Everyone else has. I don’t expect the plumber I pay today to fix my sink to come back in 50 years asking for additional payment for a pension plan! Instead I expect the plumber to save some of the income received today to use in retirement.

Moreover, as the calculations above should make clear, copyright income 50+ years in the future from recordings today is likely (on average) to be tiny (0.08% of the revenue received during the first 50 years!). As such there is no way the average performer could rely on income from a 20 year term extension 50 years in the future to support them in their old age. Just like everyone else they will need to save some of the income during that first 50 years.

Aside: in fact it is more like 10 years or even five years since, for most recordings, the vast majority of the revenue they will ever generate comes in the first 5 or 10 years after release.
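
To get a feel for how front-loaded revenue is under the same simple model, here is a minimal sketch. It assumes the constant-decay model used for the investment argument (not real sales data), measures revenue in present-value terms, and uses the two ends of the 3-20% decay range quoted earlier. The function name `share_in_first_years` is made up for this illustration.

```python
def share_in_first_years(years, term=50, decay=0.03, irate=0.123):
    """Fraction of the present value of revenue over `term` years
    that is earned in the first `years` years (constant-decay model)."""
    # per-year factor combining cultural decay and the discount rate
    d = 1.0 / (1.0 + decay + irate)
    # ratio of two geometric sums: years 0..years-1 over years 0..term-1
    return (1.0 - d ** years) / (1.0 - d ** term)


# low and high ends of the decay range quoted in the industry analysis
for decay in (0.03, 0.20):
    for years in (5, 10):
        share = share_in_first_years(years, decay=decay)
        print('decay=%.2f, first %2d years: %.0f%% of 50-year present value'
              % (decay, years, 100 * share))
```

Even with the most generous decay rate, most of the value a performer will ever see from a recording arrives long before the existing term runs out.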

Last up we had:

The cultural argument

Javier Delupí, CAPIF’s executive director, added: “This new law is good news for Argentine culture. It promotes the creation of new music and safeguards the rights of performers and producers both here and abroad.”

But:

  • The investment argument is completely invalid (see above) and hence there won’t be any “promoting the creation of new music”.
  • In fact, to the contrary, the extension will impede the creation of new works by reducing the public domain on which all creators can and do build.
  • Moreover, an extension transfers money to (older and already successful) performers away from younger and less well-known ones.
  • Depending on how comparison of terms is implemented, an extension actually harms the balance of payments of the enacting country (e.g. the UK loses out from a term extension in recordings).

So, no, term extensions aren’t good for (Argentine) culture — though they may be good for CAPIF (Representando a la Industria Argentina de la Música).

Conclusion

It’s time we start calling a spade a spade: this term extension is a simple, and highly inefficient, subsidy to the major record labels plus, perhaps, a few, already highly successful, performers, which is paid for by the general populace.

If it can command widespread assent in that form, then, fine, let it pass! But I sincerely doubt that it can. And if it cannot, then the passage of such bills is nothing more or less than a straightforward “robbery upon the public”, in the 150-year-old words of Henry Warburton, radical opponent of the UK’s term extension of the 1840s.

Colophon

Here’s the python script used for the revenue calculations above, together with the code to generate the figure.


#!/usr/bin/env python
import math


def extra_revenue(term, extension, decay, irate):
    '''Print the present-value revenue gain from extending `term` by
    `extension` years, as a percentage of revenue under the existing term.'''
    # per-year factor combining cultural decay and the discount rate
    dfactor = 1/(1+decay+irate)
    def geometric(df, NN):
        # df**0 + df**1 + ... + df**NN
        return (1-df**(NN+1))/(1-df)
    # present value of revenue under the existing term
    total = geometric(dfactor, term)
    # present value of revenue earned during the extension
    textension = dfactor**term * geometric(dfactor, extension)
    increase = textension/total
    print('Term, Extension, decay, irate: %s %s %s %s' % (term, extension,
        decay, irate))
    print('Percentage increase: %s' % (100*increase))

extra_revenue(50, 20, 0.03, 0.123)
extra_revenue(50, 20, 0.05, 0.123)
extra_revenue(50, 20, 0.03, 0.06)
extra_revenue(50, 20, 0.04, 0.06)

def visualize():
    import matplotlib.pyplot as pyplot
    # normalize main square to 10x10 = 100
    pyplot.bar(0, 10, width=10, fc='red', alpha=0.6)
    # small square with area 0.08, i.e. 0.08% of the main square
    edge = math.sqrt(0.08)
    pyplot.bar(14, edge, width=edge, bottom=5, align='center', fc='blue', alpha=0.6)

    # reference square with area 1, i.e. 1% of the main square
    pyplot.bar(14, 1, width=1, bottom=1, align='center', fc='blue', alpha=0.6)

    pyplot.figtext(0.15, 0.7, 'Present Value of Revenue\nUnder Existing\n50y Term', multialignment='center', va='top')
    pyplot.figtext(0.65, 0.7, 'PV of Extra Revenue\nfrom 20y Extension',
            multialignment='center', va='top')
    pyplot.figtext(0.7, 0.4, '1% of Existing\n Revenue',
            multialignment='center', va='top')

    # hack to get rid of axes ...
    ax = pyplot.gca()
    ax.set_frame_on(False)
    pyplot.yticks([],[])
    pyplot.xticks([],[])

    fig = pyplot.figure(1)
    fig.set_size_inches(5, 3)
    pyplot.savefig('revenue_impact.png')

visualize()
print('Saved image to disk')

Deliverance is a great library that lets you easily re-theme external websites on the fly. Designed as WSGI middleware, it can be easily combined with some proxying to integrate a bunch of websites together.

You can use deliverance plus proxying out-of-the-box using the deliverance-proxy command. However, I was interested in using Deliverance as middleware from code. This turned out to be none too trivial to do — all the examples on the internet seemed to focus on using deliverance-proxy or using it in an ini file.

After much wrestling, most notably with odd issues around gzipped (deflated) content, I got it working and you can find a demo implementation (see demo.py and README.txt) here:

http://rufuspollock.org/code/deliverance/
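
As an illustration of the kind of compressed-content problem mentioned above, here is a minimal sketch of one common workaround: a small WSGI wrapper that removes the Accept-Encoding request header, so that whatever sits behind it is never asked for compressed output and any content-rewriting layer sees plain bodies. This is an assumption about a plausible fix, not a description of what demo.py does, and it uses no Deliverance API at all. `StripAcceptEncoding` and `demo_app` are hypothetical names.

```python
class StripAcceptEncoding(object):
    """WSGI middleware that removes the Accept-Encoding request header
    before passing the request on to the wrapped application."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # copy so the caller's environ is left untouched
        environ = dict(environ)
        environ.pop('HTTP_ACCEPT_ENCODING', None)
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the proxied site: reports the header it received."""
    seen = environ.get('HTTP_ACCEPT_ENCODING')
    body = ('Accept-Encoding seen by app: %r' % (seen,)).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


# the theming middleware would wrap (or be wrapped by) this in a real setup
wrapped = StripAcceptEncoding(demo_app)
```

The wrapper is deliberately tiny; in a real deployment the order in which it is stacked relative to the proxy and the theming middleware matters.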

I should also mention the following sources which were all of help in my quest:

Attended an interesting talk today: “Historical Banking Crises and the Rules of the Game” by Professor Charles Calomiris, Columbia Business School. Sporadic notes below. See also this Weaving History thread on Financial Crises.

Notes

  • One crisis with 20 different explanations. Need to sort these out a little.
  • If banks are uninsured then in a recession banks cut their supply of loans
    • Banks are facing losses, need to bulk up their balance sheet and can do it either by raising equity or cutting supply of loans. Former is hard so do the latter.
  • Crises aren’t just inherent to human nature or capitalism. “Crisis propensity reflects politically determined rules of the banking game that are conducive to crises:”
    1. industry setup that determines exposure of banks to risk
    2. absence of decent (effective and incentive compatible) central-banking (NB: 2 isn’t a big problem w/o 1)
    3. subsidization of risk by govt policies
  • Panic = moments of severe sudden withdrawal that threatened the system. Observable variable: collective action by NY clearing banks
    • In US (19th and early 20th c.): 1857, 1873, 1877, 1893, 1907 [ed: missing at least 2 and may have got wrong I think]
    • All 6 crises in US post civil war were preceded by 50% increase in liabilities and 7% drop in stock market
    • Britain: 1825, 1836, 1847, 1857, 1866 then none for over a century
  • Solvency crisis: -ve net worth of failed bank > 1% of GDP
    • 140 examples since 1978
    • Rare in past: 4 in 1873-1913
    • Australia: 1893 (10%)
    • Argentina: 1890 (10%)
    • Norway: 1900 (3%)
    • Italy: 1893 (1%)
  • Literature has converged in last 20 years to agree that safety-net provision on balance increases instability (rather than reducing it)
  • Crucial reform in 1858 in UK following 1857 crisis. BoE would no longer intervene in bills market. In 1866 made good on this promise when largest bill discounter went bust (Overend and Gurney)
  • Crisis origins:
    • Loose money: CBs, flat yield curve … (but note not enough for a crisis on own)
    • Housing subsidies delivered by leverage. F&F have $1.6 trillion out of $3 trillion total subprime. $350 billion cost on F&F alone.
    • Huge buy-side agency problems
      • Lots of buy-side people buying poor quality material for clients, facilitated by a big race-to-the-bottom at the ratings agencies
    • Prudential regulation failure
  • Everyone smart knew there was a subprime crisis in mid-2006.
  • Long-term regulatory reforms
    • Micro-prudential reform: focus on measurement of risk
    • Credit rating agency reform
    • Resolution policy/TBTF Problems

From Laslett ‘Phillipe Ariès and “La Famille”‘ p.83 (quoted in Eisenstein, p.131):

The actual reality, the tangible quality of community life in earlier towns or villages … is puzzling … and only too susceptible to sentimentalisation. People seem to want to believe that there was a time when every one belonged to an active, supportive local society, providing a palpable framework for everyday life. But we find that the phenomenon itself and its passing — if that is what, in fact happened– perpetually elude our grasp.

Bright Star

November 15th, 2009

8/10. Beautiful and moving.

I’m one of the co-organizers of a workshop on Public Domain Calculators taking place next week, on the 10th and 11th of November, at Emmanuel College, University of Cambridge.

Hosted by the Open Knowledge Foundation in association with the Centre for Intellectual Property and Information Law at the University of Cambridge, it’s a meeting of European experts on copyright and the digital public domain taking place as part of the Communia project.

The purpose of the workshop is to produce materials such as legal flow charts and public domain “algorithms” which will help with the representation of different national copyright laws and the determination of public domain status.

Details of the meeting are as follows:

Background

There is often a tendency to talk of ‘the public domain’ and of works falling out of copyright and ‘into the public domain’ – as though there is a single set of works which are out of copyright all over the world. In fact, of course, there are different national laws about the nature and duration of copyright in different types of works – and hence what is in the public domain is different in different countries.

Efforts are currently underway to build a series of public domain calculators – which will help to determine whether or not a given work is in copyright in a given jurisdiction. At the time of writing groups and individuals in more than 17 jurisdictions are assisting in this effort.
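
To make the idea of a public domain “algorithm” concrete, here is a deliberately simplified sketch. Everything in it is hypothetical: the jurisdictions, the terms in `RULES` and the function name `is_public_domain` are invented for illustration, and it assumes protection runs to the end of the calendar year in which the term expires. Real national laws have many more cases and exceptions, which is exactly why the flow charts are needed.

```python
import datetime

# Hypothetical, simplified rules:
# (jurisdiction, work type) -> (years of protection, event the clock starts from)
RULES = {
    ('example-a', 'literary'): (70, 'author_death'),
    ('example-b', 'recording'): (50, 'publication'),
}


def is_public_domain(jurisdiction, work_type, author_death=None,
                     publication=None, today=None):
    """Return True/False, or None when the needed date is unknown."""
    term, basis = RULES[(jurisdiction, work_type)]
    start = author_death if basis == 'author_death' else publication
    if start is None:
        return None  # cannot decide without the relevant year
    year = (today or datetime.date.today()).year
    # assumed: protection lasts until the end of year (start + term)
    return year > start + term


print(is_public_domain('example-a', 'literary', author_death=1900,
                       today=datetime.date(2009, 11, 10)))
```

Keeping the rules in a data table rather than in code is one possible design: adding a jurisdiction then means adding rows, not writing new logic.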

Continuing the series of posts related to analyzing catalogue data, here are some stats on author “significance” as measured by the number of book entries (’items’) for that author in the Cambridge University Library catalogue from 1400-1960 (there being 1m+ such entries).

I’ve termed this measure “significance” (with intentional quotes) as it co-mingles a variety of factors:

  • Prolificness — how many distinct works an author produced (since usually each work will get an item)
  • Popularity — this influences how many times the same work gets reissued as a new ‘item’ and the library decision to keep the item
  • Merit — as for popularity
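
The measure itself is just a count of catalogue items per author. A minimal sketch of that count is below; it is not the code from the repository mentioned in the colophon, and the record layout (author, title, year) and the sample records are assumptions made up for illustration.

```python
from collections import Counter


def author_item_counts(records, start=1400, end=1960):
    """Count catalogue items per author for items dated start..end,
    returned as (author, count) pairs with the largest count first."""
    counts = Counter()
    for author, title, year in records:
        if start <= year <= end:
            counts[author] += 1
    return counts.most_common()


# hypothetical records: (author, title, year)
sample = [
    ('Shakespeare, William', 'Hamlet', 1603),
    ('Shakespeare, William', 'Hamlet', 1709),   # reissue = a new item
    ('Shakespeare, William', 'Macbeth', 1623),
    ('Defoe, Daniel', 'Robinson Crusoe', 1719),
    ('Defoe, Daniel', 'Robinson Crusoe', 1985),  # outside the period
]
for rank, (author, n) in enumerate(author_item_counts(sample), 1):
    print(rank, n, author)
```

Note how the reissue of the same work counts as a second item, which is how popularity ends up mixed into the measure.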

The following table shows the top 50 authors by “significance”. Some of the authors aren’t real people but entities such as “Great Britain. Parliament” and for our purposes can be ignored. What’s most striking to me is how closely the listing correlates with the standard literary canon. Other features of note:

  • Shakespeare is number 1 (2)
  • Classics (Latin/Greek) authors are well-represented with Cicero at number 2 (4), Horace at 5 (9) followed by Homer, Euripides, Ovid, Plato, Aeschylus, Xenophon, Sophocles, Aristophanes and Euclid.
  • Surprise entries (from a contemporary perspective): Hannah More, Oliver Goldsmith, Gilbert Burnet (perhaps accounted for by his prolificity).
  • Also surprising are the limited entries from the 19th century UK with only Scott (26), Dickens (28) and Byron (41)

Rank  No. of Items  Name
1     3112          Great Britain. Parliament.
2     1154          Shakespeare, William
3     1076          Church of England.
4     973           Cicero, Marcus Tullius
5     825           Great Britain.
6     766           Catholic Church.
7     721           Erasmus, Desiderius
8     654           Defoe, Daniel
9     620           Horace
10    599           Aristotle
11    547           Voltaire
12    539           Virgil
13    527           Swift, Jonathan
14    520           Goethe, Johann Wolfgang Von
15    486           Rousseau, Jean-Jacques
16    479           Homer
17    444           Milton, John
18    388           Sterne, Laurence
19    387           England and Wales. Sovereign (1660-1685 : Charles II)
20    386           Euripides
21    372           Ovid
22    358           Goldsmith, Oliver
23    358           Plato
24    351           Wang
25    349           Alighieri, Dante
26    338           Scott, Walter (Sir)
27    326           More, Hannah
28    322           Dickens, Charles
29    315           Aeschylus
30    304           Burnet, Gilbert
31    302           Luther, Martin
32    295           Dryden, John
33    290           Xenophon
34    280           Sophocles
35    262           Pope, Alexander
36    259           Fielding, Henry
37    258           Li
38    250           Calvin, Jean
39    248           Zhang
40    247           Aristophanes
41    247           Byron, George Gordon Byron (Baron)
42    247           Bacon, Francis
43    247           Chen
44    245           Terence
45    241           Euclid
46    235           Augustine (Saint, Bishop of Hippo.)
47    232           Burke, Edmund
48    223           Johnson, Samuel
49    222           Bunyan, John
50    222           De la Mare, Walter

Top 50 authors based on CUL Catalogue 1400-1960

The other thing we could look at is the overall distribution of titles per author (and how it varies with rank — a classic “is it a power law” question). Below are the histogram (NB log scale for counts) together with a plot of rank against count (which equates, v. crudely, to a transposed plot of the tail of the histogram …). In both cases it looks (!) like a power-law is a reasonable fit given the (approximate) linearity but this should be backed up with a proper K-S test.

culbooks_person-item-hist-logxlogy.png

Histogram of items-per-author distribution (log-log)

culbooks_person-item-by-rank-logxlogy.png

Rank versus no. of items (log-log)
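
As a starting point for the K-S test, here is a minimal sketch of a power-law tail fit with a K-S distance. Several things are assumptions rather than settled choices: it uses the continuous maximum-likelihood estimate of the exponent (the counts are really discrete), it fits only the top-50 counts from the table (with the garbled rank-43 value read as 247), it leaves corporate “authors” in, and it reports only the distance, not a p-value, which would need a bootstrap over the full distribution. The function name `fit_power_law_tail` is made up here.

```python
import math

# top-50 item counts from the table; rank 43 is garbled in the source
# and is read here as 247 (an assumption)
TOP50 = [3112, 1154, 1076, 973, 825, 766, 721, 654, 620, 599,
         547, 539, 527, 520, 486, 479, 444, 388, 387, 386,
         372, 358, 358, 351, 349, 338, 326, 322, 315, 304,
         302, 295, 290, 280, 262, 259, 258, 250, 248, 247,
         247, 247, 247, 245, 241, 235, 232, 223, 222, 222]


def fit_power_law_tail(counts, xmin):
    """Fit p(x) ~ x**(-alpha) to the values >= xmin and return
    (alpha, K-S distance between empirical and fitted CDFs).
    Needs at least one value strictly greater than xmin."""
    tail = sorted(x for x in counts if x >= xmin)
    n = len(tail)
    # continuous maximum-likelihood estimate of the exponent
    alpha = 1.0 + n / sum(math.log(x / float(xmin)) for x in tail)
    ks = 0.0
    for i, x in enumerate(tail):
        # fitted CDF of the tail at x
        model_cdf = 1.0 - (x / float(xmin)) ** (1.0 - alpha)
        # the empirical CDF steps from i/n to (i+1)/n at x
        ks = max(ks, abs(model_cdf - i / float(n)),
                 abs(model_cdf - (i + 1) / float(n)))
    return alpha, ks


alpha, ks = fit_power_law_tail(TOP50, xmin=222)
print('alpha = %.2f, K-S distance = %.3f' % (alpha, ks))
```

A small K-S distance on 50 points is weak evidence either way; the real check needs the full items-per-author data and a comparison against alternatives such as a log-normal.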

TODO

  • K-S tests
  • Extend data to present day
  • Check against other catalogue data
  • Look at occurrence of people in title names
  • Look at when items appear over time

Colophon

The code to generate the table and graphs is in the open Public Domain Works repository, specifically the method ‘person_work_and_item_counts’ in this file: http://knowledgeforge.net/pdw/hg/file/tip/contrib/stats.py

Open Notebook Social Science

October 22nd, 2009

The other day I posted up some work-in-progress on the subject of patterns of knowledge production.

That material is still in a fairly preliminary state. However, releasing it in this form was a conscious decision and part of an ongoing attempt on my part to practice a more open “release early, release often” approach to research.

In doing this I’m drawing direct inspiration from the open source and open notebook (science) communities and seeking to engage in what might be termed open notebook social science!

I think most researchers (including myself) feel a reluctance to put out material that isn’t at a reasonable level of maturity. While there are some good reasons for this, I think the main motivations are less positive, and are primarily to do with fear: be it of criticism or that your ideas are “taken” by others. While such fears can have some basis, it seems to me the benefits of an open approach — in terms of visibility, dissemination, and potential for collaboration — significantly outweigh any of the associated risks.

Over the last year, I’ve already been making some effort to move in this direction but from this point on I’m aiming to do this more thoroughly and methodically. A first step in this will be to put all the “patterns” and data online.