The Size of the Public Domain
June 12th, 2009
This post continues the work begun in this earlier post on “Estimating Information Production and the Size of the Public Domain”.
Having already obtained estimates of the number of items (publications) produced each year based on library catalogue data, our next step is to convert these into an estimate of the “size” of the public domain. (NB: as already discussed, “size” could mean several different things. Here, at least to start with, we’re going to take the simplest and crudest approach and equate size with the number of publications/items.)
The natural, and most obvious, approach here is to go through our 1 million+ items and compute their public domain status (as discussed in this earlier post). Unfortunately, as detailed there, this is problematic because we often have insufficient information in library catalogues with which to compute PD status with certainty — in particular, author death dates are frequently absent. Thus, it will be necessary to fall back on some approximate method.
For example, we can base PD status on simple publication dates: if a book was published, say, 140 years ago, it is very likely to be in the public domain, since for it to be in copyright its author must have lived more than 70 years after the book came out (remember copyright lasts for life plus 70 years in the EU)! Conversely, any publication less than 70 years old is almost certainly not in the public domain. For periods in between we can assume that some proportion of publications are PD, starting close to zero for more recent items and rising towards one for older ones. A calculation along those lines is provided in the following table:
| Start | End | Items | % PD | Number PD |
|---|---|---|---|---|
| 1400 | 1870 | 389291 | 100 | 389291 |
| 1870 | 1880 | 50564 | 95 | 48035 |
| 1880 | 1890 | 66857 | 90 | 60171 |
| 1890 | 1900 | 66883 | 80 | 53506 |
| 1900 | 1910 | 70360 | 50 | 35180 |
| 1910 | 1920 | 60489 | 30 | 18146 |
| 1920 | 1930 | 78670 | 10 | 7867 |
| 1930 | 1940 | 90576 | 5 | 4528 |
| Total | | 873690 | 71 | 616724 |
Number of UK Public Domain Publications (Based on Cambridge University Library Catalogue Data)
So, based on the assumptions regarding PD proportions given in the table, there are somewhat over 600 thousand PD books according to the holdings of Cambridge University Library (of which almost two-thirds, approx 390k, are from before 1870). The British Library dataset is approx 4x as big as Cambridge University Library’s and the numbers scale up roughly proportionately, giving a total of over 2.4 million PD items.
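As a minimal sketch, the table’s arithmetic can be reproduced in a few lines of Python. The decade bands and PD percentages are simply the assumptions from the table, not measurements, and the `BL_SCALE` factor of 4 is the rough British Library to Cambridge ratio quoted in the text.

```python
# Decade bands from the table: (start, end, items, assumed % public domain).
BANDS = [
    (1400, 1870, 389291, 100),
    (1870, 1880, 50564, 95),
    (1880, 1890, 66857, 90),
    (1890, 1900, 66883, 80),
    (1900, 1910, 70360, 50),
    (1910, 1920, 60489, 30),
    (1920, 1930, 78670, 10),
    (1930, 1940, 90576, 5),
]

# Assumed ratio of British Library to Cambridge University Library holdings.
BL_SCALE = 4


def estimate_pd(bands):
    """Return (total items, estimated PD items, overall PD share).

    Per-band PD counts are rounded down, as in the table.
    """
    total_items = sum(items for _, _, items, _ in bands)
    total_pd = sum(items * pct // 100 for _, _, items, pct in bands)
    return total_items, total_pd, total_pd / total_items


items, pd_items, share = estimate_pd(BANDS)
print(items, pd_items, round(share, 2))  # 873690 616724 0.71
print(pd_items * BL_SCALE)               # 2466896
```

Changing a percentage in `BANDS` immediately shows how sensitive the headline figure is to the assumed PD proportions for the 1870 to 1940 bands.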
Of course, this is a fairly crude approach based purely on publication date and it can be improved in a variety of ways, most notably by using the authorial birth date information which is usually present in catalogue data (we can also use death date information where present). This will be the subject of the next post.
Here we’re going to look at using library catalogue data as a source for estimating information production (over time) and the size of the public domain.
Library Catalogues
Cultural institutions, primarily libraries, have long compiled records of the material they hold in the form of catalogues. Furthermore, most countries have had one or more libraries (usually the national library) whose task included an archival component and, hence, whose collections should be relatively comprehensive, at least as regards published material.
The catalogues of those libraries then provide an invaluable resource for charting, in the form of publications, levels of information production over time (subject, of course, to the obvious caveats about coverage and the relationship of general “information production” to publications).
Furthermore, library catalogue entries record (almost) the right sort of information for computing public domain status: in particular, a given record usually has (a) a publication date and (b) unambiguously identified author(s) with birth date(s) (though, unfortunately, often not death dates). Thus, we can also use this catalogue data to estimate the size of the public domain, size being equated here with the total number of items currently in the public domain.
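To make this concrete, here is a rough sketch of how PD status might be computed for a single catalogue record under the EU’s life-plus-70 rule. It assumes one author per record, ignores anonymous and posthumous works as well as national variations, and introduces a hypothetical `MAX_LIFESPAN` parameter so that a birth date alone can sometimes settle the question. The function name `is_public_domain` is illustrative and not taken from the Public Domain Works code.

```python
from typing import Optional

COPYRIGHT_TERM = 70  # EU: life of the author plus 70 years
MAX_LIFESPAN = 100   # hypothetical assumption: no author lives beyond 100


def is_public_domain(current_year: int,
                     death_year: Optional[int] = None,
                     birth_year: Optional[int] = None) -> Optional[bool]:
    """Return True/False if PD status can be decided, or None if it cannot.

    Terms run to the end of the calendar year, so a work whose author died
    in year D enters the public domain on 1 January of year D + 71.
    """
    if death_year is not None:
        return current_year > death_year + COPYRIGHT_TERM
    if birth_year is not None:
        # Even if the author lived to MAX_LIFESPAN, the term has now expired.
        if current_year > birth_year + MAX_LIFESPAN + COPYRIGHT_TERM:
            return True
    return None  # too little information to decide with certainty


print(is_public_domain(2009, death_year=1938))  # True
print(is_public_domain(2009, death_year=1939))  # False
print(is_public_domain(2009, birth_year=1830))  # True
print(is_public_domain(2009, birth_year=1900))  # None
```

The `None` case is exactly where an approximate, publication-date-based method has to take over.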
Results
To illustrate, here are some results based on the catalogue of Cambridge University Library, which is one of the UK’s “copyright libraries” (i.e. it has a right to obtain, though not an obligation to hold, one copy of every book published in the UK). This first plot shows the number of publications per year, based on the publication date recorded in the catalogue, up until 1960 (when the dataset ends).
A major concern when basing an analysis on these kinds of trends is that fluctuations over time may derive not from changes in underlying production and publication rates but from changes in the acquisition policies of the library concerned. To check for this, we present a second plot which shows the same information derived from the British Library’s catalogue. Reassuringly, though there are differences, the basic patterns look remarkably similar.

Number of items (books etc) Per Year in the Cambridge University Library Catalogue (1600-1960).

Number of items (books etc) Per Year in the British Library Catalogue (1600-1960).
What do we learn from these graphs?
- In total there were over a million “Items” in this dataset (and parsing, cleaning, loading and analyzing this data took on the order of days — while the preparation work to develop and perfect these algorithms took weeks if not months)
- The main trend is a fairly consistent, and approximately exponential, increase in the number of publications (items) per year. At the start of our time period in 1600 we have around 400 items a year in the catalogue while by 1960 the number is over 16000.
- This is a forty-fold increase over 360 years and corresponds to an average annual growth rate of approx 1.0%. Assuming “growth” began only around the time of the industrial revolution (~1750), when output was around 1000 items a year (10-year moving average), gives a somewhat higher growth rate of around 1.3%.
- There are some fairly noticeable fluctuations around this basic trend:
  - There appears to be a burst in publications in the decade or decade and a half before 1800. One can conjecture several, more or less intriguing, reasons for this: the cultural impact of the French revolution (esp. on radicalism), the effect of loosening copyright laws after Donaldson v. Beckett, etc. However, without substantial additional work, for example to examine the content of the publications in that period, these must remain little more than conjectures.
  - The two world wars appear dramatically in our dataset as sharp dips: the pre-1914 level of around 7k+ falls by over a third during the war to around 4.5k and then rises rapidly again to reach, and pass, 7k per year in the early 20s. Similarly, the late 1930s level of around 9.5k per year drops sharply upon the outbreak of war, reaching a low of 5350 in 1942 (a drop of around 45%), and then rebounds rapidly at the war’s end: from 5.9k in 1945 to 8k in 1946, 9k in 1947 and 11k in 1948!
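For anyone wanting to check growth-rate figures like these, here is a quick sketch of the compound annual growth rate implied by two endpoint values. It uses only the approximate endpoint figures quoted above (400 items in 1600, 1000 in 1750, 16000 in 1960); fitting a trend line to the log of the full annual series would be more robust to noise at either end.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Approximate endpoint figures quoted in the post.
full_period = annual_growth_rate(400, 16000, 1960 - 1600)  # 1600-1960
since_1750 = annual_growth_rate(1000, 16000, 1960 - 1750)  # 1750-1960
print(f"{full_period:.2%} {since_1750:.2%}")  # 1.03% 1.33%
```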
To do next (but in separate entries — this post is already rather long!):
- Estimates of the size of the public domain: how many of those catalogue items are in the public domain
- Distinguishing Publications (“Items”) from “Works”, i.e. production of new material versus the reissuance of old (see previous post for more on this).
Colophon: Background to this Research
I’m working on an EU-funded project on the Public Domain in Europe, with particular focus on the size and value of the public domain. This involves getting large datasets about cultural material and trying to answer questions like: How many of these items are in the public domain? What’s the difference in price and availability of public domain versus non public domain items?
I’ve also been involved for several years in Public Domain Works, a project to create a database of works which are in the public domain.
Colophon: Data and Code
All the code used in parsing, loading and analysis is open and available from the Public Domain Works Mercurial repository. Unfortunately, the library catalogue data is not: library catalogue data, at least in the UK, appears to be largely proprietary and the raw data kindly made available to us for the purposes of this research by the British Library and Cambridge University Library was provided only on a strictly confidential basis.
Last Thursday I attended a talk by Frederick Scherer at the [Judge] entitled: “Deregulatory Roots of the Current Financial Crisis”. Below are some sketchy notes.
Notes
Macro story:
- Huge current account deficit for last 10-15 years
- Expansionary Fed policy has permitted this to happen while interest rates are low
- Median real income has not risen since the mid
- Cheap money means personal savings have dropped consistently: 1970s ~7%, 2000s ~1%
- Basically overconsumption
Micro story:
- Back in the old days, banking was very dull (the “three threes” story): “One reason I never worked in the financial industry: it was very dull when I got my MBA in 1958”
- S&L story of 1980s: inflation squeeze + Reagan deregulation
- FMs: Fannie Mae, Freddie Mac get more prominent
- Ed: main focus here was on pressure for S&L to find better returns without much mention of the thoughtlessness of Reagan deregulatory approach (deposits still insured but S&L can now invest in anything) and the fraud and waste it engendered — see “Big Money Crime: Fraud and Politics in the Savings and Loan Crisis” by Kitty Calavita, Henry N. Pontell, and Robert Tillman
- In the 1920s there were $2 billion of securitized mortgages (securitization before the 1980s!)
- Market vs. bank finance for mortgages: market more than bank by mid-1980s [ed: I think — graph hard to read]
- To start with: FMs pretty tough when giving mortgages, but with new securitizers and lots of cheap money, standards dropped => moral hazard for issuers [ed: not quite sure why this is moral hazard — securitizers aren’t the ones who should care, it’s the buyers who should care]
- Even if issuers don’t care, buyers of securitized mortgages should care, and they depended on ratings agencies (Moody’s, S&P etc.)
- Unfortunately, ratings agencies had serious conflicts of interest as they were paid to do ratings by firms issuing the securities! Result: ratings weren’t done well
- Worse: people ignored systemic risk in the housing market and therefore made far too low assessment of risk of these securities [ed: ignoring systemic risks implies underestimating correlations — especially for negative changes — between different mortgage types (geographic, owner-type etc). Interesting here to go back and read the quarterly statement from FM in summer 2008 which claims exactly this underestimate.]
- Banks over-leveraged for the classic reason (it raises your profits if things are good — but you can get wiped out if things are bad)
- This made banks very profitable: by mid 2000s financial corporations accounted for 30% of all US corporate profits
- Huge wage levels (unjustified relative to other sectors). Fascinating evidence here is provided by correlating wage premia with deregulation: fig 6 from Philippon and Reshef shows a dramatic association of the wage premium (corrected for observable skills) with (de)regulation. The wage premium goes from ~1.6 in the 1920s to <1.1 in the 1960s and 70s and then back up to 1.6/1.7 in the mid 2000s
- Credit default swaps and default insurance: not entirely new but doubled every year from 2001 to the present ($919 billion in 2001 to $62.2 trillion in 2007)
- Much of the time CDS issued without any holding of the underlying asset
- There was discussion on regulating CDSes in the 1990s (a blue-ribbon panel reported in 1998) but due to shenanigans in the House and Senate led by Phil Gramm (husband of Wendy Gramm, who was head of the Commodity Futures … Board), CDSes were entirely deregulated via an act tacked onto the Health-Education-Appropriations bill in 2001.
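The leverage point can be illustrated with a toy calculation (my own sketch, not from the talk): a bank holds assets equal to `leverage` times its equity, funds the rest with debt at a hypothetical 1% cost, and all remaining gains or losses fall on shareholders. The asset returns of +3% and -3% are made-up illustrative values.

```python
def return_on_equity(asset_return, leverage, funding_cost=0.01):
    """Return on equity for a bank with assets = leverage * equity.

    Debt = (leverage - 1) * equity and is assumed (hypothetically) to cost
    `funding_cost` per year; all remaining gains or losses fall on equity.
    """
    return asset_return * leverage - funding_cost * (leverage - 1)


for leverage in (10, 30):
    good = return_on_equity(0.03, leverage)   # assets gain 3%
    bad = return_on_equity(-0.03, leverage)   # assets lose 3%
    status = "wiped out" if bad <= -1 else "survives"
    print(f"{leverage}x: good year {good:+.0%}, bad year {bad:+.0%} ({status})")
# 10x: good year +21%, bad year -39% (survives)
# 30x: good year +61%, bad year -119% (wiped out)
```

Tripling leverage roughly triples the good-year return, but the same modest fall in asset values goes from painful to fatal.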
It goes bad:
- Housing bubble breaks in 2007 or even 2006
- Notices of default start trending upwards in mid 2006
- [ran out of time]
What is to be done:
- Need simple, clear rules
- A regulator cannot monitor everything day-to-day
- Outlaw Credit Default Swaps
- Anyone who issues CDOs must “keep skin in the game”
- Leverage ratios. Perhaps? Hard to regulate.
- Deal with too big to fail by making it hard for “giants to form” and breaking up existing over-large conglomerates
- We need to remember history!
Own Comments
This was an excellent presentation though, as was intended, it was more a summary of existing material than a presentation of anything “new”.
Not sure I was convinced by the “remember history” logic. It is always easy to be wise after the event and say “Oh look how similar this all was to 1929”. However, not only is this unconvincing analytically (it is really hard to fit trends in advance with any precision: every business cycle is different), but before the event there are always plenty of people (and lobbyists) arguing that everything is fine and we shouldn’t interfere. Summary: awareness of history is all very well but it does not provide anything like the precision needed to support pre-emptive action. As such it is not really clear what “awareness of history” buys us.
More convincing to me (and one could argue these still have some “awareness of history” in them) are actions like the following:
Worry about incentives in general and the principal-agent problem in particular. Try to ensure long-termism and prevent overly short-term and high-powered contracts (which essentially end up looking like a call option).
Since incentives can be hard to regulate directly one may need to work via legislation that affects the general structure of the industry (e.g. Glass-Steagall).
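A toy illustration of why such contracts look like a call option (my own sketch, with made-up numbers): if a manager receives a share of profits but bears none of the losses, their payoff is `share * max(profit, 0)`, and its expected value rises with risk even when expected profit is unchanged.

```python
def expected_bonus(outcomes, share=0.1):
    """Expected bonus when the manager gets `share` of profits but no share of losses.

    `outcomes` is a list of (probability, profit) pairs. The payoff
    share * max(profit, 0) has the same shape as a call option struck at zero.
    """
    return sum(p * share * max(profit, 0.0) for p, profit in outcomes)


def expected_profit(outcomes):
    """Expected profit to the firm before any bonus is paid."""
    return sum(p * profit for p, profit in outcomes)


# Two hypothetical strategies with the same expected profit (zero)
# but very different risk.
safe = [(0.5, 10.0), (0.5, -10.0)]
risky = [(0.5, 50.0), (0.5, -50.0)]

for name, strategy in (("safe", safe), ("risky", risky)):
    print(name, expected_profit(strategy), expected_bonus(strategy))
```

The firm is indifferent between the two strategies on average, but the manager’s expected bonus is five times larger under the risky one.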
Summary: banking should be a reasonably dull profession with skill-adjusted wage rates similar to other sectors of the economy. If things get too exciting it is an indicator that incentives are out of line and things are likely to go wrong (quite apart from the inefficiency of having all those smart people pricing derivatives rather than doing something else!)
Be cautious regarding financial innovation especially where new products are complex. New products have little “track record” on which to base assessments of their benefits and risks and complexity makes this worse.
In particular, complexity worsens the principal-agent problem for “regulators” both within and outside firms (how can I decide what bonus you deserve if I don’t understand the riskiness and payoff structure of the products you’ve sold?). The valuation of many financial products such as derivatives depends heavily, and subtly, on assumptions regarding the distribution of returns of underlying assets (stocks, bonds etc.).
If it is not clear what innovation (and complexity) is buying us, we should steer clear, or at least be very cautious. As Scherer pointed out (in response to a question), there is little evidence that the explosion in variety and complexity of financial products since the 80s has actually done anything to make finance more efficient, e.g. by reducing the cost of capital to firms. Of course, it is very difficult to assess the benefits of innovation in any industry, let alone finance, but the basic point that the 1940s through the 1970s (dull banking) saw as much “growth” in the real economy as the 1980s-2000s (exciting banking) should make us think twice about how much complexity and innovation we need in financial products.
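To illustrate how strongly such valuations can hinge on distributional assumptions (and on the correlation point in the notes above), here is a small Monte Carlo sketch using a one-factor Gaussian (Vasicek-style) model of mortgage defaults. This is a standard textbook setup swapped in for illustration, not something from the talk; the 5% default probability, 100-loan pool and 15% threshold are all hypothetical. Each loan has the same default probability in both runs; only the correlation through a common housing-market shock changes.

```python
import math
import random
from statistics import NormalDist


def prob_pool_loss_exceeds(rho, default_prob=0.05, n_loans=100,
                           threshold=0.15, trials=5_000, seed=0):
    """Monte Carlo estimate of P(fraction of loans defaulting > threshold).

    Loan i defaults when sqrt(rho) * Z + sqrt(1 - rho) * e_i falls below
    NormalDist().inv_cdf(default_prob), where Z is a common shock shared by
    all loans and e_i is loan-specific noise. Each loan's marginal default
    probability is `default_prob` whatever the value of rho.
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(default_prob)
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    exceed = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)  # common (e.g. national housing market) shock
        defaults = sum(a * z + b * rng.gauss(0.0, 1.0) < cutoff
                       for _ in range(n_loans))
        if defaults / n_loans > threshold:
            exceed += 1
    return exceed / trials


independent = prob_pool_loss_exceeds(rho=0.0)
correlated = prob_pool_loss_exceeds(rho=0.3)
print(independent, correlated)
```

Under independence, losing more than 15% of the pool is essentially impossible; with moderate correlation it happens in several percent of scenarios. A security that only suffers beyond that threshold therefore looks near-riskless or quite risky depending on a single, hard-to-estimate parameter.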
Finally, and on a more theoretical note, I’d also like to have seen more discussion of exactly why standard backward recursion/rational market logic fails here and what implications the answers have for markets and their regulation. In particular, one would like to know why knowledge of a bubble’s existence in period T doesn’t lead to its unwinding (and hence, by backward recursion, to its unwinding in period T-1, and then T-2 etc., until the bubble never existed). There are various answers to this in the literature based on things like herding, the presence of noise investors, and uncertainty about termination, but it would be good to have a summary, especially as regards welfare implications (are bubbles good?) and what policy interventions different theories prescribe.
Filesharing Costs: Dubious Figures Making the Rounds Again
May 29th, 2009
The BBC ran a story yesterday headlined “Seven million ‘use illegal files’”. Its bolded first paragraph stated:
Around seven million people in the UK are involved in illegal downloads, costing the economy tens of billions of pounds, government advisers say. [emphasis added]
7 million people involved in unauthorised file-sharing is possible, but costs of tens of billions of pounds? It’s not unusual to see such figures bandied around by the rightsholders derived from wild guesstimates of download figures and ludicrously unsound assumptions such as equating every download with a lost sale.
Here, however, it is according to “government advisers”: surely a much more reliable source! A quick read and we discover this isn’t the case at all. These figures are directly recycled from rightsholder sources, with an additional uplift from the BBC, which turned a possible £10 billion or more a year into tens (notice that extra “s”) of billions a year.
First off, the story is based on a report entitled “Copycats? Digital Consumers in an Online Age” commissioned by the Strategic Advisory Board in Intellectual Property (SABIP) from UCL’s Centre for Information Behaviour and the Evaluation of Research (CIBER). So this is CIBER’s report, not SABIP’s, and SABIP need not even have endorsed the report. That said, one can see how the BBC’s confusion came about, and this is a minor point (after all CIBER is part of a university).
More important is a check of the actual evidence underlying these very large claimed costs to the economy. Let’s take a look at the report. Page 6, at the start of the Exec Summary states (this is where I guess the BBC got its material from):
Industry reports [3] suggest that at least seven million British citizens have downloaded unauthorised content, many on a regular basis, and many also without ethical consideration. Estimates as to the overall lost revenues [4] if we include all creative industries whose products can be copied digitally, or counterfeited, reach £10 billion (IP Rights, 2004), conservatively, as our figure is from 2004, and a loss of 4,000 jobs. This is in the context of the “Creative Industries” providing around 8% of British GDP. And the situation is not solely a British problem, but a global one. …
But wait a moment: their only source here seems to be (IP Rights, 2004) and that turns out to be a single page press release from an IP (law) firm which simply states:
“Rights owners have estimated that last year alone counterfeiting and piracy cost the UK economy £10 billion and 4,000 jobs.”
So these are just the standard (and utterly unreliable) rightsholder-claimed figures (and not even first-hand!). To be fair, in footnote 4 the authors acknowledge that the phrase “lost revenues” is complex and that not all downloaded content would have been purchased. However, they then seem to backtrack on this by saying (rightsholder-provided figures again!):
Nevertheless, industries such as music and film do frequently publish estimated lost revenues, or “value gaps’. The BPI recently claimed that between 2008 and 2012 the music industry was looking at a ‘value gap’ of £1.2 billion. (Music Ally, 2008)
Moreover, the claim that things are “complex” worries me, as things are, in fact, pretty simple: lost revenues mean lost revenues, i.e. the revenues the industry would have got if no unauthorised downloading had occurred. This will clearly be much, much lower than a figure based on assuming every unauthorised download is a lost sale.
Furthermore, looking at revenues in a single industry is dangerous here: we’ve got to look at the overall impact on the economy (and that’s still ignoring the welfare/income distinction). For example, if someone makes an unauthorised download rather than buying a CD they spend the money they would have spent on the CD on something else, be that a haircut, a meal, or going to a concert. If we want to count that as a loss to the music industry we need to count the gain it generates elsewhere.
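The distinction can be put in a few lines of code. This is a minimal sketch with purely hypothetical inputs: `displacement_rate` is the (unknown) fraction of unauthorised downloads that would otherwise have been purchases, and `share_respent` is the fraction of the money saved that is spent elsewhere in the economy. Neither figure comes from the report.

```python
def displaced_revenue(downloads, price, displacement_rate):
    """Sales actually lost: only downloads that replaced a purchase count.

    Equating every download with a lost sale means displacement_rate = 1.
    """
    return downloads * price * displacement_rate


def net_spending_change(displaced, share_respent=1.0):
    """Change in total spending once money not spent on the displaced
    purchases is (partly) spent on other goods and services instead."""
    return displaced * (share_respent - 1.0)


# Purely hypothetical inputs, for illustration only.
downloads, price = 100_000_000, 10.0

naive = displaced_revenue(downloads, price, displacement_rate=1.0)
lower = displaced_revenue(downloads, price, displacement_rate=0.2)
print(naive, lower, net_spending_change(lower))
```

With every download counted as a lost sale the “loss” is five times larger than with a 20% displacement rate, and if the money saved is all spent elsewhere the net change in overall spending is zero: revenue moves between sectors rather than vanishing.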
Good evidence doesn’t get any thicker on the ground later on either as far as I can tell. For example, in the first key finding section (entitled “The scale of the ‘problem’ is huge and growing”):
- The only empirical study they cite on the impact of filesharing is that of Zentner, with no mention of other major studies such as that of Oberholzer and Strumpf.
- The only figure on the film industry they quote is a claim of a $6 billion annual loss put forward by the UK film industry in interview, and “some research (Henning-Thurau et al., 2007) [which] appears to demonstrate evidence that consumers’ intention to pirate movies ‘cause them to forego theatre visits and legal DVD rentals and/or purchases’”. Looking up that citation one finds (it seems there was a typo in the date!): Henning-Thurau, T, Gwinner, K, Walsh, G, Gremler, D (2004) Electronic Word of Mouth via Consumer-Opinion Platforms: What Motivates Consumers to Articulate Themselves on the Internet? Journal of Interactive Marketing. 18 (1) pp.38-52. While I haven’t actually read this article, the title (and journal) don’t suggest it is the most reliable source as to the actual effect of unauthorised downloads on film industry income.
To sum up: it turns out the BBC’s line that illegal downloads are “costing the economy tens of billions of pounds” is based on nothing more than the usual, and completely unreliable, rightsholders claims, recycled via CIBER’s report. This is a worrying example of how industry PR, via repetition in other, more “respected” and supposedly independent sources, can gain legitimacy.
Visualizing Technology Flows Over Time (I)
May 22nd, 2009
In my original post on Visualizing Technology Flows from Patent Data I just presented static information — flows for a single year. As I said there:
The next step is to watch how these flows, and the relationships implied by them, have evolved over time. We can do this by plotting the same graph say, every 3 years, from 1975 up until the present.
At the time I had already coded up, and computed, snapshots for each year. However, considerations of space, as well as a desire to find a way to display the information in a ‘nice’ (animated) form, warranted a separate entry. After what, as usual, has turned out to be a rather longer delay than intended, I’ve finally got round to having a first stab at this using simple animated gifs:
Animated Citation Flows 1975-1994 (1994 base year).
Here I’ve fixed the layout of the nodes based on the final year (1994) flows. I’ve also done quite a lot of tedious playing around (if only one had stylesheets!) with edge and node sizes to try and improve the look and they are still far from perfect (NB: this means edge/node sizes differ slightly from the images in the original post). As before:
- Size of nodes indicates total citation flows from that area in that year
- The yellow portion represents citations back into the same subcategory, while the black portion represents citations into other subcategories (compare by area).
- Direction of flow is indicated by an arrow head (a rectangular block) with size of flow measured by width of edge and size of head.
Note that we are displaying year values not cumulative values — so, for example, links between nodes may get smaller or even disappear from one year to the next. What jumps out from this?
- The substantial increase in flows over time (most obviously seen in the size of the nodes).
- (At least based on examination by eye) no great change in the balance of these flows between cites outside and cites within a category (relative sizes of black and yellow in nodes).
- Growth has varied substantially across areas (largely, I would hazard, in line with the no. of patents in that area). In particular, the “Computer/Electronics” cluster (top-right) has grown substantially faster than the “Chemicals” sector at centre-left. Individual categories showing especially marked growth include: Biotechnology, Computer Hardware and Software, Communications, Information Storage, and Drugs.
- It also looks like some areas have grown more strongly linked and “clustered” over time (e.g. Computer/Electronics, and Drugs to Organic Compounds) though it is hard to tell from this visualization (pointing to the need for more formal techniques …).
- Something which is very clear from the visualization is that there is significant year-to-year variation, with clear year-on-year drops in flows in some cases.
I also computed another version where the network layout is based on that year’s flows — rather than with a fixed layout based on a given base year.
Unfortunately, this looks too “busy”, particularly as the sensitivity of the network layout algorithm (networkx.graphviz_layout) means that categories move around a lot. (To save on space — the files are big — I haven’t posted this up but if anyone is interested let me know and I’ll upload it).
One solution to this would be to move to rendering cumulative, rather than per-year, flows. This might also improve the base-year case: even there, it might be more natural, at least from a visual point of view, to display changes in flows over time via their impacts on “stocks” rather than displaying the “flows” themselves.
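A minimal sketch of the cumulative-flow computation, assuming per-year flows are stored as dictionaries keyed by (citing category, cited category). The category names and counts below are made up. The resulting running totals could then be drawn with the same fixed base-year layout, and links would no longer vanish from one frame to the next.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical per-year citation flows: {(citing_category, cited_category): count}.
yearly_flows = {
    1975: {("Drugs", "Organic Compounds"): 120,
           ("Computer Hardware", "Communications"): 80},
    1976: {("Drugs", "Organic Compounds"): 90,
           ("Computer Hardware", "Communications"): 110},
    1977: {("Drugs", "Organic Compounds"): 150},
}


def cumulative_flows(flows_by_year):
    """Return {year: running total of flows up to and including that year}."""
    years = sorted(flows_by_year)
    totals = accumulate((Counter(flows_by_year[y]) for y in years),
                        lambda acc, c: acc + c)
    return dict(zip(years, totals))


stocks = cumulative_flows(yearly_flows)
print(stocks[1977])
```

Note that the Computer Hardware to Communications link keeps its accumulated total in 1977 even though no flow was recorded that year, which is the “stocks rather than flows” behaviour described above.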
So, next steps:
- Plot cumulative flows
- Write up a more formal analysis based on e.g. PCA. I’ve already done PCAs on individual years and an animation might be interesting.
- Do animations right: the proper way to do this would be with a “slider” widget and stop/start control. It looks like this should be doable in JavaScript using e.g. jQuery, but it doesn’t look trivial: if it is, please let me know how! (BTW: I know I could use Flash but it’s proprietary …).
Discounting and Self-Control
May 19th, 2009
I’m posting up an essay on “Discounting and Self-Control” (pdf). The essay is still in its early stages, but since I haven’t had the time to do much on it over the last year, and going on the motto of “release early, release often”, I’m posting it up as a form of alpha version.
… then must you speak
Of one that loved not wisely, but too well;
Of one not easily jealous, but, being wrought,
Perplex’d in the extreme; of one whose hand,
Like the base Judean, threw a pearl away
Richer than all his tribe; …
Abstract
An agent’s intertemporal choices depend on a variety of factors, most prominently their valuation of future payoffs as encapsulated in a discount function. However, it is also clear that factors such as self-control may play an important role and, given the similarity of impact, a confounding one. We explore the literature on this issue as well as examining what occurs when those with higher time-preference (whether arising from discounting or self-control) also enjoy their consumption more.
Introduction
The exercise of will, especially in the form of self-control, has long been recognized as central to human existence, experience, and morality. Over the last few decades there has been increasing interest in the issue from a scientific perspective. At the same time, it has also long been appreciated that humans (and other animals) make trade-offs between the present and the future (as well as between different points in the future), and that events taking place closer to the present are given greater weight than those which are more distant. Traditionally, at least in economics, this type of behaviour has been subsumed under the heading of discounting.
Both of these factors, self-control and discounting, affect behaviour, and choices, in relation to outcomes which do not (all) take place in the present. However, they are distinct. Specifically, consider a very simple case of two outcomes A and B where B occurs after A (for example, A might be one ice cream today and B an ice cream and a doughnut tomorrow). Self-control issues arise where one prefers B over A but is unable to execute on this preference and therefore actually takes (‘chooses’) A. By contrast, in the discounting case A is actually preferred over B and therefore is chosen (freely) by the decision maker.
It would seem important to keep these two aspects of decision making clearly separated. While lack of ‘self-control’ is usually seen as disadvantageous and a reason for adopting various ‘commitment strategies’ (for example, opting to remove various items from the choice set, such as having no cigarettes in the house), the simple preference for the present over the future incorporated in the discounting model would seem to generate no such difficulties.
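A minimal formalisation of the ice-cream example, assuming (purely for illustration) exponential discounting with a one-day discount factor δ and positive utilities u_A and u_B for the two outcomes:

```latex
V(A) = u_A, \qquad V(B) = \delta\, u_B, \qquad 0 < \delta < 1, \quad u_A, u_B > 0

\text{Discounting: } V(A) > V(B) \iff \delta < \frac{u_A}{u_B}
  \quad (A \text{ is preferred and freely chosen})

\text{Self-control failure: } V(B) > V(A)
  \quad (B \text{ is preferred, yet } A \text{ is taken})
```

For instance, with u_A = 1 and u_B = 1.5 (illustrative numbers), any δ below 2/3 makes taking the single ice cream today a free choice rather than a lapse. Yet the observed choice of A is identical in both cases.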
However, empirically it may prove rather difficult to do so. As shown by the simple example above the same observed ‘choice’ for A (one ice cream today) over B (ice cream plus doughnut tomorrow) can be the result of two very different processes. Thus if we only observe choices, and not the underlying preferences and/or the process by which the choice is arrived at, it may be impossible to distinguish the two.
It is perhaps for this reason that these distinct aspects are sometimes conflated. Consider, for example, Mischel et al. (1989), which is entitled “Delay of Gratification in Children” and summarizes much of Mischel’s pioneering work in this area. Mischel’s approach is clearly more oriented along the self-control aspect, and this is borne out in the types of experiments conducted (more on this below). Nevertheless they state (p.934) “The obtained concurrent associations [between treatments and delay] are extensive, indicating that such preferences reflect a meaningful dimension of individual differences, and point to some of the many determinants and correlates of decisions to delay (18).” Here the orientation towards self-control has become a general “decision to delay” and this is borne out by the associated footnote (18), which references related literature in other disciplines and is worth quoting in its entirety:
Empirical Assessment of Impact of DRM on Exceptions and Limitations by Patricia Akester
May 7th, 2009
Patricia Akester, a colleague of mine in the Centre for Intellectual Property and Information Law, has just published the results of her recent research in the form of a 208-page report entitled Technological accommodation of conflicts between freedom of expression and DRM: the first empirical assessment.
There has been a lot of debate as to whether DRM/TPM can be used to go ‘beyond copyright’ and restrict legitimate uses of copyrighted material, but little empirical work. Patricia’s work is therefore very valuable in providing the first systematic empirical data that we can use to assess what is going on. Here I’ll let her conclusions speak for themselves but I strongly encourage readers to take a look at the study itself via the above link:
[From p. 99-100] This project looked at the impact of DRM on the ability of users to take advantage of certain exceptions to copyright. Based on a series of interviews with key organisations and individuals, involved in the use of copyright material and the development and deployment of DRM, this study examined how these issues are working out in practice. While the nightmarish vision of digital lock up has not materialised, this survey concluded, nevertheless, that significant problems do exist, and others can readily be foreseen:
- Although DRM has not impacted on many acts permitted by law, certain permitted acts are being adversely affected by the use of DRM;
- This is in spite of the existence of technological solutions (enabling partitioning and authentication of users) to accommodate those permitted acts (privileged exceptions);
- Beneficiaries of privileged exceptions who have been prevented from carrying out those permitted acts (because of the employment of DRM) have not used the complaints mechanism set out in UK law;
- Article 6(4) of the Information Society Directive put an onus on content owners to accommodate privileged exceptions voluntarily. Voluntary measures have emerged in the publishing field, but not all content owners are ready to act unless they are told to do so by regulatory authorities.
These four conclusions will be explained in more detail and this will be followed by proposed solutions and recommendations.
Talk at RES Annual Conference on “Is Google the Next Microsoft? Competition, Welfare and Regulation in Internet Search”
April 27th, 2009
Last Tuesday I was at the RES Annual Conference to present my paper “Is Google the Next Microsoft? Competition, Welfare and Regulation in Internet Search”. I’ve uploaded my slides from the talk here and below is a recently prepared overview. The full paper is available online on the SSRN site at:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1265521
Overview
Twelve years ago online search was virtually non-existent; today it is a multi-billion dollar business, and search engine providers such as Google and Yahoo! have become household names.
While search has become increasingly ubiquitous, it has also grown increasingly dominated by a single firm: Google. For example, today in the UK Google accounts for 90% of all searches, and in many other countries it has a similar lead over its rivals.
In this paper I investigate why the search engine market is so concentrated and what implications this has for us, both now and in the future. I also look at whether search engines will require regulation and, if so, in what form. In doing so I also give a detailed explanation of how the search engine market works, its history, and how it has come to be such a lucrative, and important, activity.
To summarize the main points:
(a) Though search engines provide ordinary users with a ‘free’ service, they gain something very valuable in exchange: attention. Attention is an increasingly valuable good, being in ever more limited supply: after all, each of us has a maximum of 24 hours of attention available in any one day (and usually much, much less). Access to that attention is correspondingly valuable, especially for those who have products or services to advertise. Thus, while web search engines do not charge users, they can retail the attention generated by their service to those who are willing to pay for access to it.
(b) The search engine market is already extremely concentrated. In many countries a single firm (usually Google) possesses a market share an order of magnitude larger than those of its rivals. As noted above, in the UK Google already holds over 90% of the market. However, there are also some marked variations: for example, in China Google trails the market leaders.
(c) Competition issues are likely to become more serious as this dominance becomes established. It is important to realise that while search appears ‘free’, we do pay indirectly via the charges to advertisers, who must in turn recoup that money from consumers. A dominant search engine may have incentives to distort its ‘results’ in ways that increase its own profits but harm society, for example by suppressing organic search results that would substitute for, or harm, associated ‘sponsored’ results (adverts).
(d) There are a number of approaches that regulators and policy-makers could take to protect against these adverse consequences. For example, policy-makers could look at ways to separate the ‘software’ and ‘service’ parts of a search engine’s activity or, less dramatically, they could set up a regulatory body to review search result rankings and choices.
Conclusion: it will be increasingly necessary for there to be some form of oversight, possibly extending to formal regulation, of the search engine market. In several markets monopoly, or near monopoly, already exists and there is every reason to think this situation will persist. Left unchecked by competition, the private interests of a search engine and the interests of society as a whole will diverge; thus, left entirely unregulated, online search will develop in ways that are harmful to the general welfare.
It is therefore important that policy-makers begin now to develop their strategy in relation to this key area of the knowledge economy. The power rapidly accumulating in the hands of a few major search providers is a great one. It behoves us to ensure that it is used in a way that brings the greatest benefit to society as a whole.
European Parliament Votes on Copyright Term Extension Tomorrow
April 22nd, 2009
Tomorrow, the European Parliament will vote on the issue of copyright term extension for sound recordings, known in Parliamentese as “the Crowley Report (A6-0070/2009) on the Term of protection of copyright and related rights” (Mr Brian Crowley is the rapporteur for this report and a strong supporter of the extension).
Extending term would be a tragic mistake and a blatant example of special-interest lobbying winning out over the interests of society as a whole.
Let us therefore hope that the proposal is rejected.
That’s the line being taken by some right-thinking MEPs, including Eva Lichtenberger (Greens), Sharon Bowles (ALDE), Andrew Duff (ALDE), Zuzana Roithova (EPP), Christofer Fjellner (EPP) and Guy Bono (PSE), who have put forward a rejection amendment (see their excellent justification below). But they need all the support they can get, and remember: it is never too late to act.
Rejection Amendment Justification
The draft Directive is poorly conceived and disproportionate. The Commission claims that the measure is needed in order to benefit poor performers. However, the proposed regulation and procedure is complicated and over-bureaucratic. The biggest beneficiaries will be the four largest record companies. Individual performers will only receive very small amounts each.
Performers could be helped much more effectively by regulating copyright contracts and collecting societies, by setting up appropriate social security and insurance schemes, and by reconsidering remuneration rights and license tariffs.
The draft Directive leaves a large number of questions unanswered. Additional impact assessments are needed to see which measures are best suited to help those performers really in need, to limit the negative impact on consumers and jobs, and to establish if regulation is best done at state or EU level. In these circumstances, it is not wise to proceed to make the long-term permanent changes proposed.
Some of the particular problems are:
- The extension of copyright to 95 or even 70 years will increase the revenue of trust funds of deceased performers instead of living performers.
- Many performers cannot produce proof for the performances they participated in during the past decades. It then becomes difficult to assess their rights to payments.
- The proposed regulation could cause legal uncertainty for all existing audiovisual productions as it will be unclear if the material used is subject to sound copyright.
- There is a risk that all material that is not commercially viable will not be marketed by the copyright owners and will become inaccessible for public use.
- Small record companies currently publishing copyright-free material risk going bankrupt.
On March 18th I was in Brussels to give a talk as one of two “invited experts” (the other being from the Motion Picture Association) to a session on the topic of “Copyright Enforcement” held by the Working Group on Authors’ Rights of the European Parliament’s JURI Committee. Below is the slightly tidied up text of the talk I gave.
Talk Text
Good afternoon and thank you for inviting me here today. To introduce myself: I’m the Mead Fellow in Economics at Emmanuel College, University of Cambridge, and an Associate at the Centre for Intellectual Property and Information Law, also at the University of Cambridge. I believe that my colleague Professor Bently came here in October to speak to a similar gathering, on that occasion on the topic of copyright term extension.
To begin with I want to make a few general points before proceeding to the specific area — enforcement — that today’s meeting looks at.
The first point I would like to make is that when we talk of copyright we must remember that it is not a single unified thing but, in reality, a bundle of different attributes. For example, there is the crucial distinction between:
- Economic rights: the ‘monopoly’ right to control reproduction and distribution of the work (and thereby to control, at least partially, its price). We should also note that in some cases this ‘exclusive’ right may be converted into a right for equitable remuneration.
- Moral rights: rights of attribution and integrity. These can exist separately from, and independently of, any economic rights. Furthermore, they are often norms that we respect irrespective of any copyright: I still credit Shakespeare for Romeo and Juliet even though it is in the ‘public domain’.
Furthermore these economic and moral rights have attributes such as:
- Term, i.e. the length of time for which the right lasts.
- The breadth of the right. For example, in the US copyright for performers is ‘narrower’ than in the EU because certain uses of recordings (notably radio broadcast) need not be paid for. There are also limitations and exceptions, for example for educational use or for criticism, where permission need not be sought from the rightsholder.
- Lastly, there is enforcement. After all, one can have very ‘strong’ rights but be permissive in enforcement or, conversely, have more limited ‘rights’ but be very strict in enforcement. I would also point out that enforcement is a social as well as a legal matter: when I credit an author, the main reason I do it is not that I might get ‘sued’ if I did not but that it is the right thing to do. People should be credited when their work is used wherever it is reasonable to do so.
The value of a right is determined by the interplay of all of these. Deciding on the level of enforcement is therefore the same problem as deciding on the level of copyright generally. And we can’t think about this without asking about the purpose behind copyright’s existence.
The answer here is a simple one: copyright is an instrument created to promote the interests of society as a whole, not to promote the interests of the producers of creative works. Of course we care about remunerating producers and artists, both because they are members of society and also, more importantly, because by remunerating them we ensure the creation of more works that society as a whole can enjoy.
Nevertheless, it is essential to keep in mind that the purpose of copyright is broader than to promote the interests of a single group. This fact is central to any assessment of the form and level of copyright, and it has important implications. For example, if we have a proposal that will help artists but harm society overall, we should not support that proposal. Moreover, it is a fact that is sometimes neglected: this very working group, for example, is entitled “Working Group on Authors’ Rights”, not “Working Group on Copyright and Social Welfare”.
In using copyright to promote social welfare we are then presented with a basic trade-off between the benefits of the monopoly, in the form of the new works created as a result of the rents it generates, and its costs, in the form of reduced access to creative works. We are therefore seeking a balance: we want enough copyright but not too much. And, returning to our point above, this logic applies to enforcement as much as to any other aspect of the “copyright package”.
In particular: if there is already ‘too much’ copyright, stronger enforcement will make things worse. If there is too little copyright, then more enforcement will make things better. Now, I should make clear that my personal preference is for strong enforcement of fair rules.
Unfortunately, the rules currently aren’t fair: for example, copyright term is almost certainly far too long. As such, it is hard to justify a push for strong enforcement. In addition, I would argue that the unfairness of the current copyright regime is a major reason why strong enforcement will be difficult, if not impossible, to achieve in practice. Why?
The reason is simple: the successful enforcement of any rule depends on that rule having public legitimacy — being considered reasonable by the majority of the populace. Currently that is not the case: copyright suffers from a serious lack of “respect” and a marked lack of public legitimacy.
If we wish to change that, we need the rules to be fair and balanced, since it is hard to command respect for, and to enforce, an unfair system. For example, copyright term should be reduced and we should expressly avoid extensions, especially retrospective ones like the one currently before Parliament in relation to sound recordings. Such policies appear to reflect nothing more than special-interest lobbying, and this can only make copyright’s “marked lack of public legitimacy” worse. I would note here the recent joint statement put out by European IP law centres, which emphasized that retrospective term extension would seriously undermine respect for copyright and make “piracy the easy option”.
It will be almost impossible to enforce unjust rules. If we are to have strong enforcement it therefore must be of just rules. I would also argue that just rules must also be reasonable rules. For example, is it reasonable in an age of costless reproduction to continue to promote a model of copyright based on exclusive rights? Much of the “problem” of unauthorised file-sharing could be resolved if we moved to an alternative compensation system based on an equitable remuneration right approach. In one fell swoop we would eliminate the biggest “enforcement” problem going while also increasing the size of benefits to be divided between users and makers of creative works. Surely this is the more reasonable, and sensible, option!
As I am coming to the end of my allotted span, let me conclude. Copyright must be designed to promote the welfare of society as a whole, not of one specific group. As such, in designing any aspect of copyright, including enforcement, it is important not to have too much as well as not to have too little. We must also remember that copyright, like any other rule or law, depends for its enforcement more on willing compliance than on explicit punishment. As such, the most important factor in ensuring better observance of copyright is to increase its legitimacy, which it markedly lacks at present. To achieve that we need to create a more just, and more reasonable, copyright regime. Thank you.
