The Size of the Public Domain
June 12th, 2009
This post continues the work begun in this earlier post on “Estimating Information Production and the Size of the Public Domain”.
Having already obtained estimates of the number of items (publications) produced each year based on library catalogue data our next step is to convert this into an estimate of the “size” of the public domain. (NB: as already discussed, “size” could mean several different things. Here, at least to start with, we’re going to take the simplest and crudest approach and equate size with number of publications/items.)
The natural, and most obvious, approach here is to go through our 1 million+ items and compute their public domain status (as discussed in this earlier post). Unfortunately, as detailed there, this is problematic because we often have insufficient information in library catalogues with which to compute PD status with certainty — in particular, author death dates are frequently absent. Thus, it will be necessary to fall back on some approximate method.
For example, we can base PD status on simple publication dates: if a book was published, say, 140 years ago it is very likely to be in the public domain, since for it to be in copyright its author must have lived more than 70 years after the book came out (remember copyright lasts for life plus 70 years in the EU)! Conversely, any publication less than 70 years old is almost certainly not in the public domain. For periods in between we can assume some proportion of publications are PD, starting close to zero for more recent items and rising towards one for older ones. A calculation along those lines is provided in the following table:
| Start | End | Items | % PD | Number PD |
|---|---|---|---|---|
| 1400 | 1870 | 389291 | 100 | 389291 |
| 1870 | 1880 | 50564 | 95 | 48035 |
| 1880 | 1890 | 66857 | 90 | 60171 |
| 1890 | 1900 | 66883 | 80 | 53506 |
| 1900 | 1910 | 70360 | 50 | 35180 |
| 1910 | 1920 | 60489 | 30 | 18146 |
| 1920 | 1930 | 78670 | 10 | 7867 |
| 1930 | 1940 | 90576 | 5 | 4528 |
| Total | | 873690 | 71 | 616724 |
Number of UK Public Domain Publications (Based on Cambridge University Library Catalogue Data)
So, based on the assumptions regarding PD proportions given in the table, there are somewhat over 600 thousand PD books according to the holdings of Cambridge University Library (of which well over half, approx 390k, are from before 1870). The British Library dataset is approx 4x as big as that of Cambridge University Library and the numbers scale up roughly proportionately, giving a total of over 2.4 million items.
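The arithmetic behind the table can be reproduced with a short script. This is only a sketch: the rows are copied from the table, the variable names are made up for illustration, and rounding each row down to a whole item is an assumption that happens to match the published figures. The factor of 4 for the British Library is the rough ratio quoted above, not a measured value.

```python
# (start, end, items, assumed % PD) rows copied from the table above.
ROWS = [
    (1400, 1870, 389291, 100),
    (1870, 1880, 50564, 95),
    (1880, 1890, 66857, 90),
    (1890, 1900, 66883, 80),
    (1900, 1910, 70360, 50),
    (1910, 1920, 60489, 30),
    (1920, 1930, 78670, 10),
    (1930, 1940, 90576, 5),
]


def number_pd(items: int, pct_pd: int) -> int:
    """Items assumed to be PD, rounded down (assumption: matches the table)."""
    return items * pct_pd // 100


total_items = sum(items for _, _, items, _ in ROWS)
total_pd = sum(number_pd(items, pct) for _, _, items, pct in ROWS)
share_pd = total_pd / total_items

# Rough scale-up: the British Library dataset is taken to be about 4x as big.
BL_SCALE = 4
bl_estimate = BL_SCALE * total_pd

print(f"Cambridge: {total_pd} of {total_items} items PD ({share_pd:.0%})")
print(f"British Library (scaled): about {bl_estimate} items PD")
```

Changing the assumed percentages in `ROWS` is the quickest way to see how sensitive the total is to the guesses for the 1870-1940 decades.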
Of course this is a fairly crude approach based purely on publication date, and it could be improved in a variety of ways, most notably by using the authorial birth date information which is usually present in catalogue data (we can also use death date information where present). This will be the subject of the next post.
Here we’re going to look at using library catalogue data as a source for estimating information production (over time) and the size of the public domain.
Library Catalogues
Cultural institutions, primarily libraries, have long compiled records of the material they hold in the form of catalogues. Furthermore, most countries have had one or more libraries (usually the national library) whose task included an archival component and, hence, whose collections should be relatively comprehensive, at least as regards published material.
The catalogues of those libraries then provide an invaluable resource for charting, in the form of publications, levels of information production over time (subject, of course, to the obvious caveats about coverage and the relationship of general “information production” to publications).
Furthermore, library catalogue entries record (almost) the right sort of information for computing public domain status, in particular a given record usually has a) a publication date b) unambiguously identified author(s) with birth date(s) (though unfortunately not death date). Thus, we can also use this catalogue data to estimate the size of the public domain — size being equated here to the total number of items currently in the public domain.
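To make the idea concrete, here is a minimal sketch of how a single catalogue record might be classified under the EU life-plus-70 rule. It is not the code used for this research. The function name, the three-valued result and the 100-year cap on an author's lifespan are all assumptions made for illustration, and posthumous, anonymous and multi-author works are ignored.

```python
from typing import Optional

TERM_YEARS = 70      # EU term: life of the author plus 70 years
MAX_LIFESPAN = 100   # assumption: no author lives beyond 100


def pd_status(year_now: int, pub_year: int,
              birth_year: Optional[int] = None,
              death_year: Optional[int] = None) -> Optional[bool]:
    """Return True (public domain), False (in copyright) or None (unknown)."""
    if death_year is not None:
        # Protection runs to the end of the 70th year after death.
        return death_year + TERM_YEARS < year_now
    if birth_year is not None:
        # Latest possible death year under the lifespan assumption.
        if birth_year + MAX_LIFESPAN + TERM_YEARS < year_now:
            return True
    # Assumption: the author was alive at publication, so anything
    # published within the last 70 years is still protected.
    if pub_year + TERM_YEARS >= year_now:
        return False
    return None


print(pd_status(2009, 1850, death_year=1870))   # death date known
print(pd_status(2009, 1900, birth_year=1870))   # birth date only
```

Records that come back as `None` are exactly the ones for which an approximate, proportion-based method is needed.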
Results
To illustrate, here are some results based on the catalogue of Cambridge University Library, which is one of the UK’s “copyright libraries” (i.e. it has a right to obtain, though not an obligation to hold, one copy of every book published in the UK). This first plot shows the number of publications per year, as determined by the publication date recorded in the catalogue, up until 1960 (when the dataset ends).
A major concern when basing an analysis on these kinds of trends is that fluctuations over time may derive not from changes in underlying production and publication rates but from changes in the acquisition policies of the library concerned. To check for this, we present a second plot which shows the same information but derived from the British Library’s catalogue. Reassuringly, though there are differences, the basic patterns look remarkably similar.

Number of items (books etc) Per Year in the Cambridge University Library Catalogue (1600-1960).

Number of items (books etc) Per Year in the British Library Catalogue (1600-1960).
What do we learn from these graphs?
- In total there were over a million “Items” in this dataset (and parsing, cleaning, loading and analyzing this data took on the order of days — while the preparation work to develop and perfect these algorithms took weeks if not months)
- The main trend is a fairly consistent, and approximately exponential, increase in the number of publications (items) per year. At the start of our time period in 1600 we have around 400 items a year in the catalogue while by 1960 the number is over 16000.
- This is a forty-fold increase and corresponds to an annual growth rate of approx 0.8%. Assuming “growth” began only around the time of the industrial revolution (~ 1750) when output was around 1000 (10-year moving average) gives a fairly similar growth rate of around 0.89%.
- There are some fairly noticeable fluctuations around this basic trend:
  - There appears to be a burst in publications in the decade or decade and a half before 1800. One can conjecture several, more or less intriguing, reasons for this: the cultural impact of the French revolution (esp. on radicalism), the effect of loosening copyright laws after Donaldson v. Beckett, etc. However, without substantial additional work, for example to examine the content of the publications in that period, these must remain little more than conjectures.
  - The two world wars appear dramatically in our dataset as sharp dips: the pre-1914 level of around 7k+ falls by over a third during the war to around 4.5k and then rises rapidly again to reach, and pass, 7k per year in the early 20s. Similarly, the late 1930s level of around 9.5k per year drops sharply upon the outbreak of war reaching a low of 5350 in 1942 (a drop of 45%), and then rebounding rapidly at the war’s end: from 5.9k in 1945 to 8k in 1946, 9k in 1947 and 11k in 1948!
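For reference, a constant annual growth rate between two points can be computed as a compound rate, (end / start)^(1 / years) - 1. The sketch below applies that formula to the round numbers quoted in the bullets; the function name is made up for illustration. On these endpoints alone it gives roughly 1.0% a year from 1600 and roughly 1.3% a year from 1750, a little above the rates stated above, which may have been derived from the full or smoothed series rather than from these endpoints (that is a guess, not something the data confirms).

```python
def annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Round figures quoted in the text (items per year in the catalogue).
since_1600 = annual_growth_rate(400, 16000, 1960 - 1600)
since_1750 = annual_growth_rate(1000, 16000, 1960 - 1750)

print(f"1600-1960: {since_1600:.2%} per year")
print(f"1750-1960: {since_1750:.2%} per year")
```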
To do next (but in separate entries — this post is already rather long!):
- Estimates for the size of the public domain: how many of those catalogue items are in the public domain
- Distinguishing Publications (“Items”) from “Works”, i.e. production of new material versus the reissuance of old (see previous post for more on this).
Colophon: Background to this Research
I’m working on an EU-funded project on the Public Domain in Europe, with particular focus on the size and value of the public domain. This involves getting large datasets about cultural material and trying to answer questions like: How many of these items are in the public domain? What’s the difference in price and availability of public domain versus non public domain items?
I’ve also been involved for several years in Public Domain Works, a project to create a database of works which were in the public domain.
Colophon: Data and Code
All the code used in parsing, loading and analysis is open and available from the Public Domain Works mercurial repository. Unfortunately, the library catalogue data is not: library catalogue data, at least in the UK, appears to be largely proprietary and the raw data kindly made available to us for the purposes of this research by the British Library and Cambridge University Library was provided only on a strictly confidential basis.
Empirical Assessment of Impact of DRM on Exceptions and Limitations by Patricia Akester
May 7th, 2009
Patricia Akester, a colleague of mine in the Centre for Intellectual Property and Information Law, has just published the results of her recent research in the form of a 208-page report entitled Technological accommodation of conflicts between freedom of expression and DRM: the first empirical assessment.
There has been a lot of debate as to whether DRM/TPM can be used to go ‘beyond copyright’ and restrict legitimate uses of copyrighted material, but little empirical work. Patricia’s work is therefore very valuable in providing the first systematic empirical data that we can use to assess what is going on. Here I’ll let her conclusions speak for themselves, but I strongly encourage readers to take a look at the study itself via the above link:
[From p. 99-100] This project looked at the impact of DRM on the ability of users to take advantage of certain exceptions to copyright. Based on a series of interviews with key organisations and individuals, involved in the use of copyright material and the development and deployment of DRM, this study examined how these issues are working out in practice. While the nightmarish vision of digital lock up has not materialised, this survey concluded, nevertheless, that significant problems do exist, and others can readily be foreseen:
- Although DRM has not impacted on many acts permitted by law, certain permitted acts are being adversely affected by the use of DRM;
- This is in spite of the existence of technological solutions (enabling partitioning and authentication of users) to accommodate those permitted acts (privileged exceptions);
- Beneficiaries of privileged exceptions who have been prevented from carrying out those permitted acts (because of the employment of DRM) have not used the complaints mechanism set out in UK law;
- Article 6(4) of the Information Society Directive put an onus on content owners to accommodate privileged exceptions voluntarily. Voluntary measures have emerged in the publishing field, but not all content owners are ready to act unless they are told to do so by regulatory authorities.
These four conclusions will be explained in more detail and this will be followed by proposed solutions and recommendations.
European Parliament Votes on Term Extension: The Result
April 24th, 2009
Yesterday, the European Parliament voted on the term extension proposal.
Unfortunately, though opposition was substantial, it was not enough to prevent the modified (70-year) extension passing:
- Amendment in favour of the rejection: 222 IN FAVOUR, 370 AGAINST, 10 ABSTENTION
- Key amendment to ensure benefits only to performers: rejected (no roll-call vote so numbers unknown)
- All other good amendments (no ex-post, lifetime of performer only): rejected (~150 in favour 400 against)
Final vote: 317 in favour 178 against 37 abstention
Though this is a depressing result this is not yet the end of the matter by any means: the Council has not yet resolved its position and there is a possibility of a second reading.
The level of opposition was also impressive given that there was strong support for the extension not only from the rapporteur (Mr Crowley), but also from the main political groupings (EPP and PSE) led by their shadow rapporteurs Mr Toubon and Ms Gill respectively (on a fairly obscure issue such as this most MEPs will have little time to scrutinize the matter and will usually follow the “party line” as determined by the party rapporteur and coordinator for that dossier).
European Parliament Votes on Copyright Term Extension Tomorrow
April 22nd, 2009
Tomorrow, the European Parliament will vote on the issue of copyright term extension for sound recordings, known in Parliamentese as “the Crowley Report (A6-0070/2009) on the Term of protection of copyright and related rights” (Mr Brian Crowley is the rapporteur for this report and a strong supporter of the extension).
Extending term would be a tragic mistake and a blatant example of special-interest lobbying winning out over the interests of society as a whole.
Let us therefore hope that the proposal is rejected.
That’s the line being taken by some right-thinking MEPs, including Eva Lichtenberger (Greens), Sharon Bowles (ALDE), Andrew Duff (ALDE), Zuzana Roithova (EPP), Christofer Fjellner (EPP) and Guy Bono (PSE), who have put forward a rejection amendment (see their excellent justification below). But they need all the support they can get, and remember: it is never too late to act.
Rejection Amendment Justification
The draft Directive is poorly conceived and disproportionate. The Commission claims that the measure is needed in order to benefit poor performers. However, the proposed regulation and procedure is complicated and over-bureaucratic. The biggest beneficiaries will be the four largest record companies. Individual performers will only receive very small amounts each.
Performers could be helped much more effectively by regulating copyright contracts and collecting societies, by setting up appropriate social security and insurance schemes, and by reconsidering remuneration rights and license tariffs.
The draft Directive leaves a large number of questions unanswered. Additional impact assessments are needed to see which measures are best suited to help those performers really in need, to limit the negative impact on consumers and jobs, and to establish if regulation is best done at state or EU level. In these circumstances, it is not wise to proceed to make the long-term permanent changes proposed.
Some of the particular problems are:
- The extension of copyright to 95 or even 70 years will increase the revenue of trust funds of deceased performers instead of living performers.
- Many performers cannot produce proof for the performances they participated in during the past decades. It then becomes difficult to assess their rights to payments.
- The proposed regulation could cause legal uncertainty for all existing audiovisual productions as it will be unclear if the material used is subject to sound copyright.
- There is a risk that all material that is not commercially viable will not be marketed by the copyright owners and will become inaccessible for public use.
- Small record companies currently publishing copyright-free material risk going bankrupt.
Public Domain in Europe (EUPD) Research Project
May 26th, 2008
I’m part of a team, led by Rightscom, which has won a bid to do a major analysis of the scope and nature of the public domain in Europe for the European Commission. As it says in the announcement:
We will assemble quantitative and qualitative data and produce a methodology for measuring the public domain which can be used and refined for future studies both within Europe and further afield. The objectives of the report are fourfold:
- To estimate the number of works in the public domain in the EU and calculate approximately the levels and ways of use and main users of published works
- To estimate the current economic value of public domain works and estimate the value of works that in the next 10-20 years are to be released into the public domain and determine any change in its value whilst under copyright and once it is on the public domain
…
For my part, I’m going to be particularly focused on the size and value questions. This will involve getting large datasets about cultural material and trying to answer questions like: How many of these items are in the public domain? What’s the difference in price and availability of public domain versus non public domain items?
I’ve now posted my slides from the Musicians, Fans and Online Copyright event which took place last Wednesday at LSE. They can be found on this site:
http://rufuspollock.org/economics/papers/musicians_fans_and_online_copyright_20080319/
For anyone with an interest in copyright issues, particularly in the online environment, there is an excellent event on today at the LSE organized by Ian Brown of the OII and at which I’ll be speaking (briefly) on the subject of “How can we maximise copyright’s return to society?” More details below.
Musicians, fans and online copyright
Wednesday 19 March 2008 14:00 - 17:00
- John Kennedy, CEO of IFPI
- Paul Sanders, Director of Strategy at Playlouder
- Becky Hogge, Open Rights Group
- Adrian Brazier, DBERR
- Lilian Edwards, Southampton University
- Rufus Pollock, Cambridge University
- Michelle Childs, Knowledge Ecology International
- Wendy Grossman, musician / freelance journalist
Location: Old Theatre, London School of Economics, Houghton Street, London, WC2A 2AE, United Kingdom.
This Wednesday afternoon we have a great selection of speakers for our free OII/LSE event on music and copyright. Come along to find out what the government, music industry, publishers and independent experts are thinking about ideas like 3-strikes-and-you’re-disconnected; scanning ISP traffic for copyright works; and notice and takedown regimes.
Full programme at: http://www.oii.ox.ac.uk/events/details.cfm?id=186
Extracts from the Report of the 1876-1878 Royal Commission on Copyright
October 4th, 2006
1876-1878 Commission on Copyright. Metadata: http://www.bopcris.ac.uk/bop1833/ref2103.html
My copy came from Cambridge University Library.
Main Report
General Remarks:
- Contains a summary in the appendix of the law up to that point in the form of a digest
- NO discussion of principles at all and very little hard evidence
Their comments on existing law (vii, para 7-9):
The first observation which a study of the existing law suggests is that its form, as distinguished from its substance, seems to us bad. The law is wholly destitute of any sort of arrangement, incomplete, often obscure, and even when it is intelligible upon long study, it is in many parts so ill-expressed that no one who does not give such study to it can expect to understand it.
The common law principles which lie at the root of the law have never been settled. The well-known cases of Millar v. Taylor, Donaldson v. Becket, and Jeffries v. Boosey, ended in a difference of opinion amongst many of the most eminent judges who have ever sat upon the Bench.
The fourteen Acts of Parliament which deal with the subject were passed at different times between 1735 and 1875. They are drawn in different styles, and some are drawn so as to be hardly intelligible. Obscurity of style, however, is only one of the defects of these Acts. Their arrangement is often worse than their style. Of this the Copyright Act of 1842 is a conspicuous instance.
The need for copyright protection (viii-ix): Taking the law as it stands, we entertain no doubt that the interest of authors and of the public alike requires that some specific protection should be afforded by legislation to owners of copyright; and we have arrived at the conclusion that copyright should continue to be treated as a proprietary right, and that it is not expedient to substitute a right to a royalty defined by statute, or any other right of similar kind.
p. ix: dismiss royalty system - though not for great reason and without any evaluation of its costs and benefits - on the classic grounds that no-one else is doing it.
p.x-xii: should extend and harmonize terms to life + 30 years. Note that most other countries have longer terms (except US and Canada)
should introduce compulsory registration and modernize the registration system.
p. xxxvi ff.: ‘The American Question’
- Interesting discussion of the US situation.
- However there is a rather serious lack of HARD evidence
- advocate strongly a copyright treaty with the US and state that British authors are being damaged by not having protection
Separate (Dissenting) Report by Sir Louis Mallet:
- excellent, need i say more than the person has some acquaintance with economic reasoning and actually applies himself to the principles. It is amazing - it’s all there, all the arguments of a century later.
- every para is worth quoting but i shall confine myself
- [xlvi]:
I do not consider that a copyright law, or, in other words, a law which enables a copyright owner to prevent other persons from copying published works, rests upon the same grounds of public expediency as those which justify the recognition by law of proprietary rights generally. Nor does it appear that in modern times it has been ever so regarded by the legislation of the countries where it exists.
- [xlix para 15]:
From this point of view the question becomes a purely practical one, viz., whether any special interference by law is required to ensure for a community the best possible literature at the cheapest possible price.
- [l para 32]:
A monopoly should never be created with the view of remunerating a person or class, if that object can be effected without it; the profits of authorship are one thing, and the profits of publication another; and even if some form of monopoly is necessary to protect the first, it is equally desirable, in the interest of the author and that of the public, that the profits of publication, which are purely of a commercial character, should be regulated and controlled by the ordinary laws of trade.
- [li para 35]:
In the two great markets for English literature, the United Kingdom and the United States of America, the existing system has been described as one of “monopoly tempered by piracy”.
Macaulay on Copyright Extensions
December 6th, 2004
Here is Lord Macaulay (unsuccessfully) opposing an extension of copyright term from 28 to 60 years in the 1840s:
It is good that authors should be remunerated, and the least exceptionable way of remunerating them is by a monopoly. Yet monopoly is evil. For the sake of the good we must submit to the evil: but the evil ought not to last a day longer than is necessary for the purpose of securing the good
….
Dr Johnson died 56 years ago. If the law were what my honourable and learned friend wishes to make it, somebody would now have the monopoly of Dr Johnson’s works. Who that somebody would be, it is impossible to say: but we may venture to guess. I guess, then, that it would have been some bookseller, who was the assignee of another bookseller, who was the grandson of a third bookseller, who had bought the copyright from Black Frank, the Doctor’s servant and residuary legatee in 1785 and 1786. Now, would the knowledge that this copyright would exist in 1841 have been a source of gratification to Johnson? Would it have stimulated his exertions? Would it have once drawn him out of his bed before noon? Would it have cheered him in a fit of spleen? Would it have induced him to give us one more allegory, one more life of a poet, one more imitation of Juvenal? I firmly believe not.
