Visualizing Technology Flows Over Time (I)
May 22nd, 2009
In my original post on Visualizing Technology Flows from Patent Data I just presented static information — flows for a single year. As I said there:
The next step is to watch how these flows, and the relationships implied by them, have evolved over time. We can do this by plotting the same graph say, every 3 years, from 1975 up until the present.
At the time I had already coded up, and computed, snapshots for each year. However, considerations of space, as well as a desire to find a way to display the information in a ‘nice’ (animated) form, warranted a separate entry. After what, as usual, has turned out to be a rather longer delay than intended, I’ve finally got round to having a first stab at this using simple animated gifs:
Animated Citation Flows 1975-1994 (1994 base year) (click through for full-size ~ 2MB).
Here I’ve fixed the layout of the nodes based on the final year (1994) flows. I’ve also done quite a lot of tedious playing around (if only one had stylesheets!) with edge and node sizes to try and improve the look and they are still far from perfect (NB: this means edge/node sizes differ slightly from the images in the original post). As before:
- Size of nodes indicates total citation flows from that area in that year
- The yellow portion of a node represents citations back into that subcategory, while the black portion represents citations into other subcategories (comparison by area).
- Direction of flow is indicated by an arrow head (a rectangular block) with size of flow measured by width of edge and size of head.
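To make the node encoding concrete, here is a minimal sketch of how the node size and the yellow/black split could be derived from one year's flows. The function name, the data layout (a dict keyed by source and target category) and the numbers are all made up for illustration; this is not the code used to produce the animation.

```python
from collections import defaultdict

def node_summary(flows):
    """Summarise one year's flows per category.

    flows: dict mapping (source, target) -> citation count (assumed layout).
    Returns {category: (total_outflow, within_share)} where within_share is
    the fraction of outflow going back into the same category (the "yellow"
    part); 1 - within_share is the "black" part.
    """
    total = defaultdict(int)
    within = defaultdict(int)
    for (src, dst), count in flows.items():
        total[src] += count
        if src == dst:
            within[src] += count
    return {c: (total[c], within[c] / total[c]) for c in total if total[c] > 0}

# Hypothetical categories "A" and "B" with invented counts.
example_flows = {("A", "A"): 60, ("A", "B"): 40, ("B", "A"): 10}
summary = node_summary(example_flows)
```

With these invented numbers, node "A" would be drawn with size 100 and 60% yellow, and node "B" with size 10 and entirely black.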
Note that we are displaying year values not cumulative values — so, for example, links between nodes may get smaller or even disappear from one year to the next. What jumps out from this?
- The substantial increase in flows over time (most obviously seen in the size of the nodes).
- (At least based on examination by eye) no great change in the balance of these flows between cites outside and cites within a category (relative sizes of black and yellow in nodes).
- Growth has varied substantially across areas (largely, I would hazard, in line with the no. of patents in that area). In particular, the “Computer/Electronics” cluster (top-right) has grown substantially faster than the “Chemicals” sector at centre-left. Individual categories showing especially marked growth include: Biotechnology, Computer Hardware and Software, Communications, Information Storage, and Drugs.
- It also looks like some areas have grown more strongly linked and “clustered” over time (e.g. Computer/Electronics, and Drugs to Organic Compounds) though it is hard to tell from this visualization (pointing to the need for more formal techniques …).
- Something that is very clear from the visualization is that there is significant year-to-year variation, with clear year-on-year drops in flows in some cases.
I also computed another version where the network layout is based on that year’s flows — rather than with a fixed layout based on a given base year.
Unfortunately, this looks too “busy”, particularly as the sensitivity of the network layout algorithm (networkx.graphviz_layout) means that categories move around a lot. (To save on space — the files are big — I haven’t posted this up but if anyone is interested let me know and I’ll upload it).
One solution to this would be to move to rendering cumulative, rather than per-year, flows. This might also improve the base-year case: even there, it might be more natural, at least from a visual point of view, to display changes in flows over time via their impacts on “stocks” rather than displaying the “flows” themselves.
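As a rough sketch of what moving from "flows" to "stocks" would involve, the per-year flows can be turned into running totals before plotting. The data layout (year mapped to a dict of edge counts) and the function name are assumptions for illustration, and the counts are invented.

```python
from collections import Counter

def cumulative_flows(yearly):
    """Convert per-year flows into cumulative flows.

    yearly: dict mapping year -> {(source, target): citation count}.
    Returns a dict mapping year -> cumulative counts up to and including
    that year.
    """
    running = Counter()
    result = {}
    for year in sorted(yearly):
        running.update(yearly[year])   # adds this year's counts to the totals
        result[year] = dict(running)   # snapshot of the stock at this year
    return result

# Invented numbers for two hypothetical categories.
yearly = {
    1975: {("A", "B"): 3},
    1976: {("A", "B"): 1, ("B", "A"): 2},
    1977: {},
}
stocks = cumulative_flows(yearly)
```

Note that in the cumulative version an edge never shrinks or disappears: the empty year 1977 simply repeats the 1976 totals, which is what should make the layout more stable.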
So, next steps:
- Plot cumulative flows
- Write up a more formal analysis based on e.g. PCA. I’ve already done PCAs on individual years and an animation might be interesting.
- Do animations right: the proper way to do this would be with a “slider” widget and stop/start control. It looks like this should be feasible in javascript using e.g. jquery but it doesn’t look to be trivial. If it is, please let me know how! (BTW: I know I could use Flash but it’s proprietary …).
Empirical Assessment of Impact of DRM on Exceptions and Limitations by Patricia Akester
May 7th, 2009
Patricia Akester, a colleague of mine in the Centre for Intellectual Property and Information Law, has just published the results of her recent research in the form of a 208-page report entitled Technological accommodation of conflicts between freedom of expression and DRM: the first empirical assessment.
There has been a lot of debate as to whether DRM/TPM can be used to go ‘beyond copyright’ and restrict legitimate uses of copyrighted material but little empirical work. Patricia’s work is therefore very valuable in providing the first systematic empirical data that we can use to assess what is going on. Here I’ll let her conclusions speak for themselves, but I strongly encourage readers to take a look at the study itself via the above link:
[From p. 99-100] This project looked at the impact of DRM on the ability of users to take advantage of certain exceptions to copyright. Based on a series of interviews with key organisations and individuals, involved in the use of copyright material and the development and deployment of DRM, this study examined how these issues are working out in practice. While the nightmarish vision of digital lock up has not materialised, this survey concluded, nevertheless, that significant problems do exist, and others can readily be foreseen:
- Although DRM has not impacted on many acts permitted by law, certain permitted acts are being adversely affected by the use of DRM;
- This is in spite of the existence of technological solutions (enabling partitioning and authentication of users) to accommodate those permitted acts (privileged exceptions);
- Beneficiaries of privileged exceptions who have been prevented from carrying out those permitted acts (because of the employment of DRM) have not used the complaints mechanism set out in UK law;
- Article 6(4) of the Information Society Directive put an onus on content owners to accommodate privileged exceptions voluntarily. Voluntary measures have emerged in the publishing field, but not all content owners are ready to act unless they are told to do so by regulatory authorities.
These four conclusions will be explained in more detail and this will be followed by proposed solutions and recommendations.
Results of the Trading Funds Review
April 23rd, 2009
The Government announced last summer a further review of how trading funds supply PSI. The results of this review had been expected with the budget.
However, instead of the results of a review, trading funds were included in the report of the Operational Efficiency Programme in the section on “Asset management and sales” in the “final report”. Box 3A p.41 summarized the trading fund assessment exercise:
The first phase of the Trading Fund Assessment considered how a number of Government businesses could open up the information they create or hold as a result of carrying out their core public duties. The businesses were Met Office, Land Registry, Ordnance Survey, Companies House, Driver and Vehicle Licensing Agency and UK Hydrographic Office.
…
The Assessment identified key principles of good practice relating to information produced by all Trading Funds. These principles are:
- information easily available – where possible at low or marginal cost;
- clear and transparent pricing structures for the information, with different parts of the business accounted for separately;
- simple and transparent licences to facilitate the re-use of information for purposes other than that for which it was originally created; and
- clearly and independently defined – with input from customers and stakeholders – core purposes (“public tasks”) of the organisations.
The Office of Public Sector Information will provide enhanced oversight and governance to ensure application of these principles across the Trading Funds that create significant amounts of information.
A new business strategy for Ordnance Survey has been developed (see Box 3.H) which also will ensure easier and simpler access to high-quality information. Further work on the future business plans and models for specific Trading Funds – as well as consideration of the effectiveness of the Trading Fund model – will now be incorporated into the Operational Efficiency Programme.
So what we have is:
- A vague (“where possible”) commitment to “low or marginal cost” pricing but with “low” undefined, thereby leaving plenty of ‘wiggle room’. In any case the main PSI trading funds are explicitly excepted from this it seems (see below).
- Some centralization of oversight in OPSI (though not clear what power OPSI will have)
- Public tasks that are clearly and independently defined (though not clear who ensures independence)
- More pricing transparency within trading funds (though again little detail as to how this will be managed or enforced)
There were separate, specific, assessments for 3 of the trading funds mentioned in Box 3A: the Land Registry (box 3.E p.45), the Met Office (box 3.F p.46) and Ordnance Survey (box 3.H p.47). Each of these assessments consisted of just a few paragraphs (the assessments are excerpted in full below).
The Land Registry and Met Office assessments were, in essence, “pats on the back” with clear endorsements of their current operational model — albeit with an encouragement to expand commercial operations and be more efficient. Pricing policies weren’t mentioned.
For Ordnance Survey the tone was slightly different with a stated need for the OS to be “more customer-focused and commercially driven”. However, again there was no mention at all of pricing policies.
Where was the assessment of marginal cost pricing (or other pricing model) for “raw” bulk data, the recommended option from the Cambridge study (of which I was a co-author)? Where was the detailed discussion of the regulatory model that needs to be put in place to ensure that the system works well? Entirely absent! This is truly disappointing and one can only feel that a serious opportunity has been missed here.
Trading Fund Assessments from the OEP Report
Box 3.E Land Registry
Land Registry maintains and develops a stable and effective land registration system throughout England and Wales, providing the cornerstone for the creation and free movement of interests in land. Giving a state-backed security for title to registered estates and interests in land for the whole of England and Wales, and ready access to up-to-date and guaranteed land information, enables confident dealings in property and security of title.
In addition, Land Registry produces property price reports and delivers a range of non-statutory added-value products and services. Land Registry is committed to providing high quality, cost-effective services which are delivered promptly to all customers. A review of the business model was undertaken as part of the OEP. This concluded that in light of current market conditions and recognising the need to retain responsibility for the creation, recording and guaranteeing of title to land within Government, the following improvements to the operating framework of the business have been identified and will be delivered;
- realising significant efficiency savings through a programme which includes estate and operational rationalisation and market testing of support functions that will result in a more streamlined, resourceful organisation;
- developing opportunities for the provision of wider commercial services and products;
- identifying synergies with the functions and data requirements of other public sector bodies with a view to achieving efficiency improvements through greater collaboration; and
- exploring opportunities to accelerate these initiatives through joint ventures and/or outsourcing of activities to third party providers.
Box 3.F Met Office
The Met Office is a world-leading provider of weather forecasts and climate change modelling and advice to the general public, specialist customers throughout the public sector and an increasing number of private sector customers.
It is essential that the Met Office’s unified approach to short, medium and long term forecasting and climate modelling, which is the most efficient and sophisticated in the world, is preserved. The Met Office also performs a number of key government roles, especially in international data collaboration and UK representation. In order to maintain the quality of its services it will require long-term investment and the freedom to develop its operations. There remains potential to expand commercial operations at the Met Office beyond those already provided, possibly through the introduction of private capital in some areas.
Over the coming months the project team will:
- work closely with the MOD as the owner department and HM Treasury to identify improvements to its business model, ownership structure and financial framework in order to reduce the administrative burden, maximise its development and to fully exploit the market opportunities open to it;
- work with other public sector bodies to achieve efficiency improvements through greater collaboration or transfer of functions;
- explore increased commercial activities, for example weather warnings to industry and helping business understand the impact of climate change;
- seek opportunities for private sector partners to develop specific services to complement the Met Office’s business; and
- maximise operational freedoms and reduce bureaucracy in the interface between the Met Office and the MOD.
Box 3.H: Ordnance Survey
Ordnance Survey collects, maintains and publishes high quality and up-to-date geographical information for the whole of Great Britain. Ordnance Survey provides data and services to customers both directly and indirectly through its network of commercial partners. The Government is committed to stimulating innovation in the geographical information market, increasing competition where it would be beneficial to consumers and to making geographical data and services more easily available.
The OEP has concluded so far that Ordnance Survey needs to be more customer-focused and commercially driven. The Government is therefore publishing a new commercial strategy for the Ordnance Survey on their website. The new strategy balances the requirement to maintain the highest quality standards with the need to significantly enhance ease of access to geographic data and services for both commercial and non-commercial use.
The new strategy seeks to equip Ordnance Survey to thrive in and better support competition and innovation in a wider geographical information market that is being transformed by advances in technology. It is a significant and ambitious programme of change. The Government has set key milestones for delivery in 6 and 12 months’ time and beyond, as well as a process for independent review and challenge of progress. If sufficient progress is not made to promote competition and innovation in these timescales, the Government will consider further reforms. Opportunities to accelerate the delivery of initiatives through introducing further commercial experience and capabilities will be fully explored over the coming year.
European Parliament Votes on Copyright Term Extension Tomorrow
April 22nd, 2009
Tomorrow, the European Parliament will vote on the issue of copyright term extension for sound recordings, known in Parliamentese as “the Crowley Report (A6-0070/2009) on the Term of protection of copyright and related rights” (Mr Brian Crowley is the rapporteur for this report and a strong supporter of the extension).
Extending term would be a tragic mistake and a blatant example of special-interest lobbying winning out over the interests of society as a whole.
Let us therefore hope that the proposal is rejected.
That’s the line being taken by some right-thinking MEPs, including Eva Lichtenberger (Greens), Sharon Bowles (ALDE), Andrew Duff (ALDE), Zuzana Roithova (EPP), Christofer Fjellner (EPP) and Guy Bono (PSE), who have put forward a rejection amendment (see their excellent justification below). But they need all the support they can get and remember: it is never too late to act.
Rejection Amendment Justification
The draft Directive is poorly conceived and disproportionate. The Commission claims that the measure is needed in order to benefit poor performers. However, the proposed regulation and procedure is complicated and over-bureaucratic. The biggest beneficiaries will be the four largest record companies. Individual performers will only receive very small amounts each.
Performers could be helped much more effectively by regulating copyright contracts and collecting societies, by setting up appropriate social security and insurance schemes, and by reconsidering remuneration rights and license tariffs.
The draft Directive leaves a large number of questions unanswered. Additional impact assessments are needed to see which measures are best suited to help those performers really in need, to limit the negative impact on consumers and jobs, and to establish if regulation is best done at state or EU level. In these circumstances, it is not wise to proceed to make the long-term permanent changes proposed.
Some of the particular problems are:
- The extension of copyright to 95 or even 70 years will increase the revenue of trust funds of deceased performers instead of living performers.
- Many performers cannot produce proof for the performances they participated in during the past decades. It then becomes difficult to assess their rights to payments.
- The proposed regulation could cause legal uncertainty for all existing audiovisual productions as it will be unclear if the material used is subject to sound copyright.
- There is a risk that all material that is not commercially viable will not be marketed by the copyright owners and will become inaccessible for public use.
- Small record companies currently publishing copyright-free material risk going bankrupt.
On March 18th I was in Brussels to give a talk as one of two “invited experts” (the other being from the Motion Picture Association) to a session on the topic of “Copyright Enforcement” held by the Working Group on Authors’ Rights of the European Parliament’s JURI Committee. Below is the slightly tidied up text of the talk I gave.
Talk Text
Good afternoon and thank-you for inviting me here today. To introduce myself I’m the Mead Fellow in Economics at Emmanuel College, University of Cambridge and an Associate at the Centre for Intellectual Property and Information Law also at the University of Cambridge. I believe that my colleague Professor Bently came here in October to speak to a similar gathering that time on the topic of copyright term extension.
To begin with I want to make a few general points before proceeding to the specific area — enforcement — that today’s meeting looks at.
The first point I would like to make is when we talk of copyright we must remember that it is not a single unified thing but, in reality, a bundle of different attributes. For example, there is the crucial distinction between:
- Economic rights: the ‘monopoly’ right to control reproduction and distribution of the work (and thereby to control, at least partially, its price). We should also note that in some cases this ‘exclusive’ right may be converted into a right for equitable remuneration.
- Moral rights: rights of attribution and integrity. These can exist separately and independently of any economic rights. Furthermore they are often norms that we respect irrespective of any copyright: I still credit Shakespeare for Romeo and Juliet even if it is in the ‘public domain’.
Furthermore these economic and moral rights have a variety of attributes such as:
- Term, i.e. the length that the right lasts.
- The breadth of the right. For example, in the US copyright for performers is ‘narrower’ than in the EU because certain uses of recording (notably broadcast on the radio) need not be paid for. There are also limitations and exceptions related to educational use or use for criticism where permission need not be sought from the rightsholder.
- Lastly there is enforcement. After all one can have very ‘strong’ rights but then be permissive in enforcement, or, conversely, have more limited ‘rights’ but be very strict in the enforcement. I would also point out that enforcement is a social as well as legal matter: when I attribute an author the main reason I do it is not because I might get ‘sued’ if I do not but because it is the right thing to do. People should be credited when their work is used wherever it is reasonable to do so.
The value of a right is determined by the interplay of all of these. Deciding on the level of enforcement is therefore the same problem as deciding on the level of copyright generally. And we can’t think about this without asking about the purpose behind copyright’s existence.
The answer here is a simple one: copyright is an instrument created in order to promote the interests of society as a whole and not, I must emphasize, to promote the interests of the producers of creative works. Of course we care about remunerating producers and artists, both because they are members of society, but also, and more importantly, because by remunerating them we ensure the creation of more works which society as a whole can enjoy.
Nevertheless, it is essential to keep in mind that the purpose of copyright is broader than to promote the interests of a single group. This fact then is central to any assessment of the form and level of copyright and it has important implications. For example if we have a proposal that will help artists but overall harm society we should not support that proposal. Moreover, it is also a fact that is sometimes neglected, for example this very working group is entitled “Working Group on Author’s Rights” not “Working Group on Copyright and Social Welfare”.
In using copyright to promote social welfare we are then presented with a basic trade-off between the benefits of the monopoly, in the form of the new work created as a result of the rents the monopoly accrues, and its costs, in the form of reduced access to creative works. We are therefore seeking a balance: we want enough copyright but not too much. And, returning to our point above, this logic applies to enforcement as much as any other aspect of the “copyright package”.
In particular: if there is already ‘too much’ copyright stronger enforcement will make things worse. If there is too little copyright then more enforcement will make things better. Now, my personal preference is for strong enforcement of fair rules.
Unfortunately, the rules currently aren’t fair — for example copyright is almost certainly far too long. As such it is hard to justify a push for strong enforcement. In addition, I would also argue that the unfairness of the current copyright regime is also a major reason why strong enforcement will be difficult, if not impossible, to achieve in practice. Why?
The reason is simple: the successful enforcement of any rule depends on that rule having public legitimacy — being considered reasonable by the majority of the populace. Currently that is not the case: copyright suffers from a serious lack of “respect” and a marked lack of public legitimacy.
If we wish to change that we need the rules to be fair and balanced: it is hard to have respect for, and enforcement of, an unfair system. For example, copyright term should be reduced and we should expressly avoid extensions, especially retrospective ones like that currently before Parliament in relation to sound recordings. Such policies appear to reflect nothing more than special interest lobbying and this can only make copyright’s “marked lack of public legitimacy” worse. I would note here the recent joint statement put out by European IP law centres who emphasized that retrospective term extension would seriously undermine respect for copyright and make “piracy the easy option”.
It will be almost impossible to enforce unjust rules. If we are to have strong enforcement it therefore must be of just rules and just rules must be reasonable rules. For example, is it reasonable in an age of costless reproduction to continue to promote a model of copyright based on exclusive rights? Much of the “problem” of unauthorised file-sharing could be resolved if we moved to an alternative compensation system based on an equitable remuneration right approach. In one fell swoop we would eliminate the biggest “enforcement” problem going while also increasing the size of benefits to be divided between users and makers of creative works. Surely this is the more reasonable, and sensible, option!
As I am coming to the end of my allotted span let me conclude. Copyright must be designed to promote the welfare of society as a whole not one specific group. As such, in designing any aspect of copyright, including enforcement, it is important not to have too much as well as not to have too little. We must also remember that copyright, like any other rule or law, depends for its enforcement on willing compliance more than explicit punishment. As such the most important factor in ensuring better observance of copyright is to increase its legitimacy which it markedly lacks at present. To achieve that we need to create a more just, and more reasonable, copyright regime. Thank-you.
Yesterday (Monday) The Times published an open letter signed by many of the leading UK academics concerned with the issue of copyright term extension.
The letter, of which I was a signatory, is focused on the change in the UK government’s position (from one of opposition to a term extension to, it appears, one of allowing an extension “perhaps to 70 years”). However, it is noteworthy that this is only one in a long line of well-nigh universal opposition among scholars to this proposal to extend copyright term.
For example, last April a joint letter was sent to the Commission signed by more than 30 of the most eminent European (and a few US) economists who have worked on intellectual property issues (including several Nobel prize winners, the Presidents of the EEA and RES, etc). The letter made very clear that term extension was considered to be a serious mistake (you can find a cached copy of this letter online here). More recently — only two weeks ago — the main European centres of IP law issued a statement (addendum) reiterating their concerns and calling for a rejection of the current proposal.
Despite this universal opposition from IP experts the Commission put forward a proposal last July to extend term from 50 to 95 years (retrospectively as well as prospectively). That proposal is now in the final stages of its consideration by the European Parliament and Council. We can only hope that they will understand the basic point that an extension of the form proposed must inevitably do more harm than good to the welfare of the EU and should therefore be opposed.
The Letter
Dear Minister,
Open Letter re. Proposed Copyright Term Extension for Sound Recordings
We are writing because of the sudden, and unexplained, change of Government position in relation to copyright term extension for sound recordings.
In 2006, the Government received the recommendations of an independent and comprehensive review of intellectual property policy, commissioned by the then Chancellor Gordon Brown. The review, led by Andrew Gowers (a former editor of the Financial Times) took “an evidence-based approach to its policy analysis”, supplementing a formal call for evidence with commissioned external expertise.
The review examined several extension options, including the increase to 70 years, and explicitly rejected extension as being a bad deal for the UK in cultural and economic terms. The Government, led by the Treasury which was then headed by Gordon Brown, clearly supported this view.
What then occasions a sudden volte-face two years later and only a few weeks after statements from the Department for Innovation, Universities and Skills (DIUS) indicating support for the original decision? We are not aware of any new evidence that has come to light, and the only independent study available since then, that of Professor Hugenholtz at the University of Amsterdam, has also been highly critical of extension.
There has been some talk of ‘moral arguments’ for extension but it is hard to discern a compelling ‘moral’ case for a proposal whose prime effect is to benefit major label shareholders and a few, already highly successful, artists while imposing significantly greater costs on new creators, the general listening public and the custodians of our cultural heritage.
As Gowers concluded, and the Government has until now consistently reaffirmed, policy-making in this area should be evidence-based and designed to promote the broader welfare of society as a whole. Policies that appear to reflect nothing more than lobbying will only perpetuate the “marked lack of public legitimacy” which the Gowers report lamented — and discourage those who wish to contribute constructively to future Government policy-making in these areas. We therefore call on the Government to present any evidence that has led to this change of policy.
Yours Sincerely,
Professor Lionel Bently, and Dr Rufus Pollock, Centre for Intellectual Property and Information Law, University of Cambridge
Professor Martin Kretschmer, and Professor Ruth Towse, Centre for Intellectual Property Policy & Management, Bournemouth University
Professor Nicholas Cook, AHRC Research Centre for the History and Analysis of Recorded Music, Royal Holloway, University of London
Professor P.A. David, Emeritus Professor of Economics and Economic History, University of Oxford
Professor Graeme Dinwoodie, Chair in Intellectual Property Law, Queen Mary College, University of London
Professor Johanna Gibson, Director Queen Mary Intellectual Property Research Institute, Queen Mary College, University of London
Professor John Kay, Chair, British Academy Copyright Review
Professor Paul Klemperer, Edgeworth Professor of Economics, University of Oxford
Professor Hector MacQueen, and Professor Charlotte Waelde, SCRIPT/AHRC Centre Intellectual Property & Technology Law, University of Edinburgh
Professor David M Newbery, Professor of Economics, University of Cambridge
Dr Mark Percival, Queen Margaret University, Edinburgh, Chair, International Association for the Study of Popular Music (UK/IRL)
Dr Martin Cloonan, Senior Lecturer, University of Glasgow, ex-Chair, International Association for the Study of Popular Music (UK/IRL)
Professor Danny Quah, Professor of Economics, London School of Economics
Professor David Vaver, former Reuters Professor of IP and IP Law and Director of the Intellectual Property Research Centre, University of Oxford
Richard Chesser, Chair, Trade and Copyright Committee, International Association of Music Librarians (UK/IRL)
2009 Open Knowledge Conference (OKCon) This Saturday
March 23rd, 2009
The Open Knowledge Foundation’s 2009 Open Knowledge Conference (OKCon), which I help organize, will take place next Saturday 28th March – less than a week away.
Full details including programme can be found either in this blog post or on the OKCon home page.
As usual this will be a fun and informal day so if you’re free this Saturday and interested in “Open” stuff come along to UCL and take part.
I should also add that for the two days before (Thursday + Friday) there is also the 5th COMMUNIA Workshop which is about Accessing, Using, Reusing Public Sector Content and Data which is being co-organized by the Open Knowledge Foundation together with the London School of Economics and taking place at LSE (all thanks to the tireless work of Jonathan Gray and Prodromos Tsiavos!).
Computing Copyright (or Public Domain) Status of Cultural Works
March 12th, 2009
Background
I’m working on an EU-funded project to look at the size and value of the Public Domain. This involves getting large datasets about cultural material and trying to answer questions like: How many of these items are in the public domain? What’s the difference in price and availability of public domain versus non public domain items?
I’ve also been involved for several years in Public Domain Works, a project to create a database of works which were in the public domain (especially recordings).
The Problem
Suppose we have data on cultural items such as books and recordings. For a given item we wish to:
- Identify the underlying work(s) that item contains.
- Identify the copyright status of that work, in particular whether it is Public Domain (PD)
Putting these two steps together allows us to assign a ‘copyright status’ to a given item.
Aside: We have to be a bit careful here since the copyright status of an item and its work may not be exactly the same: for example, even books containing pure public domain texts may have copyright in their typesetting — or there may be additional non-PD material such as an introduction or commentaries (though, in this case, at least theoretically, we should say the item contains 2 works a) the original PD text b) the non-PD introduction).
Note our terminology here (based on FRBR): by an ‘item’ we mean something like a publication, be that a book, recording or whatever. By a ‘work’ we mean the underlying material (text, sounds etc) contained within that. So for example, Shakespeare’s play “Hamlet” is a single work but there are many associated items (publications). (Note that we would count a translation of a work as a new work, though one derived from the original work).
Almost all the data available on cultural material is about items. For example, library catalogues list items, databases listing sales (such as Nielsen) list items and online sites providing information on currently available material (along with prices) such as booksinprint, muze or even Amazon list items.
Determining Copyright (or Public Domain) Status
With our terminology in place determining copyright status is, in theory, simple:
- Given information on an item match it to a work (or works).
- For each work obtain relevant information such as date work first published (as an item) and death dates of author(s)
- Compute copyright status based on the copyright laws for your jurisdiction.
While copyright law is not always simple, step three is generally fairly straightforward, especially if one is willing to accept something that is almost but not quite 100% accurate (say 99.99% accurate).[^peterpan]
[^peterpan]: Not being 100% accurate means we can ignore some of the “special cases” and one-off exceptions in copyright law. For example, in the UK the Copyright, Designs and Patents Act (section 301) contains a special provision which means that “Peter Pan” by J.M. Barrie will never enter the Public Domain (royalties will be payable in perpetuity for the benefit of Great Ormond Street Hospital).
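To give a feel for why step three is the mechanical part, here is a minimal sketch of a conservative status check. It assumes a plain "life plus 70 years" rule and ignores every special case; the function name and the representation of unknown death dates as `None` are hypothetical choices made for illustration, not the code actually used.

```python
COPYRIGHT_TERM_YEARS = 70  # assumption: life + 70 years, no special cases


def is_public_domain(death_years, current_year, term=COPYRIGHT_TERM_YEARS):
    """Return True only if every author is known to have died long enough ago.

    death_years: list of ints, with None standing for an unknown death date.
    """
    if not death_years:
        # No author information at all: be conservative.
        return False
    for year in death_years:
        if year is None:
            # Unknown death date: cannot positively establish PD status.
            return False
        # Protection is taken to last through the end of year (death + term).
        if year + term >= current_year:
            return False
    return True
```

For a 2007 dataset this reproduces a "death date before 1937" cutoff: `is_public_domain([1936], 2007)` is true while `is_public_domain([1937], 2007)` is not.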
What is not so straightforward are the first two steps especially step 1. This is because most datasets give only a limited amount of information on the items they contain.
Frequently information on authors will be limited or non-existent, and they certainly may not be unambiguously identified (this is especially true of datasets containing ‘commercial’ information such as prices and availability). Often the exact form of the title, even for the same item, will vary between datasets, and that leaves aside the possibility of varying titles for different items related to the same work (is it “Hamlet” or “William Shakespeare’s Hamlet” or “Hamlet by William Shakespeare” or “Hamlet, Prince of Denmark” etc).
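One crude way to cope with the title variants is to reduce each title to a matching key before comparing. The sketch below is purely illustrative: `normalize_title` is a hypothetical helper, it assumes the author's name is available in the same form as it appears in the title, and throwing away everything after a comma or colon will certainly merge some titles that should stay distinct.

```python
import re


def normalize_title(title, author=None):
    """Reduce a catalogue title to a crude matching key (hypothetical helper)."""
    key = title.lower()
    if author:
        author = author.lower()
        # Drop possessive forms (straight or curly apostrophe) and "by <author>".
        key = key.replace(author + "'s", "")
        key = key.replace(author + "\u2019s", "")
        key = key.replace("by " + author, "")
    # Keep only the main title, discarding any subtitle after a comma or colon.
    key = re.split(r"[,:]", key)[0]
    # Replace anything that is not an ASCII letter, digit or space, then
    # collapse runs of whitespace.
    key = re.sub(r"[^a-z0-9 ]", " ", key)
    return " ".join(key.split())
```

With `author="William Shakespeare"`, all four Hamlet variants above reduce to the key `hamlet`. Accented characters are simply dropped here, so a real matcher would need something more careful (and some fuzziness on top).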
At the same time, speed matters because the datasets involved are fairly substantial. For example, there were approximately 64 thousand titles that sold more than 5 copies in 2007 in the UK. If computing public domain status for each title takes 1 second then a full run will take around 18 hours. If it takes 30s per title it will take about 22 days.
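The arithmetic behind those estimates is simple enough to check in a few lines. This is just a back-of-envelope helper (the function name is made up for illustration) and it assumes titles are processed strictly one after another:

```python
def run_time(n_titles, seconds_per_title):
    """Return (hours, days) for a strictly sequential run over n_titles."""
    total_seconds = n_titles * seconds_per_title
    return total_seconds / 3600.0, total_seconds / 86400.0


# Roughly 64 thousand titles, as in the 2007 UK sales list.
hours_at_1s, _ = run_time(64000, 1)
_, days_at_30s = run_time(64000, 30)
```

Here `hours_at_1s` comes out just under 18 hours and `days_at_30s` just over 22 days.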
Some Examples
To illustrate the difficulties here I present the results of two different attempts at computing the PD status for the list of 64k titles which sold at least 5 copies in the UK in 2007.
Example 1: Open Library
I ran this algorithm (by_work method) against the Open Library database via their web API. This was a very slow process. First, because web APIs are relatively slow and second because, perhaps due to overloading, the OL API would stop responding at some point and a manual restart would be required. (To try to avoid overloading the API we’d already added a significant delay between requests, which is another reason the process was quite slow.) Overall it took around 10 days to run through the whole 64k item dataset. The results were as follows:
Total PD: 2206.0
Total Items: 63937
Fraction PD: 0.0345027136087
Total Matched: 0.588469900058
As this shows, matching was not that successful, with only around 3/5 of items successfully matched. Part of this may be due to:
- Limiting the number of title matches to 10 in order to keep the time within reasonable bounds
- The difficulty of allowing enough, but not too much, fuzziness in the matching process.
Overall, approximately 3.5% of all items were identified as PD (that being about 5.9% of those actually matched). The PD determination algorithm was a conservative one with an item labelled as PD only if all authors were positively identified as PD.
Thus, this is likely to be a lower bound (at least assuming the match process was reasonable, and allowing for the fact that some PD items included non-PD material such as commentaries). It was certainly clear from basic eyeballing that a substantial number of PD works were either not matched or not computed as PD (because of incorrect authors or missing death dates).
Example 2
Our second algorithm ran against a local copy of Philip Harper’s NGCOBA database (data, code). The algorithm was as follows:
- Match by title and authors.
  - If match: compute PD status strictly (all death dates known and all before 1937)
  - Else: continue
- Pick first author and find all (approx) matching authors (allow extra first names)
  - If no match: Not PD
- Initialize PD score to 0
- For each matched author alter score in the following manner:
  - If author PD: +1
  - If not PD: -3
  - If unknown (no death_date): -0.5
- PD if score > 0 (Else: Not PD)
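The scoring fallback in the steps above can be sketched roughly as follows. This covers only the part after the strict title-and-author match has failed, and it takes the list of candidate authors' death years as given (the approximate author matching against the database is not shown). All names are hypothetical, and `None` stands for a missing death date.

```python
PD_CUTOFF_YEAR = 1937  # authors who died before this year count as PD

# Score adjustments per matched author, as in the list above.
SCORE = {"pd": 1.0, "not_pd": -3.0, "unknown": -0.5}


def author_status(death_year):
    """Classify one candidate author by death year (None means unknown)."""
    if death_year is None:
        return "unknown"
    return "pd" if death_year < PD_CUTOFF_YEAR else "not_pd"


def pd_by_score(candidate_death_years):
    """Apply the lenient scoring rule to all approximately matching authors."""
    if not candidate_death_years:
        # No matching author at all: Not PD.
        return False
    score = 0.0
    for year in candidate_death_years:
        score += SCORE[author_status(year)]
    # PD only if the evidence is net positive.
    return score > 0
```

The asymmetric weights mean a single clearly non-PD candidate outweighs up to three PD ones, so `pd_by_score([1616, 1616, 1616, 1950])` is false, while one PD candidate plus one unknown (`pd_by_score([1616, None])`) still counts as PD.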
This algorithm took a few hours to run (this could likely be much improved with a bit of DB optimization and a move from sqlite to something better). The results were:
Total PD: 6404.0
Total Items: 63917
Fraction PD: 0.100192437067
As can be seen the fraction PD here was substantially higher at around 10%. One might be concerned that this was due to our more lenient PD algorithm (the problem was that without such ‘leniency’ a very large number of PD works/authors were being misclassified as not PD). However, basic eye-balling indicates that the number of false positives is not particularly high (and that there are also some false negatives).
Summary
- Computing PD status is non-trivial largely because a) it is hard to match a given item to a work or person b) we lack data such as authorial death dates and dates of first publication that are required.
- As such we need to adopt approximate and probabilistic methods (such as the scoring approach)
- (Very) preliminary calculations suggest that between 3 and 10% of titles actively sold at any one time are public domain
- NB: this does not mean 3-10% of sales were public domain (in fact this is very unlikely since few, if any of the best-selling items are PD)
Of Mice and Academics: Examining the Effect of Openness on Innovation
February 23rd, 2009
Just came across an interesting working paper put out last Autumn that is relevant to the openness and innovation debate. Entitled: Of Mice and Academics: Examining the Effect of Openness on Innovation and authored by Fiona Murray, Philippe Aghion, Mathias Dewatripont, Julian Kolev and Scott Stern, it is an attempt to bring some empirical evidence to bear in an area that so far has seen little.
It uses a natural experiment in the late 1990s when there was a significant reduction in patent restrictions (increase in openness) related to use of genetically engineered mice. Similar to an earlier paper of Stern and Murray’s, the paper estimates the impact on science by exploiting the linkage between certain papers and particular genetically engineered mice (both those affected by the increase in openness and those that were not). The overall conclusion is that increased openness does have a significant positive impact. (This goes some way to bearing out the suggestions of existing theoretical work such as Bessen and Maskin’s on Sequential Innovation and my paper on Cumulative Innovation and Experimentation, which explicitly discusses impacts of IP on scientific experimentation).
For full summary see the abstract inlined below (emphasis added):
Scientific freedom and openness are hallmarks of academia: relative to their counterparts in industry, academics maintain discretion over their research agenda and allow others to build on their discoveries. This paper examines the relationship between openness and freedom, building on recent models emphasizing that, from an economic perspective, freedom is the granting of control rights to researchers. Within this framework, openness of upstream research does not simply encourage higher levels of downstream exploitation. It also raises the incentives for additional upstream research by encouraging the establishment of entirely new research directions. In other words, within academia, restrictions on scientific openness (such as those created by formal intellectual property (IP)) may limit the diversity and experimentation of basic research itself. We test this hypothesis by examining a “natural experiment” in openness within the academic community: NIH agreements during the late 1990s that circumscribed IP restrictions for academics regarding certain genetically engineered mice. Using a sample of engineered mice that are linked to specific scientific papers (some affected by the NIH agreements and some not), we implement a differences-in-differences estimator to evaluate how the level and type of follow-on research using these mice changes after the NIH-induced increase in openness. We find a significant increase in the level of follow-on research. Moreover, this increase is driven by a substantial increase in the rate of exploration of more diverse research paths. Overall, our findings highlight a neglected cost of IP: reductions in the diversity of experimentation that follows from a single idea.
Dutch Study on Filesharing
January 23rd, 2009
A new Dutch study on the effects of filesharing has just come out. Unfortunately it is all in Dutch! However, courtesy of online translation, it appears the basic message is that filesharing has a net positive impact on welfare (though they term this the ‘economic’ impact):
File sharing has net positive economic impact
The net economic effects of file sharing on the Dutch welfare in the short and long term are positive. As a result of file sharing consumers access a wide range of cultural products. In contrast, a decrease in turnover from the sale of sound recordings, DVDs, and games as a result is plausible.
This is reflected in joint research of SEO Economic Research, the Institute for Information Law (IViR) and TNO to the economic and cultural impact of file sharing for music, movies and games on behalf of the Ministries of Education, Ministry of Economic Affairs and Justice. The analysis is carried out according to a study of statistics and scientific literature, interviews with fervent downloaders, a representative survey among the population and a number of informational workshops in the sector. [translation via Google with some copyediting]
Not being able to read the main study I’m not able to offer any evaluation of its merits or salient points (such as how they trade off welfare gains from greater access against any costs, if any, in lost production).
Update: I’ve been pointed to the English version (thanks Tobias!):
http://www.ivir.nl/publicaties/vaneijk/Ups_And_Downs_authorised_translation.pdf
More comments to come.
