FLOSS 2008 Workshop on Free/Open Source Software
June 30th, 2008
Last week I attended FLOSS 2008, the second international workshop/network meeting on FLOSS (Free/Libre/Open Source software) in Rennes, France. I was presenting my paper Innovation and Imitation with and without Intellectual Property Rights (and would have offered discussant comments but the author of the paper I was scheduled to discuss had to pull out at the last minute). In addition to this I got to hear a variety of interesting talks. On some of these I was able to take notes which I have included below for the ‘delectation’ of anyone else who is interested.
Mikko Valimaki: IPR and Open Source Software
- Goodman and Myers (2005) — the 3G standard.
- Leveque and Meniere 2007: what does RAND mean
- reasonable royalty is R = c (v1-v2)p where c is the incremental cost of licensing and v1-v2 is the gain from using this patent over the second-best.
- Other questions for royalty-setting
- quality or volume of patents
- early or late innovators
- cumulative royalties or one-time fees
- But all models he knows of have non-zero royalty fees
- [ed]: not surprising given that you will always get interior solutions
- Windows/Samba discussion
- specific sets of terms
- provide RF for the open source community
- Commission Decision para 783
- “On balance, the possible negative impact of an order to supply on Microsoft’s incentives to innovate is outweighed by its positive impact on the level of innovation of the whole industry.”
- Nokia to acquire Symbian:
- “a full platform will be available … under a royalty-free license … from the Foundation’s first day of operations … the Foundation will make selected components available as open source at launch.”
- [ed]: Motivation here is clear: Nokia care about the hardware and for them software is a complementary good, which they therefore wish to be as cheap as possible. But this raises the question as to what is being made open: is it hardware patents or pure software patents (and if so how big a deal is this)?
Stefan Koch: Efficiency of FLOSS Production
- Question of efficiency of open source development
- How much software did we get for our effort
- Is OS a waste of resources?
- Discussion without much empirical basis
- Claim: fast and cheap, high quality, finding bugs late is inefficient (actually large effort) — see IEEE Software 1999
- Completely unknown as no-one keeps time-sheets. So
- Effort based on participation data
- Effort based on product — look at software and ask how much effort would be needed in commercial environment
- Empirical research in open source
- Mainly case studies
- Helpful but need proper large-scale analysis
- Mined software repositories [ed: cf. today FLOSSMatrix, FLOSSMore]
- 8,261 projects
- 7,734,082 commits
- 663M LOCs
- resources and output are skewed: top decile of programmers: 79% of code base, second decile: 11%
- Effort estimation based on actual participation
- active programmer months (define active as committing in a given month)
- high correlation with LOC added in month
- Cumulate this number for each project
- But not equal to a commercial person-month
- How do we scale: use 18.4 h/w taken from stats for committers on Linux kernel
- [ed:] this is the key assumption. The whole point is that FLOSS effort is not observed and they are using a measure of output (committing) and trying to infer actual activity
- Manpower function modelling:
- Norden-Rayleigh model (1960)
- Some set of problems N (unknown but finite)
- Probs are solved independently and randomly (following Poisson)
- This fits ok but has eventual decline in participation which does not occur
- Modify this: in particular to allow introduction of new problems
- Introduce in prop to original no. problems, in prop to current set of problems etc
- Also have different learning rates
- [ed: but isn’t the setup a little different. Really it is a question of success vs. non-success in terms of acquiring users + some kind of bound on amount of participation due either to fission or complexity]
- Product-based estimation
- COCOMO 81 and COCOMO 2
- Results:
- Comparison COCOMO - Norden-Rayleigh
- For COCOMO 81 cannot find parameters favourable enough to explain Norden-Rayleigh curve
- For COCOMO 2 can find parameters but very favourable
- Suggest (roughly) that FLOSS very efficient (but not very rigorous)
- More formal estimation using all models etc
- Norden-Rayleigh significantly below product-based estimates (factor of 8 in mean)
- Interpretation
- FLOSS v. efficient (self-selection for tasks etc)
- Extremely high amount of non-programmer participation (1:7 relation …)
- [ed]: not sure about this generous view. Other explanations
- No quality measurement (also mentioned by Koch)
- OK: lot of code but low quality
- (Related) Many sourceforge projects are incomplete, easy bit at the start
- Later comes a lot of refactoring/writing documentation. This may display significant diminishing returns
- Many FLOSS projects come from what were originally commercial projects. In that case:
- code may have already been written
- conceptual components have been done already
- Trade-off of time vs. productivity
- May be more productive to only work 10h a week but then product might not be ready for 10 years
- From discussion
- interesting point: Nokia thinking of moving to more FLOSS in-house because they can’t manage their 5-10k programmers centrally any more
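The participation-based effort measure described in the notes above can be sketched roughly as follows. This is only an illustration, not the code used in the talk: the commit records are made up, and dividing by a 40-hour commercial week is my assumption about how the 18.4 h/w figure would be applied.

```python
from collections import defaultdict
from datetime import date

# Hypothetical commit log: (project, developer, commit date).
commits = [
    ("projA", "alice", date(2007, 1, 3)),
    ("projA", "alice", date(2007, 1, 20)),
    ("projA", "bob", date(2007, 1, 9)),
    ("projA", "alice", date(2007, 2, 11)),
    ("projB", "carol", date(2007, 2, 2)),
]

FLOSS_HOURS_PER_WEEK = 18.4       # figure quoted in the talk (Linux kernel committers)
COMMERCIAL_HOURS_PER_WEEK = 40.0  # assumption: a full-time commercial week


def active_programmer_months(commits):
    """Count, per project, the distinct (developer, month) pairs with a commit."""
    active = defaultdict(set)
    for project, developer, day in commits:
        active[project].add((developer, day.year, day.month))
    return {project: len(pairs) for project, pairs in active.items()}


def commercial_person_months(active_months):
    """Scale active programmer-months to full-time-equivalent person-months."""
    return active_months * FLOSS_HOURS_PER_WEEK / COMMERCIAL_HOURS_PER_WEEK


months = active_programmer_months(commits)
for project, n in sorted(months.items()):
    print(project, n, round(commercial_person_months(n), 2))
```

A developer who commits twice in one month still counts as one active month, which is why the measure is closer to participation than to hours worked.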
Mickael Vicente: Shift to Competences Model: A Social Network Analysis of Open Source Professional Developers
- Robles 2007
- Statistics on Debian showing increasing corporate involvement
- Social network extraction
- Get repo logs
- Create link between 2 developers if they have committed on the same file (non-directed graph)
- Simplification: the best collaboration of each developer (directed graph) — pick other developer with whom they have committed most files in common
- Longitudinal analysis
- extract clusters
- Correlation with professional career
- CV collected on Internet, personal web page etc (96% collected)
- Interesting data
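The network extraction steps in the notes above (link developers who committed to the same file, then keep each developer's best collaboration) might look something like the sketch below. The log entries and the alphabetical tie-break are my own assumptions for illustration; the talk did not specify either.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical repository log: (developer, file touched by a commit).
log = [
    ("ann", "core.c"), ("bob", "core.c"), ("cat", "core.c"),
    ("ann", "net.c"), ("bob", "net.c"),
    ("cat", "docs.txt"), ("dan", "docs.txt"),
]


def shared_file_counts(log):
    """Undirected graph: weight = number of files both developers committed to."""
    devs_by_file = defaultdict(set)
    for dev, path in log:
        devs_by_file[path].add(dev)
    weights = Counter()
    for devs in devs_by_file.values():
        for a, b in combinations(sorted(devs), 2):
            weights[(a, b)] += 1
    return weights


def best_collaborators(weights):
    """Directed graph: each developer points to the peer with most shared files."""
    neighbours = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    # Ties are broken alphabetically; this is an arbitrary choice for the sketch.
    return {
        dev: min(peers, key=lambda p: (-peers[p], p))
        for dev, peers in neighbours.items()
    }


weights = shared_file_counts(log)
best = best_collaborators(weights)
print(dict(weights))
print(best)
```

Note that the simplified graph is directed because "best collaborator" is not symmetric: here dan points to cat, but cat points elsewhere.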
Nicholas Radtke: What Makes FLOSS Projects Successful: An Agent-Based Model of FLOSS Projects
- Positive Characteristics of FLOSS
- High quality (Low defect count: Chelf 2006)
- Rapid development
- Violates Brooks law (Rossi 2004)
- Risky Business
- for every successful FLOSS project there are dozens of unsuccessful projects
- Corporate IT manager survey (2002)
- 41% mention inability to hold someone responsible for software
- Attempts at Simulating FLOSS
- SimCode (Dalle and David 2004)
- OSsim (Wagstrom et al 2005)
- …
- K-Means stuff
- Simulate across landscape
- Not social network
- Focus on developer decision to join/contribute to projects (Agent-Based Modelling)
- Defining Success and Failure
- Traditional metrics do not work well (on budget?)
- Completion (Crowston et al. 2003)
- Progression through maturity stages (Crowston and Scozzi 2002)
- Number of developers
- Mailing list activity
- Project outdegree, Active developer count (Wang 2007)
- The Model Universe
- Agents and projects
- Agents:
- Consumption: 0-1
- Producer: 0-1
- Resource: 0-1.5 (1=40h)
- Memory: agents only aware of some subset of projects
- Needs vector (preferences)
- utility: linear sum of: similarity match + current popularity (current resources) + cumulative resources + download + f(maturity)
- Projects:
- resources needed
- current resources
- cumulative resources
- download count
- preferences: same as agent but converges towards those had by agents working on it
- Agents choose between projects each time period
- have some randomness in that use multinomial logit: prob choose project i ~ exp(mu * Utility of project i)
- Results
- Simulate over 250 time steps ~ 4 years
- calibrate [ed: in a way I was not quite clear about]
- compare simulation with empirical data from sourceforge
- developers per project
- projects per developer
- Find that (from simulation data) downloads and cumulative resources are not important
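The agents' project choice rule noted above (probability of choosing project i proportional to exp(mu * utility of project i)) is a standard multinomial logit. A minimal sketch, with made-up project names and utility values and no claim to match the model's actual implementation:

```python
import math
import random


def choice_probabilities(utilities, mu=1.0):
    """Multinomial logit: P(i) is proportional to exp(mu * utility_i)."""
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    top = max(utilities.values())
    weights = {p: math.exp(mu * (u - top)) for p, u in utilities.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def choose_project(utilities, mu=1.0, rng=random):
    """Draw one project according to the logit probabilities."""
    probs = choice_probabilities(utilities, mu)
    projects = list(probs)
    return rng.choices(projects, weights=[probs[p] for p in projects], k=1)[0]


# Hypothetical utilities for three projects known to one agent.
utilities = {"editor": 1.2, "kernel": 0.4, "game": 0.9}
probs = choice_probabilities(utilities, mu=2.0)
print(probs)
print(choose_project(utilities, mu=2.0, rng=random.Random(0)))
```

The parameter mu controls the amount of randomness: with mu = 0 every project is equally likely, and as mu grows the agent almost always picks the highest-utility project.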
Fabio Manenti: Dual Licensing in Open Source Software Markets
- Benefits of Going Open Source
- feedback from community
- network effects (usage)
- competitive pressures (e.g. Netscape) [ed: not sure this is a benefit]
- Dual-licensing
- Kosky (2007): 6% of representative sample of European OSS business firms employ DL strategies
Alexia Gaudeul: Blogs and the Economics of Reciprocal (In-)Attention
- What blogs are
- Reasons for blogging
- Question: do you befriend (link) because of content produced or do you produce content because of friends
- General points
- Market interactions only part of wider class of reciprocal relations
- Time vs. money economics
- Unique dataset, very detailed and complete, to test networked relations
- Model — but left out due to time
- Dataset: livejournal 2006
- Sociology: teenagers to young adults (15 to 23), female (67%), Americans (70%)
- Fast growth: created in 1999, 8M accounts, 1.3M active
- FLOSS but for-profit (SaaS)
- Great part from self-referential
- Lively: 4 comments per post on average
- Federated by communities: no. of communities per person 15
- Journals updated for more than 2 years on avg
- 70% have posted in last 2 months
- No. of entries: 1 every 2 days
- No. of friends: 50 avg
- Balance between friends and friends of
- Balance between comments received / made
- Friendship patterns
- May be balance but does not explain no. of friends of diff. individuals
- Need to distinguish
- Norm of reciprocity: more promiscuous bloggers accumulate friends
- Content attractiveness
- Quality/freq. of posts
- Interactivity (comments per post)
- Regressions
- Reciprocity: No. blogs read (friend) = b * number of readers (friend of) + error
- Activity: No. readers = cX + error — X = matrix of ind. variables
- Endogeneity issues [ed: all over the place]
- Regress: ln(Friends) = ln(Friend of) + … (instrumenting Friends Of on Activity to solve endogeneity issues)
- Saturation around 400 friends seemingly (few with more)
- Max no. of friendship when your no. friends = no. friends of (maybe)
- A norm of reciprocity
- Issues with endogeneity of activity (which was used to instrument friends of)
Sylvain Dejean
- Does ICT (the Internet) lead to a global village or a cyber-balkan
- What leads to emergence of virtual communities
- Is the heterogeneity of contributions an impediment to self-organization
- How to manage virtual communities
- Agent-based model:
- Individuals defined by some characteristics
- Herfindahl index measures degree of self-organization [ed: why self-organization]
- Communities change via selection and variation
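For reference, the Herfindahl index mentioned above is just the sum of squared shares. The sketch below assumes it is computed over community membership shares, which is my guess at how it is used in the model, and the community sizes are invented.

```python
def herfindahl(sizes):
    """Herfindahl index: sum of squared shares, from 1/n (even) to 1 (concentrated)."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("sizes must sum to a positive number")
    return sum((s / total) ** 2 for s in sizes)


# Hypothetical community sizes (number of members in each community).
even = [25, 25, 25, 25]
concentrated = [97, 1, 1, 1]

print(herfindahl(even))          # four equal communities
print(herfindahl(concentrated))  # one dominant community
```

A value near 1/n means members are spread evenly over n communities, while a value near 1 means almost everyone sits in a single community.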
markdown2latex (mkdn2latex) 1.2
June 23rd, 2008
A new version (v1.2) of my python script for converting markdown to latex is now done. markdown2latex (renamed from mkdn2latex) has been extensively refactored to become a proper python-markdown extension. This means it can be used seamlessly alongside plain markdown conversion, as well as independently whether as a module or, in its classic form, from the command line.
In addition for ease of installation it has also been turned into a proper python package and registered on pypi so you can just do:
$ easy_install markdown2latex
Alternatively you can still get it straight from the repository at:
http://knowledgeforge.net/okftext/svn/trunk/python/markdown2latex/
New Paper: “Is Google the next Microsoft? Competition, Welfare and Regulation in Internet Search”
June 2nd, 2008
One of the major things I’ve been working on since last summer (other than the work on Trading Funds) is a paper on search engines such as those provided by firms like Google, Yahoo! etc. The first complete version of this is now ready for public consumption. Entitled Is Google the next Microsoft? Competition, Welfare and Regulation in Internet Search I’ve posted it online at:
http://rufuspollock.org/economics/papers/search_engines.pdf
Abstract
Internet search (or perhaps more accurately ‘web-search’) has grown exponentially over the last decade at an even more rapid rate than the Internet itself. Starting from nothing in the 1990s, today search is a multi-billion dollar business. Search engine providers such as Google and Yahoo! have become household names, and the use of a search engine, like use of the Web, is now a part of everyday life. The rapid growth of online search and its growing centrality to the ecology of the Internet raise a variety of questions for economists to answer. Why is the search engine market so concentrated and will it evolve towards monopoly? What are the implications of this concentration for different ‘participants’ (consumers, search engines, advertisers)? Does the fact that search engines act as ‘information gatekeepers’, determining, in effect, what can be found on the web, mean that search deserves particularly close attention from policy-makers? This paper supplies empirical and theoretical material with which to examine many of these questions. In particular, we (a) show that the already large levels of concentration are likely to continue (b) identify the consequences, negative and positive, of this outcome (c) discuss the possible regulatory interventions that policy-makers could utilize to address these.
Stackelberg Added to Atlas of Economic Models
May 29th, 2008
I’ve added a reasonably detailed treatment of Stackelberg Competition to the Atlas (of Economic Models).
2008 International Industrial Organization Conference (IIOC)
May 20th, 2008
After attending the IIOC conference last year I was back this weekend at the 2008 IIOC event which took place at Marymount University in Virginia. I presented the latest version of two of my papers: The Control of Porting in Two-Sided Markets and Forever Minus a Day? Theory and Empirics of Optimal Copyright Term.
I also provided discussant comments on Christopher Ellis’s and Wesley Wilson’s paper entitled Cartels, Price-Fixing, and Corporate Leniency Policy: What Doesn’t Kill Us Makes Us Stronger. In addition I include below some very partial notes on some of the sessions I attended, though activity in this regard was rather limited by the fact that, though there were more papers overall than last year (388 in total), sessions were organized into more breadth and less length.
Transaction Costs and Trolls: the Behaviour of Individual Inventors, Small Firms and Entrepreneurs in Patent Litigation (Gwendolyn Ball and Jay Kesan)
- Explore settlements in relation to patents. Questions:
- How often do settlements happen relative to litigation
- Are small firms and entrepreneurs at a major disadvantage in defending their patents
- Or do patent ‘trolls’ use the threat of litigation to ‘extort’ payments
- NTP vs. RIM ($612M)
- Saffron vs. Boston Scientific ($412M to individual doctor who had an infringed heart stent patent)
- Does nature of defendant/plaintiff (L/M/S) affect likelihood of settlement
- Existing databases not so great
- Only list trial outcomes not pre-trial outcomes
- Often only list primary plaintiffs
- Fix this and link patent litigation to companies
- Results
- Claimed usually that 95% cases settle
- In fact 8% are resolved at pre-trial (still expensive)
- 4% settled at trial
- so ~ 88% settle
- Troll stuff:
- 97 licensing firms as plaintiffs (none as defendants). These may be classic trolls but they are a small part of overall litigation.
- Evidence shows that entrepreneurs and small inventors are very active (so do not seem particularly disadvantaged) and often sue each other rather than larger firms
- Crudely: small inventors more likely to pursue a case to the end than large litigators
- Discussant comments:
- Bessen and Meurer find $28M hit on firms facing litigation
- Issues of correlated errors across cases
- My comments:
- probably need to disaggregate across areas — after all no-one has suggested ‘trolling’ is an issue in traditional pharma
- (for me) it would be useful to have an idea how many cases ‘settle’ at the ‘letter stage’, that is, before anything even turns up in the court system. After all you only get to the courts (even with preliminaries) if you cannot sort out a license.
Prior Art - To Search or Not to Search (Vidya Atal)
- Alcacer + Gittelman 2006 showed 40% had prior art added by USPTO examiner
- 2/3 citations on an average patent added by USPTO
- Langinier + Marcoul (2003), Lampe (2007) — incentive to disclose prior art
- Issue of bad (non-novel) patents may be because people have poor incentives to search
- Mainly related this to fact that even a bad patent (if it gets past examination) has a +ve payoff
Today I’ll be presenting my paper Forever Minus a Day? Theory and Empirics of Optimal Copyright Term at Stanford in the Social Science and Technology Seminar Series (also here).
This new paper is a heavily revised version of the copyright-term specific portions of my original ‘Forever Minus a Day’ paper (see post from last summer). The rest of the original paper can now be found in Optimal Copyright over Time: Technological Change and the Stock of Works which was published in the December issue of the Review of Economic Research on Copyright Issues (RERCI).
Update post-talk (2008-05-16): the slides are now online at:
http://rufuspollock.org/economics/papers/optimal_copyright_term_talk_stanford.pdf
The Economics of Knowledge: A Review of the Theoretical Literature
April 14th, 2008
Last year I collated and distilled the notes and summaries accumulated over the PhD into a proper paper which could act as the literature review in my dissertation. While I submitted the PhD last August I’ve only just got around to posting this up and it can now be found at:
http://www.rufuspollock.org/economics/papers/economics_of_knowledge_review.pdf
From the abstract:
A selective review of the existing theoretical literature related to the economics of knowledge with particular attention to intellectual property, especially in the form of patents.
Note: those seeking the references can find them all in the economics bibliography at:
“Optimal Copyright over Time: Technological Change and the Stock of Works” Published
February 14th, 2008
A refactoring of the first theoretical part of my optimal copyright paper has now been published in the December issue of the Review of Economic Research on Copyright Issues (RERCI) under the title: Optimal Copyright over Time: Technological Change and the Stock of Works. A preprint can be found at:
http://www.rufuspollock.org/economics/papers/optimal_copyright_over_time.pdf
Atlas of Economic Models Launched (in alpha)
December 27th, 2007
Over Christmas I’ve had some spare time. This has permitted me to get the Atlas of Economic Models off the ground. This is a project I’ve been thinking about for some years, first motivated really by the experience of trying to discover what variations had been done on the basic Hotelling-line model of ‘spatial’ product differentiation and competition (previous allusion earlier in the Autumn here).
So what is the Atlas supposed to be? From the front page:
The Atlas of Economic Models is a comprehensive list of the basic ‘building-block’ models used by economists. It also includes additional information, for example worked out analytical solutions to special cases and details as to how models inter-relate (hence the ‘Atlas’ in the title). More about the atlas can be found on the about page.
Other important features of the Atlas are that it is:
- Community Editable: the Atlas is a community-based project with most content editable by anyone who wishes to contribute. Specifically we’re managing the content in a wiki and to edit any given page all you need to do is click on the edit button at the bottom of that page.
- Openly Licensed: all content is openly licensed. That is all material is made available under a license that permits it to be freely used, reused, shared and redistributed by others. Further details on the license page.
As yet, it obviously does not have much content but that should be gradually remedied over the coming months. And if you’re economically inclined why not head over there and help out …
