FLOSS 2008 Workshop on Free/Open Source Software
June 30th, 2008
Last week I attended FLOSS 2008, the second international workshop/network meeting on FLOSS (Free/Libre/Open Source software) in Rennes, France. I was presenting my paper Innovation and Imitation with and without Intellectual Property Rights (and would have offered discussant comments but the author of the paper I was scheduled to discuss had to pull out at the last minute). In addition to this I got to hear a variety of interesting talks. On some of these I was able to take notes which I have included below for the ‘delectation’ of anyone else who is interested.
Mikko Valimaki: IPR and Open Source Software
- Goodman and Myers (2005) — the 3G standard.
- Leveque and Meniere 2007: what does RAND mean
- reasonable royalty is R = c (v1-v2)p where c is the incremental cost of licensing and v1-v2 is the gain from using this patent over the second-best.
- Other questions for royalty-setting
- quality or volume of patents
- early or late innovators
- cumulative royalties or one-time fees
- But all models he knows of have non-zero royalty fees
- [ed]: not surprising given that you will always get interior solutions
- Windows/Samba discussion
- specific sets of terms
- provide RF for the open source community
- Commission Decision para 783
- “On balance, the possible negative impact of an order to supply on Microsoft’s incentives to innovate is outweighed by its positive impact on the level of innovation of the whole industry.”
- Nokia to acquire Symbian:
- “a full platform will be available … under a royalty-free license … from the Foundation’s first day of operations … the Foundation will make selected components available as open source at launch.”
- [ed]: Motivation here is clear: Nokia care about the hardware and for them software is a complementary good, which they therefore wish to be as cheap as possible. But this raises the question of what is being made open: is it hardware patents or pure software patents (and if so, how big a deal is this)?
Stefan Koch: Efficiency of FLOSS Production
- Question of efficiency of open source development
- How much software did we get for our effort
- Is OS a waste of resources?
- Discussion without much empirical basis
- Claim: fast and cheap, high quality, finding bugs late is inefficient (actually large effort) — see IEEE Software 1999
- Completely unknown as no-one keeps time-sheets. So
- Effort based on participation data
- Effort based on product — look at software and ask how much effort would be needed in commercial environment
- Empirical research in open source
- Mainly case studies
- Helpful but need proper large-scale analysis
- Mined software repositories [ed: cf. today FLOSSMetrics, FLOSSmole]
- 8,261 projects
- 7,734,082 commits
- 663M LOCs
- resources and output are skewed: top decile of programmers: 79% of code base, second decile: 11%
- Effort estimation based on actual participation
- active programmer months (define active as committing in a given month)
- high correlation with LOC added in month
- Cumulate this number for each project
- But not equal to a commercial person-month
- How do we scale: use 18.4 hours/week taken from stats for committers on Linux kernel
- [ed:] this is the key assumption. The whole point is that FLOSS effort is not observed and they are using a measure of output (committing) and trying to infer actual activity
- Manpower function modelling:
- Norden-Rayleigh model (1960)
- Some set of problems N (unknown but finite)
- Probs are solved independently and randomly (following Poisson)
- This fits ok but has eventual decline in participation which does not occur
- Modify this: in particular to allow introduction of new problems
- Introduce in prop to original no. problems, in prop to current set of problems etc
- Also have different learning rates
- [ed: but isn’t the setup a little different. Really it is a question of success vs. non-success in terms of acquiring users + some kind of bound on amount of participation due either to fission or complexity]
- Product-based estimation
- COCOMO 81 and COCOMO 2
- Results:
- Comparison COCOMO - Norden-Rayleigh
- For COCOMO 81 cannot find parameters favourable enough to explain Norden-Rayleigh curve
- For COCOMO 2 can find parameters but very favourable
- Suggest (roughly) that FLOSS very efficient (but not very rigorous)
- More formal estimation using all models etc
- Norden-Rayleigh significantly below product-based estimates (factor of 8 in mean)
- Interpretation
- FLOSS v. efficient (self-selection for tasks etc)
- Extremely high amount of non-programmer participation (1:7 relation …)
- [ed]: not sure about this generous view. Other explanations
- No quality measurement (also mentioned by Koch)
- OK: lot of code but low quality
- (Related) Many sourceforge projects are incomplete, easy bit at the start
- Later comes a lot of refactoring/writing documentation. This may display significant diminishing returns
- Many FLOSS projects come from what were originally commercial projects. In that case:
- code may have already been written
- conceptual components have been done already
- Trade-off of time vs. productivity
- May be more productive to only work 10h a week but then product might not be ready for 10 years
- From discussion
- interesting point: Nokia thinking of moving to more FLOSS in-house because they can’t manage their 5-10k programmers centrally any more
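A rough sketch of the participation-based estimate from Koch's talk, added here to make the bookkeeping concrete. It counts active programmer months (a developer counts as active in a month if they commit at least once), scales them with the 18.4 hours/week figure quoted above, and writes out the textbook Rayleigh form of the Norden manpower curve. The 40-hour full-time week, the function names and the toy commit records are assumptions for illustration, not taken from the talk.

```python
import math
from collections import defaultdict


def active_programmer_months(commits):
    """Count distinct (developer, month) pairs with at least one commit, per project.

    `commits` is an iterable of (project, developer, "YYYY-MM") tuples.
    """
    active = defaultdict(set)
    for project, developer, month in commits:
        active[project].add((developer, month))
    return {project: len(pairs) for project, pairs in active.items()}


def to_full_time_months(active_months, hours_per_week=18.4, full_time_hours=40.0):
    """Scale active programmer months to full-time-equivalent months.

    18.4 h/week is the figure quoted in the notes; the 40 h full-time week
    is an assumption made for this sketch.
    """
    return active_months * hours_per_week / full_time_hours


def norden_rayleigh_manpower(t, total_effort, a):
    """Manpower at time t under the standard Rayleigh curve: 2*K*a*t*exp(-a*t^2)."""
    return 2.0 * total_effort * a * t * math.exp(-a * t * t)


def norden_rayleigh_cumulative(t, total_effort, a):
    """Cumulative effort spent up to time t: K*(1 - exp(-a*t^2))."""
    return total_effort * (1.0 - math.exp(-a * t * t))


# Hypothetical commit records, purely illustrative.
commits = [
    ("proj-a", "alice", "2007-01"),
    ("proj-a", "alice", "2007-01"),  # a second commit in the same month adds nothing
    ("proj-a", "bob", "2007-01"),
    ("proj-a", "alice", "2007-02"),
    ("proj-b", "carol", "2007-03"),
]
months = active_programmer_months(commits)
full_time = {p: to_full_time_months(m) for p, m in months.items()}
```

The Rayleigh curve peaks at t = 1/sqrt(2a) and then declines, which is the "eventual decline in participation" that the notes say does not match what is observed in FLOSS projects.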
Mickael Vicente: Shift to Competences Model: A Social Network Analysis of Open Source Professional Developers
- Robles 2007
- Statistics on Debian showing increasing corporate involvement
- Social network extraction
- Get repo logs
- Create link between 2 developers if they have committed on the same file (non-directed graph)
- Simplification: the best collaboration of each developer (directed graph) — pick other developer with whom they have committed most files in common
- Longitudinal analysis
- extract clusters
- Correlation with professional career
- CV collected on Internet, personal web page etc (96% collected)
- Interesting data
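The network extraction step in Vicente's talk can be sketched roughly as follows. Developers are linked when they have committed to the same file, and the simplified directed graph keeps only each developer's strongest link. The log format, the function names and the alphabetical tie-break are assumptions of this sketch; the talk did not say how ties were handled.

```python
from collections import defaultdict
from itertools import combinations


def files_by_developer(commit_log):
    """Map each developer to the set of file paths they have committed to."""
    touched = defaultdict(set)
    for developer, path in commit_log:
        touched[developer].add(path)
    return touched


def shared_file_counts(commit_log):
    """Undirected graph: weight of (a, b) is the number of files both committed to."""
    touched = files_by_developer(commit_log)
    weights = {}
    for a, b in combinations(sorted(touched), 2):
        shared = len(touched[a] & touched[b])
        if shared > 0:
            weights[(a, b)] = shared
    return weights


def best_collaborator(commit_log):
    """Directed graph: each developer points at the one they share most files with.

    Ties go to the alphabetically first name (an assumption of this sketch).
    Developers who share no file with anyone do not appear in the result.
    """
    best = {}
    for (a, b), weight in shared_file_counts(commit_log).items():
        for source, target in ((a, b), (b, a)):
            current = best.get(source)
            if (
                current is None
                or weight > current[1]
                or (weight == current[1] and target < current[0])
            ):
                best[source] = (target, weight)
    return {source: target for source, (target, _) in best.items()}


# Hypothetical repository log of (developer, file) pairs.
log = [
    ("ann", "a.c"), ("ann", "b.c"),
    ("ben", "a.c"), ("ben", "b.c"),
    ("cy", "b.c"),
    ("dee", "z.c"),
]
edges = best_collaborator(log)
```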
Nicholas Radtke: What Makes FLOSS Projects Successful: An Agent-Based Model of FLOSS Projects
- Positive Characteristics of FLOSS
- High quality (Low defect count: Chelf 2006)
- Rapid development
- Violates Brooks law (Rossi 2004)
- Risky Business
- for every successful FLOSS project there are dozens of unsuccessful projects
- Corporate IT manager survey (2002)
- 41% mention inability to hold someone responsible for software
- Attempts at Simulating FLOSS
- SimCode (Dalle and David 2004)
- OSSim (Wagstrom et al 2005)
- …
- K-Means stuff
- Simulate across landscape
- Not social network
- Focus on developer decision to join/contribute to projects (Agent-Based Modelling)
- Defining Success and Failure
- Traditional metrics do not work well (on budget?)
- Completion (Crowston et al. 2003)
- Progression through maturity stages (Crowston and Scozzi 2002)
- Number of developers
- Mailing list activity
- Project outdegree, Active developer count (Wang 2007)
- The Model Universe
- Agents and projects
- Agents:
- Consumption: 0-1
- Producer: 0-1
- Resource: 0-1.5 (1=40h)
- Memory: agents only aware of some subset of projects
- Needs vector (preferences)
- utility: linear sum of: similarity match + current popularity (current resources) + cumulative resources + download + f(maturity)
- Projects:
- resources needed
- current resources
- cumulative resources
- download count
- preferences: same as agent but converges towards those had by agents working on it
- Agents choose between projects each time period
- have some randomness in that use multinomial logit: prob choose project i ~ exp(mu * Utility of project i)
- Results
- Simulate over 250 time steps ~ 4 years
- calibrate [ed: in a way I was not quite clear about]
- compare simulation with empirical data from sourceforge
- developers per project
- projects per developer
- Find that (from simulation data) downloads and cumulative resources are not important
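The agents' project choice in Radtke's model is a multinomial logit: the probability of choosing project i is proportional to exp(mu * utility of project i). A minimal sketch, assuming the utilities have already been computed from the linear sum described above. The function names and the example utilities are made up for illustration.

```python
import math
import random


def choice_probabilities(utilities, mu=1.0):
    """Multinomial logit: P(project i) is proportional to exp(mu * utility_i).

    Subtracting the largest utility before exponentiating leaves the
    probabilities unchanged and avoids overflow for large values.
    """
    peak = max(utilities.values())
    weights = {p: math.exp(mu * (u - peak)) for p, u in utilities.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def choose_project(utilities, mu=1.0, rng=random):
    """Draw one project at random according to the logit probabilities."""
    probs = choice_probabilities(utilities, mu)
    projects = list(probs)
    return rng.choices(projects, weights=[probs[p] for p in projects], k=1)[0]


# Hypothetical utilities for three projects in one time step.
utilities = {"editor": 1.2, "kernel": 2.0, "game": 0.3}
probs = choice_probabilities(utilities, mu=1.5)
picked = choose_project(utilities, mu=1.5, rng=random.Random(0))
```

With mu = 0 every project is equally likely, and as mu grows the agent almost always picks the highest-utility project, so mu controls how much randomness there is in the choice.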
Fabio Manenti: Dual Licensing in Open Source Software Markets
- Benefits of Going Open Source
- feedback from community
- network effects (usage)
- competitive pressures (e.g. Netscape) [ed: not sure this is a benefit]
- Dual-licensing
- Koski (2007): 6% of representative sample of European OSS business firms employ DL strategies
Alexia Gaudeul: Blogs and the Economics of Reciprocal (In-)Attention
- What blogs are
- Reasons for blogging
- Question: do you befriend (link) because of content produced or do you produce content because of friends
- General points
- Market interactions only part of wider class of reciprocal relations
- Time vs. money economics
- Unique dataset, very detailed and complete, to test networked relations
- Model — but left out due to time
- Dataset: livejournal 2006
- Sociology: teenagers to young adults (15 to 23), female (67%), Americans (70%)
- Fast growth: created in 1999, 8M accounts, 1.3M active
- FLOSS but for-profit (SaaS)
- Great part from self-referential
- Lively: 4 comments per post on average
- Federated by communities: no. of communities per person 15
- Journals updated for more than 2 years on avg
- 70% have posted in last 2 months
- No. of entries: 1 every 2 days
- No. of friends: 50 avg
- Balance between friends and friends of
- Balance between comments received / made
- Friendship patterns
- May be balance but does not explain no. of friends of diff. individuals
- Need to distinguish
- Norm of reciprocity: more promiscuous bloggers accumulate friends
- Content attractiveness
- Quality/freq. of posts
- Interactivity (comments per post)
- Regressions
- Reciprocity: No. blogs read (friend) = b * number of readers (friend of) + error
- Activity: No. readers = cX + error — X = matrix of ind. variables
- Endogeneity issues [ed: all over the place]
- Regress: ln(Friends) = ln(Friend of) + … (with instrumenting Friends Of on Activity to solve endogeneity issues)
- Saturation around 400 friends seemingly (few with more)
- Max no. of friendship when your no. friends = no. friends of (maybe)
- A norm of reciprocity
- Issues with endogeneity of activity (which was used to instrument friends of)
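To make the instrumenting step concrete, here is a sketch of the simplest instrumental-variables estimator for the log-log regression above, with a single activity measure used as the instrument for Friend Of. With one instrument this is the ratio cov(z, y) / cov(z, x), which coincides with two-stage least squares. The single-instrument setup and the function names are simplifying assumptions; the talk used a matrix of activity variables, and the estimate is only as good as the instrument, which is exactly the concern raised in the last point.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def iv_slope(outcome, regressor, instrument):
    """Single-instrument IV estimate of the slope: cov(z, y) / cov(z, x)."""
    denominator = covariance(instrument, regressor)
    if denominator == 0:
        raise ValueError("instrument is uncorrelated with the regressor")
    return covariance(instrument, outcome) / denominator


def reciprocity_elasticity(friends, friend_of, activity):
    """Estimate b in ln(Friends) = a + b * ln(Friend of), instrumenting with activity.

    All three arguments are per-blogger sequences; counts must be positive
    because logs are taken.
    """
    ln_friends = [math.log(v) for v in friends]
    ln_friend_of = [math.log(v) for v in friend_of]
    return iv_slope(ln_friends, ln_friend_of, activity)
```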
Sylvain Dejean
- Does ICT / the Internet lead to a global village or a cyber-balkan?
- What leads to emergence of virtual communities
- Is the heterogeneity of contributions an impediment to self-organization
- How to manage virtual communities
- Agent-based model:
- Individuals defined by some characteristics
- Herfindahl index measures degree of self-organization [ed: why self-organization]
- Communities change via selection and variation
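For reference, the Herfindahl index used in Dejean's model is the sum of squared shares. A small sketch, assuming the shares are each community's fraction of all individuals (the talk did not spell out what the shares were taken over):

```python
def herfindahl_index(community_sizes):
    """Sum of squared shares: 1/n for n equal communities, 1.0 for a single one."""
    total = sum(community_sizes)
    if total <= 0:
        raise ValueError("need at least one non-empty community")
    return sum((size / total) ** 2 for size in community_sizes)


# Hypothetical membership counts.
fragmented = herfindahl_index([10, 10, 10, 10])
concentrated = herfindahl_index([37, 1, 1, 1])
```

A higher value means individuals are concentrated in fewer communities, a lower value means they are spread evenly across many.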
Notes on Theories of Contextual Judgement
April 30th, 2008
Over the last couple of months for the purpose of my research on happiness/subjective-well-being I’ve been putting together some notes on theories of contextual judgement. The first part of these is now in a form suitable for public consumption and I’ve posted them at:
http://www.rufuspollock.org/economics/notes/theories-of-contextual-judgement/
For anyone with an interest in copyright issues, particularly in the online environment, there is an excellent event on today at the LSE organized by Ian Brown of the OII and at which I’ll be speaking (briefly) on the subject of “How can we maximise copyright’s return to society?” More details below.
Musicians, fans and online copyright
Wednesday 19 March 2008 14:00 - 17:00
- John Kennedy, CEO of IFPI
- Paul Sanders, Director of Strategy at Playlouder
- Becky Hogge, Open Rights Group
- Adrian Brazier, DBERR
- Lilian Edwards, Southampton University
- Rufus Pollock, Cambridge University
- Michelle Childs, Knowledge Ecology International
- Wendy Grossman, musician / freelance journalist
Location: Old Theatre, London School of Economics, Houghton Street, London, WC2A 2AE, United Kingdom.
This Wednesday afternoon we have a great selection of speakers for our free OII/LSE event on music and copyright. Come along to find out what the government, music industry, publishers and independent experts are thinking about ideas like 3-strikes-and-you’re-disconnected; scanning ISP traffic for copyright works; and notice and takedown regimes.
Full programme at: http://www.oii.ox.ac.uk/events/details.cfm?id=186
The second (or third depending on how you are counting) Open Knowledge Conference (OKCon) which is organized by the Open Knowledge Foundation and which I help coordinate is on tomorrow at LSE in London.
There are a lot of good sessions and so if you are interested in open knowledge and have Saturday free why not come along.
Speaking at Oxford Geek Night on Open Knowledge and Componentization
February 5th, 2008
Tomorrow I’ll be speaking with Nate Olson at the latest Oxford Geek Night on the subject of Open Knowledge and Componentization. Here’s the blurb:
Componentization on a large scale (such as in the Debian ‘apt’ packaging system) has allowed large software projects to be amazingly productive through their use of a decentralised, collaborative, incremental development process. Componentization works so well because it allows us to ‘divide and conquer’ the organizational and conceptual problems of highly complex systems. Given this, what are the possibilities (and problems) of this approach for knowledge generally? How do we best design “knowledge APIs”, discover and distribute existing resources, and recombine decentralised datasets? In this talk we’ll discuss the answers to (some of) these questions focusing particularly on the role the Comprehensive Knowledge Archive Network can play.
So, if you’re in the Oxford vicinity and interested in Open Knowledge and related matters (there’s a good line-up of other speakers including Denise Wilton of moo.com) why not drop in to the Jericho Tavern around 7.30pm tomorrow evening.
Speaking at Warwick Industrial Organisation Seminar
January 31st, 2008
Courtesy of a kind invitation from Richard Cave, tomorrow I’ll be heading over to Warwick University to present in their IO Seminar. The talk will be focused on my main ‘IP papers’: Cumulative Innovation, Sampling and the Hold-Up Problem and Imitation and Innovation with and without IP, but if there’s time I might also get the chance to discuss another paper of mine on the Control of Porting in Two-Sided Markets.
Talk at Westminster Media Forum 2004-12-09
December 10th, 2004
This is the text of a brief presentation I gave as a member of the panel on Intellectual Property and the Public Space at the Westminster Media Forum 2004-12-09. I was presenting in my capacity as Director of Friends of the Creative Domain.
Text
First a quick word about who we are. Friends of the Creative Domain is an open community set up to promote the intellectual and artistic commons in our culture. Given the similarity of names it is worth stating for the record that while we are strong supporters of the Creative Commons project we are not formally associated in any way.
I am here today to talk about IP and the public space. I think we can all agree that the public space is essential to our culture. I think that we also agree that rights in intellectual works, IP, are important in remunerating creators and intermediaries. Unfortunately, however, the two are in tension - IP can often threaten this public space in our culture. For ideas, or creative works, are not like normal property.
As Jefferson stated two centuries ago: He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.
This means the public space of ideas is quite different from that for normal property. When we protect works we reduce access, we reduce the ability to light the new taper, and that impact, that impact on the public space must be acknowledged.
Now what is this ‘public space’? Most simply this public space is a commons, that is a community where the norms of sharing and collaboration predominate. Why are these three aspects important? Sharing because it gives others the freedom to access and reuse a work without the need to seek permission. Collaboration because this allows for greater and easier reuse and remixing in the creation of new works. And community because these norms are shared by the participants in the space.
Now the spectrum of such sharing can be quite broad, from simply allowing non-commercial uses of a work to placing it in its entirety into the public domain. But behind all of these possibilities lie those core principles of fostering sharing and collaboration. What kind of works then might enter this space, whose creators or owners would welcome greater dissemination and reuse, even if it means surrendering some of their rights under traditional copyright?
To take just a few initial examples: advertising works; the many newspapers, magazines, newsletters etc produced by not-for-profit organizations or for not-for-profit purposes; much academic work be it in the sciences or the humanities; all kinds of ‘amateur’ artistic work from music to film; sections of the back catalogue of the BBC which will enter a Creative Archive; … and the list goes on.
Much of our culture is being needlessly locked up. Copyright places large burdens on those who wish to remix, disseminate or access creative works. In some cases this burden may be a necessary part of ensuring the remuneration of creators and owners. But in many other situations these burdens are without benefit. Estimates suggest that a majority of even prime commercial work such as albums produced at the major labels are simply not available commercially. That means the artist is getting no revenue and the public is not getting access to these works. And this is even more crazy when we think of the vast part of our culture that is not produced for commercial ends in the first place.
At present everything you or I produce for whatever purpose is copyrighted by default - not even a copyright symbol is needed. We want to provide the option of a different default. You have just heard about the Creative Commons project. I applaud this project and its provision of tools to support the creative domain, the commons of our culture. But I don’t think it is enough. I think we need to be working even more actively to foster this creative domain, this ‘public space’ in our society, as a valuable complement and alternative to the traditional copyright regime and copyrighted culture.
Thank you.
