  1. Uncodified knowledge cannot be transferred except by face-to-face interaction (apprenticeship etc.)
  2. But codifying knowledge consumes a great deal of time and space (and much still remains implicit)
  3. As the amount of codified knowledge grows it becomes harder to find what you want

Hypothesis: Value of Information in Databank = Value of Information if it could be Accessed Perfectly x Ease of Finding Any Particular Item

Plausible to assume Ease of Finding Information = h(Amount) where h’ less than 0

  • Let Amount = n
  • In standard Computer Science, if we could sort the items in some manner (by which we could also search), search time would be log(n), so h(n) = 1/log(n) (and sorting costs are n log(n), e.g. merge sort; bubble sort would cost n^2)
  • Suppose the only option is brute-force comparison (and it is useful to find a negative, i.e. that what you want isn’t in there). Then E(search time) = n/2 for an item that is present, which suggests h(n) = 2/n
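
The two search regimes in the list above can be checked by counting comparisons directly. This is only a minimal sketch in Python: the function names and the choice n = 1024 are assumptions made for illustration, not part of the original notes.

```python
def linear_search_comparisons(items, target):
    """Count comparisons made by a brute-force scan until target is found (or the list ends)."""
    count = 0
    for item in items:
        count += 1
        if item == target:
            break
    return count


def binary_search_comparisons(sorted_items, target):
    """Count probes made by binary search on an already sorted list."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


n = 1024  # illustrative databank size (an assumption)
items = list(range(n))

# Average cost of finding an item that is present: roughly n/2 for the scan.
avg_linear = sum(linear_search_comparisons(items, t) for t in items) / n
# Worst-case cost with sorted items: roughly log2(n) probes.
max_binary = max(binary_search_comparisons(items, t) for t in items)
# Confirming a negative by brute force costs the full n comparisons.
negative_linear = linear_search_comparisons(items, -1)
```

Note that confirming a negative by brute force costs n comparisons rather than n/2, so the n/2 figure is best read as the expected cost when the item is actually in the databank.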

Plausible to have diminishing returns for Value of Information if it could be Accessed Perfectly = f(Amount), so f” less than 0. Thus f grows at a less than linear rate (eventually …).

  • If h has the form suggested, i.e. 2/n, then we would eventually have the Value of the Information Bank /decreasing/ in the amount of information in the databank
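
A small numerical sketch of the hypothesis, Value = f(n) x h(n), may make the claim concrete. Everything specific here is an assumption for illustration: f is taken to be the square root (one concave choice among many), the brute-force ease is 2/n as suggested above, and the sorted-search ease is taken as the reciprocal of a log(n) search time by analogy.

```python
import math


def perfect_access_value(n):
    """Assumed concave f: value of n items if access were perfect (square root is illustrative)."""
    return math.sqrt(n)


def ease_brute_force(n):
    """h(n) = 2/n: reciprocal of the expected n/2 comparisons of a brute-force scan."""
    return 2 / n


def ease_sorted(n):
    """Assumed h(n) = 1/log2(n): reciprocal of the search time over sorted items."""
    return 1 / math.log2(n)


def databank_value(n, ease):
    """Value of the databank under the hypothesis: f(n) times h(n)."""
    return perfect_access_value(n) * ease(n)


sizes = [10, 100, 1_000, 10_000]
brute_values = [databank_value(n, ease_brute_force) for n in sizes]
sorted_values = [databank_value(n, ease_sorted) for n in sizes]
```

With these particular assumptions the brute-force values fall as n grows (f(n) x 2/n = 2/sqrt(n)), while the sorted-search values rise over the same sizes, so the decreasing-value result depends on how steeply ease of finding declines.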

Example: explaining how to use a computer …

Info on Size of Databanks

  1. How Much Information? Varian and Lyman, http://www.press.umich.edu/jep/06-02/lyman.html
  2. Ithiel De Sola Pool. Communications Flows: A Census in the United States and Japan. Elsevier Science, New York, 1984

The Nature of Information

February 14th, 2005

Coining an aphorism: We are moving towards a world in which all information is software and all software is information

Plan

  1. We process information linearly. This is a fundamental fact. (Aside: example of polyphonic music and the Glenn Gould radio program). Symbol processing in Homo sapiens is serial and cannot manage either parallel or non-linear presentation, particularly textual symbol processing. This is not only related to the methods by which humans obtain sensory input but derives from the very structure of high-level information processing in the brain. This is manifested very clearly in language.
  2. Thus even where information is presented non-linearly, or more commonly in parallel, we still create our own linear thread as we progress through it. A concrete example is given by the internet or by encyclopedias. Though both examples present a web of information rather than an explicit linear narrative, the human mind cannot branch multiply in any literal sense. Thus as I progress through a website or an encyclopedia, though I may branch, I then leave the original line of investigation – perhaps to return later.
  3. Given this fact that we can only read along one dimension at once, we see the great challenge of all analytical writing, namely to present in single-dimensional linear form that which is always multidimensional and non-linear.
  4. Thus we are presented with a dilemma. Much knowledge and information is multi-faceted, approachable from many different angles simultaneously, yet if it is to be understood and processed by humans it must be presented serially, that is to say linearly along a single path. Now I do not suggest that we can overcome these inherent limitations but I do suggest that we can approach knowledge storage and categorization in such a way as to impose the minimal limits on the possible methods of presentation.

The Metaphor

We can imagine the building blocks, the factlets, as pearls, little pearls of knowledge. We can then imagine the creation of an expository line, or narrative if we allow ourselves to abuse terminology, as the stringing of these pearls onto the thread – the thread of narrative – which when complete provides a ‘necklace’ of exposition (NB: though we should avoid seeing any cyclical structure in analogy with the circular necklace, as it is more usual for an exposition to resemble an interval with a beginning, an end and a direction of progression).

Other Items

The multiple classification problem. Analogies and examples:

  1. No canonical basis vectors for a finite-dimensional vector space.
  2. The Borges story cited by Foucault on the Chinese emperor’s encyclopedia

The Art of Writing History

Most history writing, even of the analytical variety, consists of linear exposition. I often describe this as a narrative, but this is dangerous as narrative usually denotes a very specific form of linear exposition.

An Example

The example we shall examine is the Hundred Years War (this is, of course, a subject eminently suited to a narrative historiographical approach). The Hundred Years War describes the century-long struggle between the English and French crowns for control of France and various of its subdomains. From the very beginning of historiographical interest in these events (e.g. Froissart) the approach taken has been a narrative one. The most recent work in this tradition is the multivolume work by Jonathan Sumption. He encounters a classic problem: how is one to shoe-horn this struggle into the linear strait-jacket of the printed page? For not only do we have the obvious approach given by time’s arrow (which is the backbone of traditional ‘narrative’ in history) but also the thematic structure given by the geographic dispersion of the conflict.

A simple method for visualizing these situations is given by reducing this problem to two dimensions with time on one axis and all other themata being put along the other axis:

(themata)   English throne | French throne | Charles the Bad, King of Navarre | Major Battles | ….
 Time
  ||
  ||
  \/
            ….             | ….            | ….                               | ….

Further work: detailed examination of chapters in vol. 1 of Jonathan Sumption’s History of the Hundred Years War

Knowledge, Information and Data

January 13th, 2005

Introduction

I propose the following hierarchy: data, then information, then knowledge, where items in one category are refined and filtered in the process of going to the next.

Short Quotes

Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth. Truth is not beauty. Beauty is not love. Love is not music. Music is the best.

Source: Frank Zappa, Album: Joe’s Garage, Track: Packard Goose

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

Source: T.S. Eliot, Choruses from The Rock

Information and Knowledge

Information may be defined as data relating to states of the world and the state-contingent consequences that follow from events in the world that are either naturally or socially caused. The total set of data is closed in that there is a closed set of states and consequences.

….

…. Following the philosopher Dretske (1982), the tight coupling [between information and knowledge] may be expressed in the following way: information is a commodity that is capable of yielding knowledge; and knowledge is identified with information-produced (or sustained) belief. As this formulation makes clear, the line of causation is from information to knowledge. Knowledge is processed information. [emphasis added]

Fransman, Martin in dosi_ea_1998, p. 148-149

Taxonomy Software

December 26th, 2004

Is there a standard data format for taxonomies/classification systems? It should include a specification of text encoding (like LDIF but for taxonomies). If there is, I would guess there will be open source implementations (and if not, it won’t be that hard to write one’s own).

Requirements:

  1. Type of taxonomy:
    1. Enumerations (flat)
    2. Tree (single parent)
    3. Lattice (multiple parent)
  2. Identifiers. Support for at least 10 million possible elements in the taxonomy. Optional: identifiers should be portable across systems (i.e. you can plug different taxonomies together without recoding identifiers), which probably means a GUID-based id system. Required: basic int32 or int64 based identifiers.
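
As a rough illustration of these requirements, here is a minimal Python sketch of a term record that covers all three taxonomy types with one structure. The names `Term` and `taxonomy_kind` are hypothetical, and the use of UUID strings as portable identifiers is one possible reading of the "GUID based" option, not an existing standard format. The sketch does not check for cycles or dangling parent ids.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Term:
    """One element of a taxonomy (hypothetical structure for illustration)."""
    name: str
    # Ids of parent terms: empty for flat enumerations, one for trees, several for lattices.
    parents: list = field(default_factory=list)
    # Random UUID as a portable identifier, so separate taxonomies can be merged without recoding.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def taxonomy_kind(terms):
    """Classify a collection of terms by the largest number of parents any term has."""
    most_parents = max((len(t.parents) for t in terms), default=0)
    if most_parents == 0:
        return "enumeration"
    if most_parents == 1:
        return "tree"
    return "lattice"


animal = Term("animal")
pet = Term("pet")
dog = Term("dog", parents=[animal.id])
cat = Term("cat", parents=[animal.id, pet.id])
```

Representing every type as "a term plus a list of parent ids" means a flat enumeration or a tree is just a restricted lattice, so one serialization could serve all three. An int32 or int64 id could replace the UUID where portability across systems is not needed.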

Found So Far

  1. DELTA http://biodiversity.uno.edu/delta/. Seems to be primarily for standard tree taxonomies for animals and plants.

Written Myself

Two taxonomy systems with GUI editors and serialization to XML. One in C# and the other in Java. The major issue is that neither uses a standard format.

Wild Ideas

  1. Drupal has a pretty nice web-based GUI for creating (and using) taxonomies. Could use that as a front end and then serialize to a standard text format from the Drupal backend db