Flexible Dates in Python (including BC)

JUNE 18, 2009

I’ve recently had to work a lot with “dates” that come in many shapes and sizes, including:

  • Dates in distant past and future including BC/BCE dates
  • Dates in a wild variety of formats: Jan 1890, January 1890, 1st Dec 1890, Spring 1890 etc
  • Dates of varying precision: e.g. 1890, 1890-01 (i.e. Jan 1890), 1890-01-02
  • Imprecise dates: c1890, 1890?, fl 1890 etc

Unfortunately, existing support for these in Python is fairly weak. I therefore wrote a Python FlexiDate module (now part of datautil, itself part of a new swiss (army knife) package), which focuses on supporting:

  1. Dates outside the period that Python (or the DB) supports (esp. dates < 0 AD)
  2. Imprecise dates (c.1860, 18??, fl. 1534, etc)
  3. Normalization of these dates to machine processable versions, especially:
     • ISO 8601
     • Dates sortable in the database (in correct date order)
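
To make the first limitation concrete, here is a small check against the standard library's datetime module, whose smallest supported year is datetime.MINYEAR. The helper name fits_in_datetime is made up for this illustration.

```python
import datetime

# The smallest year the stdlib date/datetime types accept.
print(datetime.MINYEAR)


def fits_in_datetime(year):
    """Return True if datetime.date can represent 1 January of `year`."""
    try:
        datetime.date(year, 1, 1)
        return True
    except ValueError:
        # Raised for any year outside MINYEAR..MAXYEAR, including 0 and BC years.
        return False


for candidate in (1890, 0, -500):
    print(candidate, fits_in_datetime(candidate))
```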

Background

Things we would like:

  1. Dates outside the period that Python (or the DB) supports (esp. dates < 0 AD)
  2. Imprecise dates (c.1860, 18??, fl. 1534, etc)
  3. Normalization of dates to machine processable versions
  4. Sortable in the database (in correct date order)
  5. Human readability as dates will be re-edited/viewed by people

Not all of these requirements are satisfiable at once in a simple way.

Be clear about what we want:

  1. Storage (and preservation) of “user” dates (both normal and non-normal)
  2. Normalization of dates (e.g. to ~ ISO 8601)
  3. Integration with database (sortability and serializability)

Solution for 1: Represent dates as strings.

Solution for 2: Have a parser (via an intermediate FlexiDate object).

Solution for 3: Convert to a float.
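
As a rough illustration of how the three solutions fit together, here is a minimal sketch. It is not the actual FlexiDate code: the class SketchDate, the parse function and the regular expression are all hypothetical, only a handful of the formats listed earlier are handled, and BC years are simply negated (ignoring the one-year offset of astronomical year numbering).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDate:
    """Hypothetical stand-in for a FlexiDate-like object (not the real API)."""
    year: int                    # negative for BC years (simplification)
    month: Optional[int] = None  # None means "month unknown"
    day: Optional[int] = None    # None means "day unknown"
    qualifier: str = ""          # e.g. "c", "fl" or "?"

    def isoformat(self) -> str:
        """ISO 8601-like string that keeps only the precision we have."""
        sign = "-" if self.year < 0 else ""
        out = f"{sign}{abs(self.year):04d}"
        if self.month is not None:
            out += f"-{self.month:02d}"
            if self.day is not None:
                out += f"-{self.day:02d}"
        return out

    def as_float(self) -> float:
        """Approximate numeric value, intended only as a sort key."""
        frac = 0.0
        if self.month is not None:
            frac += (self.month - 1) / 12
            if self.day is not None:
                # 372 = 12 * 31 keeps the fraction strictly below one year.
                frac += (self.day - 1) / 372
        return self.year + frac


# Handles e.g. "1890", "1890-01", "1890-01-02", "c1890", "fl. 1534",
# "1890?" and "500 BC". Anything else is rejected.
_PATTERN = re.compile(
    r"^(?P<prefix>c\.?|fl\.?)?\s*"
    r"(?P<year>\d{1,4})"
    r"(?:-(?P<month>\d{2}))?"
    r"(?:-(?P<day>\d{2}))?"
    r"(?P<uncertain>\?)?"
    r"\s*(?P<era>BCE?)?$",
    re.IGNORECASE,
)


def parse(text: str) -> SketchDate:
    """Turn a user-entered string into a SketchDate, or raise ValueError."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported date string: {text!r}")
    year = int(m["year"])
    if m["era"]:
        year = -year
    month = int(m["month"]) if m["month"] else None
    day = int(m["day"]) if m["day"] else None
    prefix = (m["prefix"] or "").rstrip(".").lower()
    qualifier = prefix or ("?" if m["uncertain"] else "")
    return SketchDate(year, month, day, qualifier)


user_value = "c1890"              # 1. stored as the user typed it
parsed = parse(user_value)        # 2. normalized via an intermediate object
print(parsed.isoformat(), parsed.qualifier, parsed.as_float())  # 3. float for the DB
```

The user's string is kept untouched, so nothing is lost even when the parser only understands part of it.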

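Remark: no string-based date format will sort dates correctly under standard string ordering. Proof sketch: let x < y be positive dates and X < Y their string representations. Prefixing both with a minus sign preserves the string order, so -X sorts before -Y, whereas as dates -x comes after -y. Negative dates therefore come out in reverse chronological order.

Thus we need to add some other field if we want dates to sort correctly (or else not worry about the sorting of negative dates …)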
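
A quick way to see the problem, assuming years are written as zero-padded four-digit strings with a leading minus for BC (a convention chosen just for this example):

```python
years = [1890, -44, 14, -500]


def to_string(year):
    """Zero-padded year string with a leading '-' for negative years."""
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


strings = [to_string(y) for y in years]

# Plain string ordering: the two BC years end up the wrong way round.
by_string = sorted(strings)
print(by_string)   # ['-0044', '-0500', '0014', '1890']

# Ordering by a separate numeric key gives chronological order.
by_number = [to_string(y) for y in sorted(years)]
print(by_number)   # ['-0500', '-0044', '0014', '1890']
```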

  1. For any given date attribute have 2 actual fields:
     • user version – the version edited by users
     • normalized/parsed version – a version that is usable by machines
  2. Store both versions in a single field but with some form of serialization.
  3. Convert dates to long ints (unlimited in precision) and put this in a separate field and use that for sorting.
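
One way the separate-field approach might look in practice, sketched with an in-memory SQLite database. The table and column names (event, date_user, date_sort) and the hand-written sort values are assumptions for illustration only. A REAL column is used here rather than a long int so that months can be expressed as fractions of a year.

```python
import sqlite3

# (what the user typed, numeric sort key computed elsewhere)
rows = [
    ("c. 44 BC", -44.0),
    ("Jan 1890", 1890.0),
    ("500 BC", -500.0),
    ("1890-03", 1890 + 2 / 12),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (date_user TEXT, date_sort REAL)")
conn.executemany("INSERT INTO event VALUES (?, ?)", rows)

# Users see and edit date_user; the database only ever sorts on date_sort.
ordered = [
    row[0]
    for row in conn.execute("SELECT date_user FROM event ORDER BY date_sort")
]
print(ordered)  # ['500 BC', 'c. 44 BC', 'Jan 1890', '1890-03']
conn.close()
```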

Comments

Initially I thought that we should parse dates into a FlexiDate format before saving, but: a) why bother, and b) when parsing it is always hard not to be lossy. In particular, when converting to ISO 8601 using e.g. dateutil, it is very difficult not to add information: parsing 1860 can easily give us 1860-01-01 …
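
The same effect is easy to reproduce with the standard library alone. This sketch uses datetime.strptime as a stand-in for a general-purpose parser, to show how a year-only value silently acquires a month and a day:

```python
from datetime import datetime

user_text = "1860"

# The format asks only for a year, but the resulting object must still
# carry a month and a day, so both default to 1.
parsed = datetime.strptime(user_text, "%Y")
normalized = parsed.date().isoformat()

print(normalized)  # 1860-01-01
```

Once only the normalized value is stored, there is no way to tell whether the user meant the year 1860 or precisely 1 January 1860.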

References and Existing Libraries