Extracting Feature Geometries from OpenStreetMap

I’ve recently been extracting some river geometries for major cities around the world. The data needs to be a list of latitude/longitude coordinates, representing the nodes on the shape for the river concerned.

I’m sure there’s easier ways to do this, but here’s my technique, shown here for Minneapolis. Click the images for larger version.

1. Extract the data from OpenStreetMap. Use the Export function, and draw out the area concerned with a bounding box. Choose OpenStreetMap XML as the format. I originally tried SVG, but this presents you with screen coordinates instead of latitude/longitude pairs.

2. Open the resulting file in Quantum GIS (QGIS). I used QGIS 1.9. You need the OpenStreetMap plugin installed, this will allow the OSM file that was created in Step 1 to be read straight in (in fact you could download the file directly from the OSM servers, if you wanted to).

3. Select the feature you are interested in. My river (actually a waterbank polygon) is a “hairy feature” as it extends well beyond the extent of the data that was downloaded. Make sure you are selecting it (feature turns yellow) rather than highlighting it for feature information (feature turns red). Otherwise, the subsequent file is, rather unhelpfully, blank.

4. Do Layer > Save Selection as Vector File. Choose “KML” as the format. You probably don’t need to change the coordinate reference system (CRS) as the data will already be in WGS 84, and this (“normal GPS-style latitude/longitude) is the CRS you want.

5. Edit the resulting file, removing the XML tags, and header/footer, and replace spaces with return characters, to leave a long list of latitude/longitudes, ready for importing into your visualisation code.

Rank Clocks and Maps: Spatiotemporal Visualisation of Ordered Datasets

Rank Clocks are a type of visualisation invented by Prof Michael Batty here at UCL CASA. They are time-based line charts, wrapped around a clockface – with the start date at the top, wrapping around clockwise to the end date. The lines on the clock show the change in ranking of the items being visualised. By effectively wrapping a line chart around itself, certain patterns, that would be otherwise hard to spot, become clearer.

Starting from Prof Batty’s Rank Clocks application (written in VB), I created a web version that has a subset of the application’s features, but also includes a map, allowing both temporal rank changes, and location, to be shown. A future enhancement would also be to show the change in location with time as well (an example would be how football clubs have moved around in London over the years and how their relative rank in the leagues has also varied) but for now each item in the dataset has just a single point location that remains constant with time.

Live Rank Clock site here.

The “classic” Rank Clock is of New York skyscrapers – looking at the clock allows bursts of skyscraper development to be easily spotted, and as New Yorkers have been building skyscrapers for over a hundred years, and have many of them, it is a rich dataset. I have curated a London equivalent from various sources including Wikipedia. It includes the many residential towerblocks of the 1960s/1970s, many now knocked down, but is not quite the same as New York’s.

The website is written in Javascript, using OpenLayers both for the map (with OpenStreetMap background) and for the rank clock itself. For the rank clock, I am doing some basic trigonmetry to calculate the coordinates needed to show the lines and converting from polar coordinates to “native” screen coordinates. This is a novel but not particularly efficient use of OpenLayers, but I used it as I am quite familiar with using OpenLayers, particularly for showing lines as vectors, rather than using a Javascript vector-based charting API which would be the more obvious choice.

My interpretation of the Rank Clock concept has plenty of flaws – in particular, data can often be easily obscured, and spotting patterns in noisy (frequently changing rank) data is difficult. It’s difficult even to select lines (to see their caption) if other lines are nearby and overlaying them. Nonetheless, it can provide an unusual way of looking at some interesting datasets.

For one of the datasets in the sample website (US baby names) I have repurposed the map to effectively show a 2D graph indicating beginning and ending (in time) positions of the names – so here OpenLayers is being used to show two “maps” – but neither are actually maps.

I’ve also linked into the Google Earth browser plugin (installation maybe be required), replacing each dot on the OpenStreetMap map, with a column of varying height (and colour) based on the initial rank, with an extent appropriate to the data set. Google Earth can be refreshed by supplying new KML information – and it turns out that OpenLayers has a rather nice KML conversion and export feature for any geometry in it, which allows Google Earth to be driven in this way. This is done when clicking on a Rank Clock line, allowing the equivalent feature in Google Earth to be redrawn with a thicker border. Unfortuantely events cannot be captured from Google Earth and back into the OpenLayers map, so clicking on a pillar in the former will not highlight the corresponding Rank Clock line in the latter. Still, it’s a nice way of linking spatialtemporal information and then visualising it in 3D.

I carried this work out quite a while ago, but haven’t mentioned it to now, as it’s not complete. There are only a limited number of datasets available, and plenty more features could be added – and the navigation and interaction improved significantly. Please bear this in mind when viewing the live site.

There are a few “toy” features already though – you can invert the rank clock (normally the top-ranked items are in the middle of the circle and so are hard to see), change the metric the colour is showing, and filter and relayer.

The three rank clocks shown here are showing: TOP – Changes in population of the London Boroughs of Newham and Tower Hamlets, and the City of London, over 150 years. The City of London line spirals outwards, showing its drop in population (and so rank). Tower Hamlets also shows a big drop in rank during WWII, but has started to increase again recently. Westminster’s population rank has steadily increased, until WWII – but again its rank has also more recently increased. MIDDLE – Tall buildings in London, coloured by year they were built. The oldest (red) buildings have been selected and show in Google Earth, showing that such buildings were entirely in the centre and west of London. BOTTOM – US company revenue. The San-Francisco-headquartered companies are selected on the map and correspondingly highlighted on the rank clock, showing that only one was founded before the 1970s – IBM – and a general spiralling inwards as Silicon Valley grows.

Live Rank Clock site here.

Reworking Booth: Geodemographics of Housing

[Update January 2013 - Scottish SIMD 2012 map added, more details.]

I’ve created a new visualisation, a dasymetric map of housing demographics which you can see here, which attempts to improve on the common thematic (a.k.a. choropleth) maps – a traditional example is shown below – where areas across the country are colour-coded according to some attribute. My visualisation clips the colour-coding to the building outlines in each area, leaving open ground, parks etc uncoloured.

The Traditional Approach

The shortcoming of choropleth maps is that each area is coloured uniformly. If the attribute being measured is a property of the houses in that area, such as much of the census data, then choropleth maps not only colour the houses in each area, but also the parks, rivers and mountains that might also be contained within the area, even though the data being displayed arguably only applies to the houses. This means that geodemographic classification results that predominate in rural areas tend to overwhelm a map at smaller scales – as can be seen in the map on the right – where the green represents a countryside geodemographic.

An alternative to choropleth maps is to use cartograms. These distort the area, elastically, to tessellating hexagonal groups or to circles (Dorling cartograms), to match typically population rather than geographic extent, so that the colours are represented more fairly, but cartograms are very difficult for most people to interpret and relate to familiar physical features. They can look very “alien”. One further alternative is dot distribution maps – these assign dots of colour, randomly within each area. This reduces the colour density correctly in sparsely populated areas, but distributes the dots evenly across empty parks and rows of houses, if both are in a single area, and imply single points of population.

Clipping the Choropleth Maps

My visualisation attempts be the best of both worlds, by retaining the familiar geographic shape of the UK and its towns and cities, but not swamping the map with colours in all areas, and indeed ensuring that unpopulated areas have no colour. This is possible because Ordnance Survey Open Data includes Vector Map District. The second release of this dataset improved the quality of building outlines considerably, allowing distinct rows of buildings on streets to be seen and even individual detached houses. Unfortunately building classifications are not included, so the process necessarily colours all buildings, rather than just the residential ones that formed part of the census data. This is why, for example, the Millennium Dome in Greenwich appears, even though no one (hopefully!) lives there.

The major shortcoming of doing this is that it falsely implies a higher level of precision within each Output Area, by often showing and colouring individual buildings, whereas the colour is representative as an average of the properties in the area concerned, rather than telling you something about that particular building itself. That is, the technique is showing no new or more detailed data than can be seen in the traditional choropleth maps, but tends to mislead the viewer otherwise. This is balanced by making the map seem more realistic, by not unformly covering everything in the area with a giant blob of a single colour.

The map can be considered to be a dasymetric map, albeit one where the spatial qualifier, population density, is one of two values – high (in a building) or zero (not in a building).

Booth’s Poverty Map

An inspiration for this kind of map is the Charles Booth Poverty Map of 1898-9, although my example is considerably less sophisticated. For this map, Booth (and his assistants) visited every house, to determine the demographic of the house, and then painstakingly coloured in the houses, along the streets. His map therefore did not suffer from the falsely implied accuracy – his map really was as accurate as it looks. The Museum of London, incidentally, has a “walk in” Booth poverty map, I featured it on Mapping London blog last year.

The photo above compares Booth’s map (from a photo of the map in the Museum exhibition, including a friend’s hand) with my map, for the Hackney area in London.

OAC, IMD and London

My main geodemographic map is showing the OAC (Output Area Classification), which was created by Dan Vickers in Sheffield in 2005, and is based on data from the 2001 census. The areas used are Output Areas, there are around 210,000 of them in the UK, each one with a population of roughly 250 people in 2001.

The OAC map is not particularly illuminating for London – the capital is considerably more ethnically diverse than most other parts of the country, but because the clustering process used to create OAC is run across the whole country uniformly, only one Supergroup appears to show such ethnically diverse areas – “7″ (Multicultural), rather than showing the variety within this group that extends across the capital. With this in mind I have created an alternative map, which colours the housing according to the IMD (Index of Multiple Deprivation) rankings. This covers England only, and the data is only available at larger spatial units, called LSOAs (Lower Super Output Areas) but is more up-to-date, being from 2010, and shows considerable more variety across London. Use the link at the bottom of the visualisation to switch between the two.

You can view the map here. It uses geolocation to attempt to zoom to your local area, if you allow it to – it will probably ask you to allow this when you visit the site.