Here’s some iPhone photos from my office window at UCL, over the last 12 months.
Data Graphics and Beer Don’t Mix
Here’s an example of an outstandingly misleading data-graphic, appearing in this week’s LondonStudent freepaper.

It attempts to show the disparity of bar staff pay across London universities, but:
- The “empty” pint glass does not correspond to £0.00. To the casual observer it makes it look like CSSD work for free, until the figures are noticed. In fact, the text of the article mentions that, at two other London university bars, the staff do in fact work for free (or for beer – ouch) – but these are not shown on the graphic.
- The graphic is a 2D (i.e. print) representation of a 3D object (a pint glass, tilted slightly towards the viewer) but the scale appears to vary in 1D – the values form a straight line across the “glass”. Hence the graphic has a large “Lie Factor”, the concept discussed in detail in E. Tufte’s totemic book The Visual Display of Quantitative Information (p57 for those making notes!)
- LSE’s amount bizarrely isn’t represented at all in the graphic, but appears in a text box above it.
- The numbers are on their side, even though there is plenty of room to show them horizontally – making the real values harder to read, so the reader concentrates on the misleading graphic representation instead.
- The actual levels don’t bear any resemblance to the values – the ordering is correct but the relative value differences don’t correspond to the “beer levels”. For example, the drop between £5.90 and £5.95 is larger than the £5.95 to £6.25 drop.
- Why’s beer being used to represent pay anyway?
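Tufte's Lie Factor can be worked out for any pair of values on a graphic like this. It is the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below uses the £5.90 and £5.95 figures from the article, but the two "beer level" heights are made-up measurements for illustration, not values taken from the actual graphic.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data.

    A value near 1 means the graphic is honest; well above 1 means it
    exaggerates the difference.
    """
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical beer-level heights in millimetres (assumed, not measured).
height_low_mm = 40.0
height_high_mm = 50.0

factor = lie_factor(height_low_mm, height_high_mm, 5.90, 5.95)
print(f"Lie Factor: {factor:.1f}")
```

With these assumed heights, a pay difference of under 1% is drawn as a 25% difference in beer level, so the factor comes out far above 1.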
HEFCE Funding Map
This is the sixth in a series detailing the projects I have worked on at UCL in the last academic year.

This was a quick mashup to show on a map the latest HEFCE funding round – HEFCE is the government body that decides and awards research funding to the universities around the UK.
This is a vector-based mashup, once again using OpenLayers. For each point, representing a university’s “main” campus, I request a pie-chart from the Google Charts API, and use the resulting image as the marker for the point. There’s no simplification or other generalisation, so, for example, you’ll need to zoom in quite far if you want to make out the different universities in London.
It was cobbled together in about a day. The Thematic Mapping blog was particularly useful for getting the images working as markers.
You can see the mashup here.
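The pie-chart markers described above are just image URLs, so building one is mostly string assembly. Here is a rough sketch of how such a URL might be put together. The endpoint is my assumption of the Google Image Charts address (that API has since been deprecated), and the function name, slice values and colours are hypothetical; the mashup's real code ran in the browser alongside OpenLayers.

```python
from urllib.parse import urlencode

# Assumption: endpoint of the (now deprecated) Google Image Charts API.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def pie_marker_url(values, size_px=60, colours=None):
    """Build a pie-chart image URL to use as a map marker.

    values  -- slice sizes in any unit; they are rescaled to percentages
               because the text data format expects numbers from 0 to 100.
    colours -- optional list of RRGGBB strings, one per slice.
    """
    total = sum(values)
    percentages = [100.0 * v / total for v in values]
    params = {
        "cht": "p",                                   # chart type: pie
        "chs": f"{size_px}x{size_px}",                # image size in pixels
        "chd": "t:" + ",".join(f"{p:.1f}" for p in percentages),
    }
    if colours:
        params["chco"] = "|".join(colours)            # one colour per slice
    # Keep ':', ',' and '|' readable rather than percent-encoding them.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,|")


# Hypothetical funding split for one university.
url = pie_marker_url([5, 3, 2], colours=["FF0000", "00FF00", "0000FF"])
print(url)
```

Each point on the vector layer would then use its own URL as the marker graphic.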
Spatial Interaction Modelling for Access to Higher Education
This is the first in a series detailing the projects I have worked on at UCL in the last academic year.
My main project through the last year has been to test a hypothesis, developed by Professor AG Wilson, that the flows of students moving from school to university can be approximated by spatial interaction modelling (SIM). Put simply, SIM is a variant of the 300-odd year old Newton’s Law of Universal Gravitation, i.e. the attraction between two masses depends on each of their masses and the distance between them. Replace the masses by the number of final-year pupils at a school and a university’s capacity, and make the distance decay exponential instead of inverse-square, and that’s the basics of the model. A similar theory has been applied to great effect by Joel Dearden of CASA, in his retail SIM, which has shown a “tipping point” explaining how supermarkets and out-of-town retail developments have become attractive to shoppers over the last forty years.
Of course, it’s a little more complicated than that, and even with the more complex model I’ve tested, a large number of simplifying assumptions have to be made.
The model adds two main extra parameters. First, universities have an “attractiveness factor” above and beyond their size; I have used one of the common university league tables to provide values for this factor. Second, the distance-decay is not uniform across all types of school students, but varies by their background. By splitting up the final-year school students by demographic, the variation in the distance-decay can be seen, and this is used to calibrate the model.
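To make the basic idea concrete, here is a minimal sketch of a gravity-style model with exponential distance decay and an attractiveness factor. It is an illustration under my own assumptions, not the model as tested: I have assumed that attractiveness simply multiplies capacity, and that each school's pupils are shared out in proportion to the resulting weights (a "production-constrained" form). All names and numbers are hypothetical.

```python
import math


def predict_flows(pupils, capacity, attractiveness, distance_km, beta):
    """Predict school-to-university flows.

    pupils[i]         -- final-year pupils at school i
    capacity[j]       -- capacity of university j
    attractiveness[j] -- extra pull of university j (e.g. from a league table)
    distance_km[i][j] -- distance from school i to university j
    beta              -- distance-decay parameter; larger means more local

    Returns flows[i][j]; each row sums to pupils[i].
    """
    flows = []
    for i, n_pupils in enumerate(pupils):
        weights = [
            capacity[j] * attractiveness[j] * math.exp(-beta * distance_km[i][j])
            for j in range(len(capacity))
        ]
        total = sum(weights)
        flows.append([n_pupils * w / total for w in weights])
    return flows


# One school, two equally sized universities 10 km and 50 km away.
pupils = [100]
capacity = [1000, 1000]
attractiveness = [1.0, 1.0]
distance_km = [[10.0, 50.0]]

# A different beta per demographic group changes how local the flows are.
low_beta_flows = predict_flows(pupils, capacity, attractiveness, distance_km, beta=0.01)
high_beta_flows = predict_flows(pupils, capacity, attractiveness, distance_km, beta=0.05)
print(low_beta_flows, high_beta_flows)
```

Running the model once per demographic group, each with its own beta, is one way the variation in distance-decay could be fed in.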

The seven OAC demographic supergroups are shown here – the horizontal scale is distance and is the same in each graph. (Only English-based school students going to English universities are considered in the study.) The vertical scale is the proportion of students, of that OAC supergroup, in each distance bucket. The actual number of students in each supergroup varies dramatically and this is not shown in the graphs.
The graphs show there is indeed considerable variation between supergroups in the “beta value” of the drop-off if approximated as exponential, and also in the “R-squared” fit to true exponential decay.
- Blue collar.
- City living – this group strongly favours London, Birmingham and Manchester, i.e. the same or other “big cities” in England, hence characteristic peaks appear at these distances – accentuated by the relatively small school-age population in this group.
- Countryside – this group rises before falling, as there is a minimum distance they need to travel to get to even their nearest university.
- Prospering suburbs – the lowest beta-value, in other words this group attaches the least importance to school-university distance.
- Constrained by circumstance – similar to the first group.
- Typical traits – the “average” group which encouragingly also has an average looking graph.
- Multi-cultural – more distance-sensitive than the others – hence the very steep drop-off. This shows that people living in areas classified as multi-cultural will more strongly desire going to a university that is very local to their home.
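For anyone wanting to reproduce a “beta value” and “R-squared” like those mentioned above, one common approach is to take the logarithm of the proportions and fit a straight line by least squares. The sketch below does that; it is an assumed method for illustration (the R-squared here is measured on the log scale), and the sample data is synthetic rather than taken from the study.

```python
import math


def fit_exponential_decay(distances, proportions):
    """Fit proportion = A * exp(-beta * distance) via a log-linear regression.

    Buckets with a zero proportion are skipped, since log(0) is undefined.
    Returns (beta, r_squared), with r_squared computed on the log scale.
    """
    points = [(d, math.log(p)) for d, p in zip(distances, proportions) if p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared


# Synthetic distance buckets (km) following a perfect exponential decay.
bucket_km = [10, 30, 50, 70, 90]
share = [0.4 * math.exp(-0.03 * d) for d in bucket_km]

beta, r_squared = fit_exponential_decay(bucket_km, share)
print(f"beta = {beta:.3f}, R-squared = {r_squared:.3f}")
```

A group like “Countryside”, which rises before falling, would give a noticeably lower R-squared under this kind of fit than a group with a clean drop-off.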
Prof Wilson’s theory also factors in the subject that the student is studying (not all universities offer all subjects, and most are strong in certain subjects and weak in others), and their attainment at school (i.e. they might really want to study Maths at Oxford, and be at a school very near by, but if they get a D in Maths at A-Level, they aren’t going to be able to do that).
Universities also come in two types – “recruiting”, where there are more places than students genuinely intending to study there, and “selective”, where there are more prospective students than places. One interesting effect of the recent economic downturn is the massive increase in people applying for university in 2009-10 – UCL saw a 12% increase for undergraduate courses, for example. This has had the effect of making more universities selective.
In order to consider both types in the same model, it was necessary to develop what is known as a “partially constrained” SIM. The details are for a future article but, put simply, the model takes an iterative approach: it assigns students to a university and then, for any university that is over capacity, reassigns the weakest students.
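The iterative assign-then-reassign idea can be illustrated with a small sketch. This is not the actual partially constrained SIM, whose details are left for a future article; it is a simplified stand-in that works on individual students with a ranked list of choices and a single score, both of which are my own assumptions. It happens to resemble the well-known deferred-acceptance procedure.

```python
def assign_students(preferences, scores, capacity):
    """Iteratively place students, bumping the weakest from full universities.

    preferences -- {student: [university, ...]} in order of choice
    scores      -- {student: number}; higher means stronger
    capacity    -- {university: number of places}

    Returns (placed, no_place): placed maps each university to its students,
    no_place lists students who ran out of choices.
    """
    next_choice = {student: 0 for student in preferences}
    placed = {uni: [] for uni in capacity}
    no_place = []
    unplaced = list(preferences)

    while unplaced:
        # Step 1: send every unplaced student to their next choice.
        for student in unplaced:
            choices = preferences[student]
            if next_choice[student] < len(choices):
                placed[choices[next_choice[student]]].append(student)
                next_choice[student] += 1
            else:
                no_place.append(student)

        # Step 2: over-capacity ("selective") universities keep the strongest
        # students; the rest go back into the pool for the next pass.
        unplaced = []
        for uni, students in placed.items():
            if len(students) > capacity[uni]:
                students.sort(key=lambda s: scores[s], reverse=True)
                unplaced.extend(students[capacity[uni]:])
                del students[capacity[uni]:]

    return placed, no_place


# Hypothetical example: X is selective (1 place, 3 applicants), Y is recruiting.
preferences = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["X"]}
scores = {"a": 3, "b": 2, "c": 1}
capacity = {"X": 1, "Y": 5}

placed, no_place = assign_students(preferences, scores, capacity)
print(placed, no_place)
```

A “recruiting” university never triggers the second step, so it simply keeps everyone sent to it, while a “selective” one keeps bumping students until it fits its capacity.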
I built a GUI in Java – it’s the language I’m most comfortable with for “proper” programming – to quickly visualise the results and compare them with real-life flows. Here’s a bit of it:

This shows the perhaps not very surprising prediction that BIRM7s (multi-cultural school students living in Birmingham) are pretty likely to also go to university in Birmingham (AST = Aston, BCU = Birmingham City University, BIR = University of Birmingham), rather than elsewhere in the country.
When compared with the actual flows:

…the model under-predicts the flow to Birmingham City University, possibly because BCU’s desirability amongst this demographic group is mis-calibrated. Further-education students are also not present in the predicted model, but are included in the actual flows, so the two are not, as presented, normalised.
The model needs to be developed further before it can be presented formally. In particular, attainment is almost certainly a necessary component.
M:F Ratio as a measure of a City’s Cycling Friendliness
Okansas links to an interesting study in the Scientific American which relates the cycling friendliness of a city to the male-female ratio of the cyclists in it – the theory being that men are more likely to brave a motor-friendly place while women need more encouragement.
I counted 19 men and 17 women on bikes on my commute into work today, although this was after the normal London commuting time, and a significant part of the commute was not on roads. I suspect that there is more of a male bias on the busier roads and during the rush hour.
Quantum GIS 1.3
A new version of Quantum GIS, the free, open-source and user-friendly GIS, has been released today.
See the official blog for all the details, but the most exciting addition for me is the OpenStreetMap integration. Now, you can download data directly from the OSM servers, into the application. OSM-like stylings are applied to the data to make it look a bit more like a map, and you can easily view all the tags and relations on each object. You can also edit the data directly in QGIS, as if it was normal GIS data, and then save it straight back to the server. This could potentially make it a good alternative to the Potlatch and JOSM editors that are currently used for the bulk of additions to OpenStreetMap. The integration isn’t perfect – on my first attempt at downloading data I got a server-side bounding box error, which should have been caught locally – but it’s pretty impressive nonetheless.
With QGIS’s excellent Python integration, it should be possible to write other plugins, to, for example, create well-shaped building outlines with perfect right angles. I think you can do this in JOSM too, but I’ve always found JOSM a rather unfriendly application to use.
Here’s some OpenStreetMap data of my local area, in Quantum GIS, with a road I added highlighted in red:

Open Plaques
The Open Plaques project, currently in alpha, is aiming to catalogue, photograph and georeference the numerous “blue plaques” scattered around London and elsewhere in the UK. Blue plaques generally mark the house where someone famous lived, or the place where some other event happened. The London ones are generally blue and circular, and are put up by English Heritage or the local borough councils. Other towns and cities have their own schemes.
Contributing to the project is as easy as uploading a (georeferenced) photograph to the Open Plaques Flickr group. In due course, a “machine tag” will appear on your photo, linking it to a blue plaque in the Open Plaques database, and the photo itself should also appear on the site, as long as you’ve specified a licence that allows this to be done. If the plaque is missing from the database altogether, then a new entry presumably gets set up.
Note that iPhones automatically georeference photos as you take them. However, the GPS positional accuracy is very poor unless you give it time to lock on to enough satellites, so I manually re-georeferenced the photo in Flickr using the interactive map tool there.
The good news is that the data on Open Plaques is public domain, so can be used for any purpose. Potentially this could include adding the plaques into OpenStreetMap in the future.
Open Plaques derived the London list from the English Heritage website, which has details and addresses, but no maps or photos. This is very similar to something I did for part of my MSc dissertation last summer, which was looking at using modern GIS and geospatial techniques for enhancing Street-O maps.
Street-O events generally involve finding places and noting down a specific answer at the place, to prove you’ve been there. Blue plaques are popular with course planners, as they are generally unique, in one clearly defined location and contain unambiguous information that the competitor is unlikely to already know.
For part of my dissertation, I screen-scraped the English Heritage website, ran the addresses through Google Local to geocode them, and then plotted the results in a GIS – the idea being the race planner could then use these to build up a race map and question sheet, without having to trawl the streets trying to find plaques manually. Around 80% of the plaques were successfully placed on the map in this way, although the geocoding accuracy wasn’t always great, due to the natural inaccuracy and non-systematic placement of street addresses.
I wrote:
(5.4.2) It was decided to look at these features as one example of using a spatial dataset unrelated to orienteering to enhance the process of creating a Street‐O map for an event…
Unfortunately the blue plaque data isn’t freely available in a spatial format – users can search by postal district, but then are presented with a list of addresses rather than a map.
The pictures below show the results for the Islington area, on the left from the dissertation, and on the right the equivalent map currently on Open Plaques. (It would be straightforward to pull the data into a GIS, from the CSV files the site provides, for a proper side-by-side comparison. I’m just being lazy by screen-grabbing the map as-is.)

Plenty more to be added to Open Plaques. The best way, of course, is to visit them – the locations can’t be copied across from my derived list, unfortunately, as the Google-derived locations are not free of copyright.
Potentially the Open Plaques database, once complete for London, will simplify this process even more, by allowing a one-step import of plaques, inscriptions and, most importantly, accurate locations into the GIS, for easy map creation.
Mapping Party and Mapzen Sneak Peek!
The next London mapping party is on Thursday evening, in Mayfair, the really posh bit of central London (you can tell it’s posh as it only has one bus route going through it). See here for details and to sign up.
What’s special about this one is that Cloudmade’s in-development mapping editor, Mapzen, might be demoed. The screenshots look very interesting, so this could be pretty cool.