Sunday 16 September 2018

Berlin stats and maps

A while ago I created a map showing the number of Spaniards in Berlin for each administrative planning area (Berlin is divided into some 450 areas called LOR, Lebensweltlich orientierte Räume, which gives much finer granularity than postal codes). But I was not fully satisfied with that solution, so let's go through a second approach using bokeh. Why bokeh? Because my company has given me access to DataCamp and, among other things, I recently did a course called Interactive Data Visualization with Bokeh. Since I learn better by doing a mini-project than by just going through the videos and exercises, I thought this would be fun.

First of all, I needed a way to plot the planning areas. This is easy enough, as Berlin has a site where the information can be downloaded as a kmz file (zipped kml files). Once unzipped, it is relatively straightforward to read the coordinates of each area. However, the areas are far too detailed, with hundreds of points used for each contour, so the first hurdle was to simplify the contour of each area. As it happens, there is a Python library that does exactly this. It's appropriately called rdp, as it is an implementation of the Ramer-Douglas-Peucker algorithm (which I had never heard of until a few days ago). After playing with it a bit, I managed to reduce the number of coordinates by a factor of 5 while still retaining more than enough detail for the plot.
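As a rough illustration of these two steps, here is a minimal sketch in plain Python. It assumes the KML uses the standard `http://www.opengis.net/kml/2.2` namespace and stores each area as a `Placemark` with a single `Polygon` (the actual Berlin LOR file may be structured differently, e.g. with `MultiGeometry`). Instead of calling the `rdp` library, it re-implements the Ramer-Douglas-Peucker algorithm so the idea is visible; all function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
import zipfile

# Assumption: standard KML 2.2 namespace; other files may use a different one.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def parse_kml_polygons(kml_bytes):
    """Return {placemark name: [(lon, lat), ...]} using each polygon's outer ring."""
    root = ET.fromstring(kml_bytes)
    polygons = {}
    for placemark in root.iter("{%s}Placemark" % KML_NS["kml"]):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.find(
            ".//kml:outerBoundaryIs/kml:LinearRing/kml:coordinates", KML_NS
        )
        if coords is None or not coords.text:
            continue  # skip placemarks without a polygon outline
        points = []
        for triple in coords.text.split():
            # KML coordinates are "lon,lat[,alt]" tuples separated by whitespace
            lon, lat = triple.split(",")[:2]
            points.append((float(lon), float(lat)))
        polygons[name.strip()] = points
    return polygons


def read_kmz_polygons(path):
    """Open a KMZ (a zip archive) and parse the first .kml file inside it."""
    with zipfile.ZipFile(path) as kmz:
        kml_name = next(n for n in kmz.namelist() if n.lower().endswith(".kml"))
        return parse_kml_polygons(kmz.read(kml_name))


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring): fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def rdp_simplify(points, epsilon):
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the simplified line."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively
        left = rdp_simplify(points[: index + 1], epsilon)
        right = rdp_simplify(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the split point
    return [start, end]
```

`epsilon` is in the same units as the coordinates (degrees here), so it has to be tuned by trial and error; comparing `len(rdp_simplify(points, epsilon))` with `len(points)` shows how much each contour has been reduced.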

Next, I had to aggregate the data from the statistical office for Berlin and Brandenburg so that the colour of each area is proportional to the number of Spanish residents there: white if no Spaniards live in an area and black if 300 or more do. For reference, I believe each area is meant to have roughly 7500 inhabitants, so the map also gives an idea of the (total) population density in Berlin: smaller areas are more densely populated.
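The aggregation and the colour scale could look roughly like this. The input format (an iterable of `(area_id, count)` pairs, e.g. one row per area and age group) is a hypothetical stand-in for the statistical office's tables, and the linear white-to-black greyscale capped at 300 is one simple reading of the rule above; `aggregate_by_area` and `count_to_grey` are illustrative names.

```python
from collections import Counter

MAX_COUNT = 300  # from the post: 300 or more Spaniards -> black


def aggregate_by_area(rows):
    """Sum counts per planning area from hypothetical (area_id, count) rows."""
    totals = Counter()
    for area_id, count in rows:
        totals[area_id] += count
    return totals


def count_to_grey(count, max_count=MAX_COUNT):
    """Linear greyscale: 0 -> white '#ffffff', >= max_count -> black '#000000'."""
    fraction = min(max(count, 0), max_count) / max_count  # clamp to [0, 1]
    level = round(255 * (1 - fraction))
    return "#{0:02x}{0:02x}{0:02x}".format(level)
```

Clamping at `max_count` means every area above the threshold looks the same, which keeps the few very large values from washing out the differences between the many small ones.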

The visualization is nicer if you can hover over the map and see the name of each planning area and its population. To take it a notch further, it is even nicer if you can choose the date of the report from a dropdown menu: when you make a selection, the data is updated, and the new colours and values are shown when hovering. The result is shown below (unfortunately not interactive in this post, as I run it on my computer with a local bokeh server).
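A minimal sketch of the interactive part, assuming a Bokeh server app as in the post: a `ColumnDataSource` feeds `patches`, a `HoverTool` shows the area name and count, and a `Select` widget swaps the data when another report date is chosen. The data structures (`areas` as a list of dicts with `name`, `xs`, `ys`, and `counts_by_date` mapping a date string to `{area name: count}`) and the function names are hypothetical; the Bokeh imports are deferred into the app function so the data helper can be used without Bokeh installed.

```python
MAX_COUNT = 300  # 300 or more Spaniards -> black


def count_to_grey(count, max_count=MAX_COUNT):
    """Linear greyscale: 0 -> white, >= max_count -> black."""
    fraction = min(max(count, 0), max_count) / max_count
    level = round(255 * (1 - fraction))
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def build_source_data(areas, counts):
    """Column dict for one report date (hypothetical input structures)."""
    names = [a["name"] for a in areas]
    values = [counts.get(n, 0) for n in names]
    return {
        "xs": [a["xs"] for a in areas],  # simplified contour, x coordinates
        "ys": [a["ys"] for a in areas],  # simplified contour, y coordinates
        "name": names,
        "count": values,
        "color": [count_to_grey(v) for v in values],
    }


def make_app(areas, counts_by_date):
    """Return a Bokeh server app function that builds the document."""

    def modify_doc(doc):
        # Deferred imports: only needed when the app actually runs.
        from bokeh.layouts import column
        from bokeh.models import ColumnDataSource, HoverTool, Select
        from bokeh.plotting import figure

        dates = sorted(counts_by_date)
        source = ColumnDataSource(build_source_data(areas, counts_by_date[dates[0]]))
        plot = figure(title="Spaniards per planning area (LOR)",
                      match_aspect=True, tools="pan,wheel_zoom,reset")
        plot.patches("xs", "ys", source=source, fill_color="color",
                     line_color="grey", line_width=0.5)
        plot.add_tools(HoverTool(tooltips=[("Area", "@name"), ("Spaniards", "@count")]))

        select = Select(title="Report date", value=dates[0], options=dates)

        def on_date_change(attr, old, new):
            # Replace all columns at once so the browser gets one consistent update
            source.data = build_source_data(areas, counts_by_date[new])

        select.on_change("value", on_date_change)
        doc.add_root(column(select, plot))

    return modify_doc
```

To run it locally, one option is to pass the returned function to Bokeh's embedded server (`Server({"/": make_app(areas, counts_by_date)})` from `bokeh.server.server`); another is to call `make_app(areas, counts_by_date)(curdoc())` in a script launched with `bokeh serve`.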

