Visualizing Data From The Winnipeg Building Index

I have long browsed the Winnipeg Building Index (WBI) and enjoyed the information and photos it presents. I thought it would be interesting to see that information presented in a more visual way, e.g. in a plot or on a map. The end goal I had in mind was an animated heatmap of the geographic coordinates of the buildings in the index, by decade. The idea behind the entire exercise was to practice web scraping and data visualization.

To collect the data, I used Python and a web scraping library called Beautiful Soup.

Once the data was collected, it was cleaned up. For example, there were many different year formats present in the approximately 2550 items collected:

  • 1906 (circa)
  • 1905 – 1906
  • 1950-1951
  • 1903 (1912?)
  • 1908?
  • 1885, 1904
  • 1946-
  • 1880s
  • 1971 circa

For years, the first complete (4 digits) and plausible (1830 < year < 2015) year found was used as the year for each building. Addresses were slightly less varied. Most suitable (i.e. geocodable) addresses took the form of “279 Garry Street” or “Main Street at Market Avenue”; entries in the address column that consisted of only a street name (e.g. “Colony Street”) were removed. There were also a few addresses that don’t appear to actually exist. (The names of buildings aren’t important for this stage, though they will be for a later project.) Once the data was cleaned up, it was saved again as a CSV file.
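The year rule above can be sketched roughly as follows. This is an illustration only, not the original cleanup script: the function name `extract_year` and the regular expression are assumptions, and the sketch treats a value like “1880s” as containing the complete year 1880, which may differ from how the original handled it.

```python
import re

# Standalone runs of exactly four digits; the lookarounds skip longer digit strings.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")

def extract_year(raw, low=1830, high=2015):
    """Return the first 4-digit year with low < year < high, or None."""
    for match in FOUR_DIGITS.finditer(raw):
        year = int(match.group())
        if low < year < high:
            return year
    return None

# The year formats listed above.
samples = ["1906 (circa)", "1905 – 1906", "1950-1951", "1903 (1912?)",
           "1908?", "1885, 1904", "1946-", "1880s", "1971 circa"]
years = [extract_year(s) for s in samples]
```

Entries with no plausible year come back as `None`, so they can be dropped before plotting.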

As there are a number of things that can be done with this data, I decided to do a simple task at first. Since most buildings had years associated with them, I decided to visualize the number of buildings in the WBI for each decade. To do this, I imported the ‘year’ column into plot.ly and used the histogram plot type. After that, I created 19 ‘buckets’ corresponding to each decade from the 1830s to 2010s. The result is below:

Frequency Distribution of Years by Decade – made in plot.ly
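The same bucketing can be done outside of plot.ly. The following is a minimal sketch, assuming the cleaned years are already available as a list of integers; the function name `decade_counts` is hypothetical.

```python
from collections import Counter

def decade_counts(years, first=1830, last=2010):
    """Count years per decade, returning one (decade, count) pair per bucket."""
    counts = Counter((year // 10) * 10 for year in years)
    # Empty decades are included so that every bucket appears in the result.
    return [(decade, counts.get(decade, 0))
            for decade in range(first, last + 1, 10)]

buckets = decade_counts([1905, 1906, 1912, 1880])
```

With the default range this yields the 19 buckets from the 1830s to the 2010s.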

Other interesting things can be determined as well, for example the most common streets that appear in the WBI. To find that out, I wrote a simple script that removed numbers from addresses, added the results to a Counter object, and then used the most_common() method to determine the most common streets. The result is below (the legend is ordered, with Main Street being the most common):

The ten most common streets in the resulting dataset (most common is at the top).
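A script along those lines might look like the sketch below. It is not the original script: the helper `street_name` is hypothetical, it assumes that only a leading civic number needs to be stripped, and apart from “279 Garry Street” the sample addresses are made up for illustration.

```python
import re
from collections import Counter

def street_name(address):
    """Strip a leading civic number, e.g. '279 Garry Street' -> 'Garry Street'."""
    return re.sub(r"^\s*\d+\s+", "", address).strip()

# Illustrative addresses only.
addresses = ["279 Garry Street", "100 Main Street", "456 Main Street",
             "12 Garry Street", "789 Main Street"]

streets = Counter(street_name(a) for a in addresses)
top = streets.most_common(2)
```

Intersections such as “Main Street at Market Avenue” pass through unchanged in this sketch and would be counted as their own entry.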

Following this, I imported the data into QGIS as a CSV file using MMQGIS. Once loaded, the addresses were geocoded using the Google Maps API (via MMQGIS). Geocoding is a somewhat slow process, at about 160 addresses per minute. The result was a shapefile layer:

All geocoded points, over a Stamen Toner base map.

The points have a large amount of overlap, which means the above image does not give a good sense of the actual density of building locations. To visualize density, I used QGIS to create a heatmap from the shapefile layer. The result is below (red areas are higher density):

A heatmap of all points.

And as an overlay over the points themselves on a map of Winnipeg:

A heatmap of all points overlaid over the original points.

With the data loaded into QGIS, I was also able to answer other questions – for example, determining the highest density areas. To do that I drew polygons in the densest areas (as seen in the heatmap) and used the ‘Points in polygon’ tool to count the total number of points (geocodable addresses) that were inside. Some of the highest density areas were:

  1. Exchange District – 146 addresses*
  2. Armstrong Point – 121 addresses
  3. University of Manitoba – 65 addresses

(*using the boundaries for the National Historic Site)

Adding polygons for counting points.
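For readers without QGIS, counting points inside a polygon can be approximated with a standard ray-casting test. The sketch below is only an illustration of the idea and makes no claim about how the ‘Points in polygon’ tool is implemented; the function names and the sample coordinates are hypothetical, and points lying exactly on an edge are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_inside(points, polygon):
    """Number of (x, y) points that fall inside the polygon."""
    return sum(point_in_polygon(x, y, polygon) for x, y in points)

# Made-up square and points, in arbitrary planar units.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(2, 2), (1, 3), (5, 2), (-1, 1)]
inside_count = count_points_inside(points, square)
```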

The last task was to create the animated heatmap. To do that, the years associated with each point (geocoded address) were categorized by decade (i.e. 1830-1839, 1840-1849, etc.) and assigned a decade code (0-18). Separate layers were then made for each decade using the query builder (that is, a set of points associated with each decade code), and a heatmap was produced for each layer and exported as an image. The exported images were imported into Adobe Premiere Pro and animated. The resulting video is the following:

 

Links:

Winnipeg Building Index: http://wbi.lib.umanitoba.ca/WinnipegBuildings/

Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/

QGIS: http://qgis.org/

MMQGIS: http://michaelminn.com/linux/mmqgis/

plot.ly: http://plot.ly/

Web Scraping: http://en.wikipedia.org/wiki/Web_scraping

Stamen Toner map: http://maps.stamen.com/toner/

 

Running Famous Paintings As Piet Programs

The idea behind this was to see if one could find useful programs in a semi-random way. Randomly generating code isn’t likely to produce usable code, so I decided to try something a bit different: working backwards from something already produced and, essentially, turning things that aren’t programs into programs. In this case, potentially extracting a useful program from a painting that was produced before programming ever existed.

The Language

One of the more interesting programming languages out there is called “Piet,” named after Dutch painter Piet Mondrian by its creator, David Morgan-Mar. Piet programs are bitmaps. A program is executed by a pointer that moves from one coloured region to the next; the command performed at each step is determined by the change in hue and lightness between the two regions. Piet is an example of an esoteric programming language.

There are 20 colours that are used by Piet:

Piet colour palette.

In particular, regarding colour blocks:

“The basic unit of Piet code is the colour block. A colour block is a contiguous block of any number of codels of one colour, bounded by blocks of other colours or by the edge of the program graphic. Blocks of colour adjacent only diagonally are not considered contiguous. A colour block may be any shape and may have “holes” of other colours inside it, which are not considered part of the block.”
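The definition of a colour block amounts to a flood fill over edge-adjacent codels. The following is a minimal sketch of that idea, assuming the image is held as a grid of colour labels with one codel per cell; the function name `colour_block` and the single-letter labels are hypothetical.

```python
from collections import deque

def colour_block(grid, row, col):
    """Return the set of (row, col) codels in the colour block containing the start codel."""
    target = grid[row][col]
    seen = {(row, col)}
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        # Only the four edge neighbours count; diagonal neighbours are not contiguous.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and (nr, nc) not in seen and grid[nr][nc] == target):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen

# 'R' and 'B' stand in for two palette colours.
grid = ["RRB",
        "RBR",
        "BRR"]
top_left = colour_block(grid, 0, 0)
bottom_right = colour_block(grid, 2, 2)
```

Here the two groups of `R` codels touch only diagonally, so they come out as separate blocks.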

To read more about the language, and how it executes, you can read the author’s description of it here: http://www.dangermouse.net/esoteric/piet.html

Choosing the Images:

The paintings were selected due to a number of different factors: having seen the paintings in person, having seen them in courses that I’ve taken (CEH.1-ENx – Explaining European Paintings, 1400 to 1800), and whether the image was in the public domain or not. Because of the latter factor, most of the paintings tend to be older. I decided to include a painting by Vermeer because Mondrian was inspired by him.

The Method of Preparing the Images:

In order for a painting to be executed as a Piet program, it would have to have the appropriate colour palette (some interpreters treat non-standard colours as black, thus preventing or ending execution). The first step is to have a program that takes an image as input and outputs the converted image with the appropriate colour palette.

To do this, I wrote a program in Python which uses a library called Pillow (a fork of PIL). For each pixel, the program finds the closest palette colour by Euclidean distance, treating the RGB values as vectors.
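The nearest-colour step can be sketched as below. This is an illustration of the approach described, not the original program: the names `PIET_PALETTE` and `nearest_piet_colour` are hypothetical, and the sketch works on plain RGB tuples so that it runs without Pillow. It compares squared distances, which gives the same ordering as the Euclidean distance itself.

```python
# The 20 Piet colours as RGB tuples: light, normal and dark shades of six hues,
# plus white and black.
PIET_PALETTE = [
    (255, 192, 192), (255, 255, 192), (192, 255, 192),
    (192, 255, 255), (192, 192, 255), (255, 192, 255),
    (255, 0, 0), (255, 255, 0), (0, 255, 0),
    (0, 255, 255), (0, 0, 255), (255, 0, 255),
    (192, 0, 0), (192, 192, 0), (0, 192, 0),
    (0, 192, 192), (0, 0, 192), (192, 0, 192),
    (255, 255, 255), (0, 0, 0),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_piet_colour(rgb):
    """Return the palette colour closest to the given (r, g, b) value."""
    return min(PIET_PALETTE, key=lambda colour: squared_distance(rgb, colour))

converted = [nearest_piet_colour(p) for p in [(250, 10, 5), (10, 10, 10), (100, 0, 0)]]
```

Applying this function to every pixel of an image gives the converted picture; caching results per distinct input colour would avoid repeating the search.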

Large images take a long time to convert, so I restricted the largest dimension (i.e. height or width) to 300, 150, and 50 pixels. I used different sizes because of how the resizing process works (generally, the execution results differ at different resolutions).
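Restricting the largest dimension while keeping the aspect ratio comes down to a single scale factor. A minimal sketch, with a hypothetical helper name `scaled_size`; the resulting (width, height) pair is the kind of tuple that Pillow’s `Image.resize` accepts.

```python
def scaled_size(width, height, largest):
    """Scale (width, height) so that the larger side equals `largest` pixels."""
    scale = largest / max(width, height)
    # Keep at least one pixel on the shorter side.
    return max(1, round(width * scale)), max(1, round(height * scale))

# The three target sizes used here, applied to a made-up 1280x720 image.
sizes = [scaled_size(1280, 720, largest) for largest in (300, 150, 50)]
```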

The program is the following:

(I should note that there are a couple of ways that the above program could be optimized – e.g. memoization.)

I used a simple shell script to process the entire folder, instead of going image by image:

Execution Process:

I used the npiet interpreter (version: 1.3a-win32) written by Erik Schoenfelder (found here: http://www.bertnase.de/npiet/). The interpreter provides console output for the status of execution, and as an option, a trace image showing where the pointer went in the image.

The Results of Execution:

Each painting will have its title as the heading, and a more detailed citation at the bottom of the page. Below the image there will be the results of running the image through the npiet interpreter. There are three different image resolutions that were used (largest dimension was 300, 150, and 50 pixels) – each program is described for results from each resolution.

Three sizes of images – 300, 150, and 50 pixels on the largest dimension.

Scenes from the Life of Saint Nicholas of Bari

The original (left), and converted image.

300px:
Program initially requests char input. Upon being provided char input, the program tries to multiply it (by using its ASCII value) with the top value in the stack – this fails with a stack underflow error (since the stack has only one item at the beginning of the program). After that, it loops: it accepts new char input (1+ chars) and for each individual char it pushes that char to the stack (represented by its ASCII value), after which it adds both items in the stack (previous single value plus current input char, as an ASCII value), thus leaving a single value in the stack. When the buffer is empty, it pauses for new input from the console. One (obvious) use for this program is to add up ASCII values of strings of characters.
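The looping behaviour described above can be imitated in a few lines of Python. This is only a simulation of the described behaviour, not a Piet interpreter; the function names are hypothetical. It relies on the Piet rule that an operation which cannot be performed (here, an add with fewer than two values on the stack) is simply ignored.

```python
def add(stack):
    """Piet-style add: ignored when fewer than two values are on the stack."""
    if len(stack) < 2:
        return False  # stack underflow, nothing happens
    top = stack.pop()
    second = stack.pop()
    stack.append(second + top)
    return True

def running_total(text):
    """Push each character's ASCII value, then add, as in the loop described above."""
    stack = []
    for ch in text:
        stack.append(ord(ch))
        add(stack)  # fails once, on the first character, then succeeds every time
    return stack

result = running_total("abc")
```

For the input “abc” the stack ends up holding the single value 97 + 98 + 99 = 294.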

150px:
Program initially requests char input. Upon being provided char input, the program tries to multiply it (by using its ASCII value) with the top value in the stack – this fails with a stack underflow error (since the stack has only one item at the beginning of the program). After that it gets stuck in a loop: it tries to multiply the two top values in the stack, which it can’t, due to a stack underflow error, then it outputs the top value on the stack as a number (that is, its ASCII value, since the input was a character). It only successfully outputs once, for the initial value provided at the beginning.

50px:
The program first attempts to multiply the top two values in the stack, which it can’t do, since the stack is empty. It then goes into a loop: it accepts input as char from the console (1+ chars), then attempts to add the top two values in the stack, but can’t since there’s only a single value (stack underflow error). After that, it outputs the value in the stack as a number, followed by another multiply (which fails, because the stack is empty now).

The Birth of Venus

300px:
Program is a loop: it first duplicates the top value on the stack (which fails, since the stack is empty) and then attempts to divide the top two values (specification: “calculates the integer division of the second top value by the top value, and pushes the result back on the stack”), which also fails due to stack underflow.

150px:
The program is a loop. It first accepts char input (1+ chars), then pushes it onto the stack, then attempts to add the top two values in the stack. This fails the first time, since there is only one value in the stack. After that the program functions correctly. It seems to be identical in function to the “Scenes from the Life of Saint Nicholas of Bari” 300px program above.

50px:
The program is identical to the 150px one above.

The Man of Sorrows with Two Angels

300px:
Nothing happens.

150px:
Nothing happens.

50px:
The program is a loop. It first accepts char input (1+ chars), then pushes it onto the stack, then attempts to add the top two values in the stack. This fails the first time, since there is only one value in the stack. After that the program functions correctly. It seems to be identical in function to the “Scenes from the Life of Saint Nicholas of Bari” 300px program above.

The Virgin of the Chancellor Rolin

300px:
Does not run, since the first block is black.

150px:
Does not run, since the first block is black.

50px:
The program starts by adding the top two values on the stack, which fails due to stack underflow. It then accepts char input. After two chars are input, the program will then proceed to output (as a number) the sum of the ASCII values. It will then attempt to multiply the top two values of the stack, which fails due to stack underflow. After that, the program enters a loop where it continually accepts char input (1 or more) and adds it to a running total (using the ASCII values).

The Garden of Earthly Delights

300px:
Does not run, since the first block is black.

150px:
Does not run, since the first block is black.

50px:
The program begins by attempting to output the value at the top of the stack as a number, then to output the top value as a char, and then to subtract the top two values in the stack. These all fail, due to stack underflow. Following that, the program enters a loop: it attempts to duplicate the top value in the stack, then divide the top two values, then output the top value on the stack as a character, then subtract the top two values in the stack, then duplicate the top value, then divide the top two values, then output the top value on the stack as a character, then output the top value on the stack as a number, then divide the top two values.

Woman in Blue Reading a Letter

300px:
The program starts by accepting character input (one char), then attempts to add the top two values in the stack, which fails due to stack underflow. It then accepts an additional character input, adds it to the previous character’s ASCII value, then attempts to add the top two values on the stack (which fails due to stack underflow). After that, it pushes the value of the colour to the stack, then it cycles between accepting character input and numeric input.

Paintings:

Scenes from the Life of Saint Nicholas of Bari, a predella of: Fra Angelico. The Virgin and Child Enthroned with Angels and Saints. 1437-1438. Triptych. Galleria Nazionale dell’Umbria, Perugia.

Botticelli, Sandro. The Birth of Venus. c. 1486. Galleria degli Uffizi, Florence.

Mantegna, Andrea. The Man of Sorrows with Two Angels. c. 1500. Statens Museum for Kunst, Copenhagen.

van Eyck, Jan. The Virgin of the Chancellor Rolin. c. 1435. Musée du Louvre, Paris.

Bosch, Hieronymus. The Garden of Earthly Delights. c. 1500. Museo Nacional del Prado, Madrid.

Vermeer, Johannes. Woman in Blue Reading a Letter. c. 1663. Rijksmuseum, Amsterdam.

Sources:

http://en.wikipedia.org/wiki/Esoteric_programming_language
http://www.dangermouse.net/esoteric/piet.html

Interpreter:

http://www.bertnase.de/npiet/

Library:

http://python-pillow.github.io/

Additional Images:

A detail image of the program trace in “Scenes from the Life of Saint Nicholas of Bari.”

 

A detail image of the program trace of the “The Man of Sorrows with Two Angels.” The trace starts at the top right of the full image, which is the top right in the detail.