Finding correlations in building data

an urban catalogue (tentative)

– identifying a building’s ‘character’


So machine learning is changing the industries that it touches. What can it do for architecture? Can it learn relationships between buildings? And what would that actually do for us?

Well, let’s consider this: buildings are rows in a dataset, and their corresponding features are the columns describing them.

What the dataset can look like

These columns are descriptions of each building, ranging from:

  • location
  • height
  • number of floors, basements
  • number of apartments and offices (this small study was originally based on mixed-use developments)
  • gross floor area
  • number of parking spaces
  • floor area ratio

down to perhaps even more subtle features such as:

  • percentage of floor that receives direct sunlight
  • reflectivity of facade
  • contribution to heat island effect
  • building energy consumption footprint
  • weather attributes where the building is located
  • other correlated data that comes from having the building sitting in the same place, exposed to changing weather, land conditions, and socioeconomic trends.

The possibilities for hand-describing features are practically endless, and similar in character to the features defined to tag songs (à la The Music Genome Project).

Right now, one way to make sense of a building seems to be to hand-label these features using traits that are familiar to architects (the ones architects already use to describe a building).

Perhaps in time, once the dataset is large enough to zoom out and get a feel for ‘what describes a building’, we can move on to trying different methods of gleaning features from raw data (photographs of the building, or raw sensor data in a future where buildings are fully connected to the cloud).

What can the Dataset Do

Initially, it can be clustered. Many clustering algorithms can be applied to a clean dataset to provide insights that come from merely referencing related buildings.

e.g. what sort of facades work well in this climate? oh here are some similar buildings from similar climates that use the same window detail and product supplier..

e.g. what are the most similar buildings in terms of (architect selects a few features out of the many)
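The ‘most similar buildings’ query can be sketched in a few lines. This is a minimal illustration rather than a real catalogue: the four buildings, their values, and the column names (`height_m`, `floors`, `gfa_m2`, `far`) are all made up, and plain Euclidean distance on rescaled features is just one reasonable choice of similarity measure.

```python
import math

# Hypothetical catalogue: each building is a row, each key is a feature column.
buildings = [
    {"name": "A", "height_m": 120.0, "floors": 30, "gfa_m2": 45000.0, "far": 8.0},
    {"name": "B", "height_m": 115.0, "floors": 28, "gfa_m2": 43000.0, "far": 7.5},
    {"name": "C", "height_m": 20.0, "floors": 5, "gfa_m2": 3000.0, "far": 2.0},
    {"name": "D", "height_m": 60.0, "floors": 15, "gfa_m2": 20000.0, "far": 4.5},
]

def most_similar(rows, query_name, features, k=2):
    """Return the names of the k rows closest to the query row,
    comparing only the features the architect selected."""
    # Rescale each feature to 0..1 so big-number columns (like floor area)
    # do not drown out small-number columns (like floor area ratio).
    lo = {f: min(r[f] for r in rows) for f in features}
    hi = {f: max(r[f] for r in rows) for f in features}

    def scaled(row):
        return [(row[f] - lo[f]) / ((hi[f] - lo[f]) or 1.0) for f in features]

    query = next(r for r in rows if r["name"] == query_name)
    q = scaled(query)
    others = [r for r in rows if r["name"] != query_name]
    others.sort(key=lambda r: math.dist(q, scaled(r)))
    return [r["name"] for r in others[:k]]

nearest = most_similar(buildings, "A", ["height_m", "floors"])
print(nearest)
```

Swapping the feature list (say, to `["far", "gfa_m2"]`) changes what ‘similar’ means without touching the data, which is the point of letting the architect pick the columns.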

The dataset can also be used to train a regression model (or a simple neural network) to suggest the best features for a new site, based on everything the dataset knows about all the other sites.

There are many more uses like these. They are an extremely attractive proposition: mouth-wateringly low-hanging fruit.

We just need a large enough dataset. Wouldn’t it be nice if architecture firms started sharing data with the world?



Some great fun with convolutions

Convolutions, I realised recently, loosely mean ‘slide a small matrix over an entire image, multiplying and adding up as you go’, which hit me like a soft plastic hammer as I muttered a small ‘oh’ in the corner of a coffee shop nearby.

The cool thing seems to me to be the fact that the ‘image’ is actually an image as understood by my ageing laptop (and anyone else’s, really), as a bunch of numbers in a 2d array (or 3d array, if it is in colour). so convolutions are.. lots of little multiply-and-add operations in 2d. hmm.. I guess that means convolutions also happen in 1d and 3d and stuff…(what happens when you convolve an ASCIIgrid dataset of the world?) anyway here’s a cat:


google image search wizardry has decided that she is the world’s cutest kitten. (source)

so what sort of things do you do to the kitten? in fact, it seems a lot of the filters that we use in Photoshop are convolutions. so it’s a filter? yes, and it’s also a kernel. google also says a kernel is a small matrix that you plaster all over the kitty to make it different (no, not better, just.. different).
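Here is a rough sketch of what ‘plastering a kernel all over the image’ means, written as explicit loops over a numpy array. It is not the code from the tutorials, the 6x6 ‘image’ is a made-up stand-in for the kitten, and (like most deep learning libraries) it skips the kernel flip that the strict mathematical definition of convolution asks for. Both kernels here are symmetric, so the flip would make no difference.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and, at each position,
    sum the element-wise products of the kernel and the patch under it."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A tiny greyscale "image": a bright 2x2 square on a dark background.
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

# Box blur: every output pixel is the average of its 3x3 neighbourhood.
blur = np.ones((3, 3)) / 9.0

# A common sharpen kernel: boost the centre, subtract the four neighbours.
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)

blurred = convolve2d(image, blur)
sharpened = convolve2d(image, sharpen)
print(blurred.round(2))
print(sharpened)
```

Changing the nine numbers in a kernel is all it takes to get a different effect, which is why randomly generated kernels give such odd results.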

try doing this tutorial that does kitty transformation in tensorflow (compared to writing your own loops, because it reads better as a numpy array than as a nested list). Or go change the numbers in this one. below is what I got from the tf tutorial:


blur kitty


sharp kitty

random effect kitty

some randomly generated kernels. yes they’re kinda creepy. sorry kitty.



What are Convolutional Neural Networks then?

They’re kernels that learn to write themselves.

This one hit me like an iron hammer, and I was happy for quite a while after that.


What? Why do neural networks want to write their own blur filters? Oh no no, it’s not a blur filter to the network, because of the way you train them. I ignominiously coin them Intelligence filters. haha. Filters in Convolutional Neural Networks learn the best way to ‘see’ something so that the network can make a good guess at what it is.

In a Convolutional Neural Network, you stack layers of filters one after another, each layer taking in a larger portion of the image (actually you stack neurons arranged like filters), put them in front of your regression / classification network, and train the whole thing with backpropagation (very loosely speaking of course).



so if you look at the above image, what is going on is:

input image > squeeze > squeeze > squeeze > squeeze > connect them all together > classify

The backpropagation goes all the way through, so that the filters that do the squeezing (extracting the ‘essence’ of a small part of the image) learn where they went wrong and what to change to make themselves less wrong.
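To make ‘kernels that learn to write themselves’ a bit more concrete, here is a deliberately tiny sketch. It is not a real CNN: there is one 3x3 filter, no activation function, and no classifier on the end. A random image is filtered with a hand-written sharpen kernel to produce the ‘right answer’, and then a blank kernel is nudged by gradient descent until its output matches. The image, the learning rate, and the number of steps are all assumptions chosen to make the toy converge.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((16, 16))            # stand-in for a greyscale image

# Every 3x3 patch of the image, flattened into one row: shape (196, 9).
# Applying a kernel everywhere is then a single matrix-vector product.
patches = np.array([image[i:i + 3, j:j + 3].ravel()
                    for i in range(14) for j in range(14)])

# The "right answer": what a hand-written sharpen kernel produces.
target_kernel = np.array([[0, -1, 0],
                          [-1, 5, -1],
                          [0, -1, 0]], dtype=float)
target = patches @ target_kernel.ravel()

kernel = np.zeros(9)                    # the filter starts out knowing nothing
learning_rate = 0.2
losses = []
for step in range(3000):
    prediction = patches @ kernel       # apply the current filter everywhere
    error = prediction - target         # how wrong each output pixel is
    losses.append(float(np.mean(error ** 2)))
    # Gradient of the mean squared error with respect to the 9 kernel entries.
    gradient = 2.0 * patches.T @ error / len(error)
    kernel -= learning_rate * gradient  # nudge the filter to be less wrong

learned = kernel.reshape(3, 3)
print(learned.round(2))
```

In a real network the error signal arrives from the classifier further down the line rather than from a known target image, but the update to the filter has the same shape: error times the patch that caused it.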

As a result, after training it learns which bits are important (shown as almost white, because white means 1 here, which means activated), and because the neurons are laid out in 2d and it’s learning about images, we humans get to see which feature of the image each neuron is interested in.

I haven’t actually trained my own CNN yet, because I like knowing things in as much detail as I can before doing it, but when I do i’ll update this post!

follow along this tutorial, it’s really quite fun and illuminating. This too.

Kaggle datasets in Rhino


melbourne housing dataset.csv from kaggle.

this was using that older dataset (link to kaggle) with 9 features. coordinates were polled from Google’s geocoding API based on the address in the dataset.  I believe the updated dataset provides coordinates too, possibly using the same method described.

number of rooms

showing different ranges of number of rooms per unit


‘if i’m looking for houses that have between 5 and 7 rooms, the newest ones are the light yellow ones on the northeast edge of the city.’

other details that come along for free once an excel datasheet is linked to coordinates in the real world:


height of property, contour of surrounding land


flow of water through property and general direction of water flow. (flash floods, landslides possibly?)

street view

pictures around the site (from Google Street View)

google directions

of course, different methods of transport to and from the city


weather data at a given time (historical data is paywalled, so i couldn’t access it)


soil conditions and suitability for certain forms of construction


…and some really zoomed out GIS level datasets (i’m still wondering what to do with them.)

Owl in Galapagos

TLDR: predicting rectangles with two neural networks and galapagos.


galapagos on owl 3

galapagos used to discover the best shapes for two time series neural networks

just realised that galapagos would potentially be very useful (or actually, another backprop NN might be even faster) for testing out optimal ‘window’ sizes for a time series neural network (the ‘view’ that the neural network sees when it learns to predict a number series).

When predicting with a time series neural network, one of the problems that bugged me has been that we don’t know what the best size is for prediction (called look_back in this tutorial). Too small and the NN learns that it should only go up or down, too large and it misses out on too many details.
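For anyone who has not met the idea of a ‘window’ before, here is a small sketch of how a number series gets chopped into training pairs. The function name `make_windows` and the toy series are mine, not from the tutorial or from OWL. Only the `look_back` name is borrowed.

```python
def make_windows(series, look_back):
    """Split a series into (window, next value) training pairs.

    Each window holds `look_back` consecutive values, and the target is
    the value that comes immediately after that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - look_back):
        inputs.append(series[start:start + look_back])
        targets.append(series[start + look_back])
    return inputs, targets

series = [1, 2, 3, 4, 5, 6]

x_small, y_small = make_windows(series, look_back=1)
x_large, y_large = make_windows(series, look_back=3)

print(x_small, y_small)
print(x_large, y_large)
```

A larger `look_back` gives the network more context per example but leaves fewer examples to learn from, which is part of why the best size is worth searching for rather than guessing.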

This is where galapagos (or any other appropriate learner) comes in to help find the optimal range within which the best predictions can happen.

Galapagos was used to test 7 parameters that directly affected the neural network shape and learning rate (1 for window size, 3 for each NN : number of hidden neurons, learning rate, and steepness of the sigmoid activation function).


after 15 minutes or so, it gave me some pretty decent answers for the parameters required for learning two separate lists of parameters.

It was quite interesting to see that the learning rate varied quite a bit between the two (one was at 0.21, and another at 0.62), and alpha (used to define the steepness of the sigmoid activation function) was at 1.344 and 0.887 respectively (which made me suspect that the two trade off against each other: the network with the lower learning rate ended up with the higher alpha).

The number of hidden neurons stayed relatively similar at 4 neurons and 5 neurons respectively. but then, i wouldn’t have guessed that if i had just used a random middling number between inputs and outputs.


the resulting prediction was a prediction of a series of two parameters that define a rectangle.

Ground truth dataset in Grey, predicted dataset in Yellow.

galapagos on owl 4

the accuracy falloff after training

galapagos on owl 5 initial

before training

galapagos on owl 5 initial2

initial hand tweaking of parameters (didn’t know which ones are best to tweak)

galapagos on owl 5 learnt

so i machine learned those parameters and it got some pretty decent predictions

galapagos on owl 5 shifted

then i shifted some starting rectangles and realised it reliably predicts up to about 10 rectangles before doing some crazy things.

plugins used: OWL, galapagos

Panel Rationalization (OWL)

never has panel rationalization been so straightforward! k-means clustering to sort similar panels!

panels are sorted by two parameters:

  • panel area
  • surface normals

and then replaced with set panel dimensions (an average of each cluster) + 20mm offset. the results are pretty decent, with minimal overlap even at the steep bits of the surface.
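Outside grasshopper, the same idea can be sketched in plain Python. This is not what the OWL component does internally, just a bare-bones k-means. To keep it short it clusters made-up panels by (width, height) in mm instead of by area and surface normal, and how exactly the 20mm offset gets applied is my assumption.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means. Returns (centroids, labels).

    Starts from k evenly spaced points in the input list, which is a
    simplification; real implementations usually pick starting points randomly.
    """
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the average of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels

# Hypothetical panel sizes (width, height) in mm: two rough families.
panels = [(1000, 2000), (1010, 1990), (990, 2015),
          (1500, 2500), (1490, 2510), (1510, 2490)]

centroids, labels = kmeans(panels, k=2)

# Replace every panel with its cluster's average size plus a 20mm offset.
OFFSET_MM = 20
rationalised = [tuple(round(v) + OFFSET_MM for v in centroids[label])
                for label in labels]
print(labels)
print(sorted(set(rationalised)))
```

Six slightly different panels collapse into two panel types here. Raising `k` trades fabrication simplicity for a closer fit to the original surface.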

panel types

number of clusters (types of rectangles) from 2 – 50, iterations = 3

clustering iterations

k (number of clusters) = 25, iterations running from 1 – 30

EDIT : a little extra definition showing colour clustering to reduce the number of colour variations needed from 1124 to 10-50.


colour variations from 10-30, 30 iterations


10 colours, iterations from 10-30, showing how the clustering works in realtime

plugins used: OWL

Machine Learning with OWL

Just attended a workshop last week on machine learning in grasshopper, and here are some results!

clustering final

slightly edited version of the final presentation board

The above is a combination of a few techniques in machine learning, used to find clusters of correlated sites in Dubai, based on a given combination of parameters.

here’s the breakdown:

The intention was to find out different hotspots in Dubai, based on a few parameters that affect the popularity of the given site, and then to create a set of street furniture that would be contextually sensitive to the site.

graph dark

here’s a pretty relationship graph to whet your appetite

1. Mapping the popularity of places in Dubai


graph of all the parameters used and their numerical values in a circle

Mapping the popularity of areas in Dubai utilizes k-means clustering to find groups of places based on a few factors :

  • ratings (taken from the Google Places API)
  • number of reviews and their scores
  • the security presence on site (to gauge how private a given building is)
  • building capacity (size)
  • type of building (commercial, residential, utility, etc)
  • distance of that building from other buildings (to gauge the density of the area)


k-means clustering and their averaged characteristics of the clusters (e.g. in the light blue cluster, the metro stations, power plant, burj al arab, and the police station generally have a strong security presence, get a rating of about 4.3 stars, and for some reason are considered small sized buildings in relation to other sites)


2. Design of chairs that would directly correlate to the clusters on the map

a parametric model of a street bench is created (the old fashioned way, in grasshopper) with a set of 12 parameters defining width, divisions, backrest height etc, and then run through an autoencoder to reduce its parametric dimensionality to 2.

this means that with two sliders, one is able to create a set of street furniture that captures the relationships of all 12 parameters that are used to create the said street furniture.

this also means that by creating a 2d grid of points, one can see the entire series of permutations of the design in question, be it a chair, a house, or a city.

3 versions of chairs were defined by the designer (well, someone from our team) and fed into the autoencoder to find the strongest correlations between all three designs (the designs themselves are at the top left, bottom left, and bottom right corners of the graphic below).

these correlations are then fed back into the trained autoencoder network to ‘decode’ the relationships between the 3 objects. hence every permutation in between these three given designs carries some of the characteristics of each designed object.

say, the top left design is a simple long bench, bottom left a short bench, and bottom right a large circular bench with backrests and divisions in between them. The network then finds a set of ‘morphed’ objects that each have a bit of ‘division-ness’, ‘long-ness’, ‘backrest-ness’ in between them.
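To show the ‘12 parameters down to 2 sliders and back’ idea without a neural network, here is a sketch that swaps the autoencoder for PCA (principal component analysis), which behaves like a purely linear autoencoder. To be clear, this is a stand-in and not what we ran in the workshop, and the three 12-parameter ‘designs’ are random numbers rather than real benches.

```python
import numpy as np

# Three hypothetical bench designs, 12 parameters each (width, divisions, ...).
rng = np.random.default_rng(1)
designs = rng.random((3, 12))

# PCA via SVD: find the two directions that capture the most variation.
mean = designs.mean(axis=0)
centred = designs - mean
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = vt[:2]                      # shape (2, 12)

def encode(params):
    """12 parameters -> 2 'slider' values."""
    return (params - mean) @ components.T

def decode(code):
    """2 'slider' values -> 12 parameters."""
    return mean + code @ components

codes = encode(designs)                  # where each design sits on the 2d map
reconstructed = decode(codes)

# A 'morphed' design: the point halfway between design 0 and design 1.
halfway = decode((codes[0] + codes[1]) / 2)

# A coarse 2d grid of slider positions, each decoding to a full design.
grid = [decode(np.array([u, v]))
        for u in np.linspace(codes[:, 0].min(), codes[:, 0].max(), 4)
        for v in np.linspace(codes[:, 1].min(), codes[:, 1].max(), 4)]
print(len(grid), grid[0].shape)
```

With only three designs, two linear sliders are enough to reproduce each one exactly. A real autoencoder adds non-linear layers, so its in-between designs do not have to be simple straight-line blends.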

then the entire set is run through another k-means clustering algorithm to find out which version of seating is most suitable for which areas in Dubai, based on a different set of related parameters: amount of seating area, number of divisions, and bench length. e.g. Dubai Mall and Emirates Mall have the highest traffic (gauged by the number of reviews and size of the building), so they would require seating with the largest area.


the graphic above shows the entire array of all possible permutations of the street furniture that fit within our given definition.

I hope you can see the potential of using machine learning in architecture, as more than ever, it allows real, big data to be directly linked to design in a way that is not simply a designer’s intuition. This technique can be applied to include all the parameters that a modern building should have, like sustainability targets, cost, environmental measures, passivhaus implementations, fire regulations.. the list is endless. We as architects should start incorporating them into design automatically so that we give ourselves more time to do the things we enjoy, like conceptual input, or our own flair in design.

the above statement doesn’t apply to people who really really enjoy wrangling with fire regulations and sustainability issues manually, or believe in the nostalgia of the architect-as-master-builder who is able to handle all the different incoming sources of requirements and create something that satisfies them ALL. disclaimer: i’m not one of them.

P.S. oh yes, and since we had a bit of free time during the workshop, and we didn’t know what to call our project, we decided to machine learn our team name by using a Markov chain.


the project name. it’s a bit of slightly random babbling that the definition below spat out after reading an article about machine learning on wikipedia. kinda sounds like english though, so it’s all fine.
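For the curious, a Markov chain babbler fits in a few lines of Python. This is a word-level sketch with a made-up sentence standing in for the wikipedia article. Whether the grasshopper definition worked on words or on letters is not something this sketch tries to reproduce.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def babble(chain, start, length, seed=0):
    """Walk the chain from `start`, picking a random follower each time."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Stand-in source text (not the actual article).
source = ("machine learning is the study of algorithms that improve "
          "through experience and machine learning is used in design")

chain = build_chain(source)
name = babble(chain, "machine", 8)
print(name)
```

Every pair of neighbouring words in the output appeared somewhere in the source, which is why the result kinda sounds like english even though nobody wrote it.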

project name process

plugins used: OWL, ghpython, anemone

Backpropagation, Machine Learning and all that jazz (anecdotal)

backpropagation in grasshopper

backpropagation in grasshopper! (in ironpython actually)

(sorry about the small text in the gif, i’ll make a nicer one in the near future)

recently went on a quest for the full understanding of backpropagation (used in training neural networks (Machine Learning/AI)), and came upon this amazing blog post detailing the implementation of backpropagation in python.

I spent one day following through the code line by line, but still wasn’t able to grasp the structure of how it worked, so I did what I usually do when I can’t understand difficult theories: break it up into bits and implement them in grasshopper!

The result of the deconstructed backpropagation algorithm looks pretty straightforward in hindsight, but then hindsight is always 20/20.

Actually running the loop through 500 epochs (as mentioned in the blog post) took hazardously long, so the definition doesn’t do ‘real’ learning, but even after 25 epochs one could see the accuracy actually climbing.

By comparison, here’s a gif of the same implementation in pure(r) python (it’s ironpython, inside grasshopper, inside rhino3D). note that both implementations took out cross validation, and instead did a simple 66% train set and 33% test set split:

backpropagation in ironpython

computing the same dataset for 25 epochs took 0.8 seconds. T_T

However, the grasshopper definition was, among other things, slow enough for the simple human mind to make some really important observations.

Here are a few interesting observations so far:

First, Neural networks are in fact… a dictionary of weights! that’s it! (oh wait, see  update) (it was so mindblowingly simple to me I had to walk around for a few minutes wondering if i missed something). In actual fact, it is a network of weights that incrementally update over many iterations to resemble an ‘abstraction’ of what the dataset is. they are then used to extrapolate data (among other things). The dictionary thing was due to the code being written in Python.

(update 20170820: actually, it is a dictionary of weights and an ‘activation function’, as I found out just a few days after this post but didn’t get around to updating it. an activation function is a function that takes each neuron’s weighted sum of inputs and squashes it onto a sloped curve (like a sigmoid function/tanh, or even y=mx+b which makes a sloped line). the slope of that curve is what lets backpropagation know whether nudging a weight up or down is a good thing)

Think of training a neural network as ‘finding a polynomial(ish) function that fits a given set of numbers’. Once you find a polynomial function, you can extrapolate from the function to guess new data, like below:

polynomial maker

I found this to be extremely valuable in helping me intuitively grasp what a neural network ‘looks like’ (the gif above is the implementation of it in python).
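The polynomial analogy is easy to try with numpy. This is a separate toy from the one in the gif: the five data points are generated from a quadratic that I made up, and `np.polyfit` is asked to rediscover it and then guess a value outside the range it was shown.

```python
import numpy as np

# Five known data points that secretly follow y = 2x^2 - 3x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x**2 - 3.0 * x + 1.0

# Find the degree-2 polynomial that best fits the points
# (coefficients come back highest power first).
coefficients = np.polyfit(x, y, deg=2)
model = np.poly1d(coefficients)

# Extrapolate: guess y for an x the fit never saw.
prediction = model(6.0)
print(coefficients.round(3), round(float(prediction), 3))
```

A neural network differs in that nobody tells it the shape of the function up front, and it gets there by many small corrections rather than one solve, but ‘fit, then extrapolate’ is the same habit.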

In the same way, a neural network makes a best guess at this ‘function’ again and again. Each guess is tested against a given result (the labels/classes/actual output that you compare against), and at every iteration an error is calculated (error = how far away the current mapping/function is from the right answer) and backpropagated (see next paragraph). Note that nothing is absolute. the neural network’s ‘guess’ is a set of probabilities: 10% says it’s A, 63% says it’s B, and 27% says it’s C, so the best guess is B.
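One common way of turning a network’s raw output numbers into that kind of ‘set of probabilities’ is the softmax function. Using it here is my assumption, since the blog post’s code may pick the winner differently, and the three scores are invented.

```python
import math

def softmax(scores):
    """Turn raw scores into positive numbers that add up to 1."""
    highest = max(scores)
    # Subtracting the highest score first avoids overflow in exp()
    # and does not change the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["A", "B", "C"]
scores = [0.2, 2.0, 1.2]          # hypothetical raw outputs of three neurons

probabilities = softmax(scores)
best = labels[probabilities.index(max(probabilities))]

for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.0%}")
print("best guess:", best)
```

The network never says ‘it is B’. It says ‘B looks most likely’, and the code picks the largest number.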

Secondly, Backpropagation means ‘pass that error from the result back layer by layer (opposite direction) and find out which neuron is responsible for how much of that error’, and is essentially the ‘blame game‘ played by the neurons in the network.

An anecdotal way of thinking about training goes like this:

‘the output neuron at the end of the line gets a result from his team of hard working neurons and sees that it’s off by -0.34, so the weights (which are numbers) need to arrange themselves so that they move upward by 0.34 to get it right. he shouts back down the line: ‘oi! which of you got it wrong?’ and the neurons huddle together and start assigning blame to each other, the first one saying, ‘i’m only responsible for 0.1 parts of this result, check the neuron before me, he gave me 0.9 parts of the paperwork already’ and the second goes ‘i only did 0.3 parts of this, check the guy before me’ and so on and so forth, and then the output neuron sees how many parts of the blame fall on which neuron and finds that among all of them, actually the second guy (0.3) gave the most error. And he gave him a good whipping. XD The next time round, the second neuron was more careful in doing his paperwork.’

And that, I believe, is my current understanding of how backpropagation works.
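Here is the blame game as a compact numpy sketch. It is not the code from the blog post I followed, nor my grasshopper definition: it is a made-up 2-input, 4-hidden, 1-output network learning XOR, with sigmoid activations and a learning rate I picked by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# XOR: the output should be 1 only when exactly one input is 1.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [0]], dtype=float)

# The network really is just these numbers (weights and biases).
w_hidden = rng.normal(size=(2, 4))
b_hidden = np.zeros(4)
w_output = rng.normal(size=(4, 1))
b_output = np.zeros(1)

learning_rate = 0.5
losses = []
for epoch in range(5000):
    # Forward pass: inputs -> hidden layer -> output.
    hidden = sigmoid(inputs @ w_hidden + b_hidden)
    output = sigmoid(hidden @ w_output + b_output)

    error = output - targets
    losses.append(float(np.mean(error ** 2)))

    # Backward pass: how much blame does each neuron carry?
    # (error) x (slope of the sigmoid at that neuron's output)
    delta_output = error * output * (1 - output)
    # Pass the blame back through the output weights to the hidden neurons.
    delta_hidden = (delta_output @ w_output.T) * hidden * (1 - hidden)

    # Update: every weight moves against its share of the blame.
    # (Constant factors from the loss are absorbed into the learning rate.)
    w_output -= learning_rate * hidden.T @ delta_output
    b_output -= learning_rate * delta_output.sum(axis=0)
    w_hidden -= learning_rate * inputs.T @ delta_hidden
    b_hidden -= learning_rate * delta_hidden.sum(axis=0)

print(f"loss at start: {losses[0]:.3f}, loss at end: {losses[-1]:.3f}")
print(output.round(2).ravel())
```

The line with `w_output.T` is the shout back down the line: the output neuron’s blame is split among the hidden neurons in proportion to the weights that connected them to it.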

Third, this might be obvious to some people, but I also found that it helps to think of the layers of the neural network as the ‘lines’ in the network graph rather than the ‘dots’, at least in the process of coding it up. That is because we type up each network layer as a function that maps something to something else (input > hidden, hidden > output), and not as a container that stores an ‘abstraction’ or mapping. I realised this after reading something like this:


and wondering ‘what do i need to define as my input layer!? you said there’s an input layer, where is it? its a magical layer that doesn’t exist? or its supposed to do neuron = input? and what is the hidden layer doing? its supposed to make an abstraction of the raw input? I don’t know, it just looks like its sitting there accepting neurons from the input layer…‘ (that monologue didn’t go anywhere)

Hence I posit that neural network maps be labeled this way in the future:


Then it would be quite clear that the hidden layer works by mixing all raw inputs together and outputs an abstraction, and then the output layer uses the 3 (or more) abstractions of the data to make a slightly educated guess. Backpropagate and repeat.

Fourth, some observations.

Learning rate. I realised that the neural network’s accuracy basically just ‘sat there’ for the first 20 or so epochs because the weights in the neurons are changing at a rate that was so slow that it didn’t manage to tip the balance of probabilities (e.g. iteration 1 neuron: ‘oh, i got that wrong. lemme change it a bit’ -> iteration 2: ‘oh, its still wrong, lemme change it a bit again….’up to 20 epochs). Perhaps just plodding along at the same pace wasn’t the most efficient way of doing things. This warrants further investigation.
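The ‘just sat there for 20 epochs’ effect shows up even in the simplest possible setting. This sketch is not the neural network from the post: it minimises a single made-up function, f(w) = w², with two different learning rates, for the same 20 steps.

```python
def descend(learning_rate, steps=20, start=10.0):
    """Minimise f(w) = w**2 by gradient descent and return every w visited."""
    w = start
    path = [w]
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of w**2
        w -= learning_rate * gradient   # step downhill
        path.append(w)
    return path

slow = descend(learning_rate=0.001)
brisk = descend(learning_rate=0.1)

print(f"slow:  started at {slow[0]}, ended at {slow[-1]:.3f}")
print(f"brisk: started at {brisk[0]}, ended at {brisk[-1]:.3f}")
```

Both runs head in the right direction, but the slow one has barely moved after 20 steps, which from the outside looks exactly like an accuracy that refuses to budge.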

More Hidden Layers seem a bit counterintuitive at first. Until one reads Geoff Hinton’s slide in this post. Then one realizes he might just need to feed More Data!

Initial weights matter quite a bit. Starting off on the wrong foot means there’s a lot more distance to cover to get to a good accuracy, and more distance means more computation time. Is this where Naive Bayes comes in? by having better initial weights, the network converges faster? With the caveat of exposing the network to bias? Will try this very soon.

So, my current takeaway from the past two months of reading and practising machine learning:

  • Neural Networks are : a set of numbers (called weights) that get updated n times, with an error check that tells the function how best to change the numbers to get fewer errors the next time round.
  • I guess an article called ‘weight updater with error checking’ sounds like something headed for obscurity, so Neural Networks via Backpropagation is used instead. I mean, I wouldn’t want to name a building I designed a ‘terraced house with front porch’ for the same reasons.
  • Of course, network algorithms are the secret sauce that I have yet to learn, so I expect this article to be updated as I go along!

All above stuff is purely anecdotal. I hope more knowledgeable readers might point out in the comments below the bits that I intuited wrongly, or bits of facts that have been peddled falsely or might lead to horrible misunderstandings down the road. I will update my beliefs *wink wink Bayesian Inference wink wink* based on these comments so as to arrive at a more accurate frame of understanding.