Finding correlations in building data

an urban catalogue (working title)

– identifying a building’s ‘character’

Preamble

So we now know how machine learning can change every industry it touches. What can it do for architecture? Can it learn relationships between buildings? And what would that actually do for us?

Well, let’s consider this: buildings are rows in a dataset, and their corresponding features are the columns describing them.

What the dataset can look like

These columns are descriptions of each building, ranging from:

  • location
  • height
  • number of floors, basements
  • number of apartments and offices (this small study was originally done on mixed-use developments)
  • gross floor area
  • number of parking spaces
  • floor area ratio

down to perhaps even more subtle features such as:

  • percentage of floor that receives direct sunlight
  • reflectivity of facade
  • contribution to heat island effect
  • building energy consumption footprint
  • weather attributes where the building is located
  • other correlated data that comes from the building sitting in one place, exposed to changing weather, land conditions, and socioeconomic trends.

The possibilities for hand-describing features are practically endless, and similar in character to the features used to tag songs (à la The Music Genome Project).

Right now, one way to make sense of a building is to hand-label these features based on traits architects already use to describe a building.

Perhaps in time, once the dataset is large enough to zoom out and get a feel for ‘what describes a building’, we can move on to other methods of gleaning features from raw data (photographs of the building, or raw sensor data once buildings are fully connected to the cloud).

What can the dataset do?

Initially, it can be clustered. Many clustering algorithms can be applied to a clean dataset to provide insights that come from merely referencing related buildings.

e.g. what sort of facades work well in this climate? Oh, here are some similar buildings from similar climates that use the same window detail and product supplier…

e.g. what are the most similar buildings in terms of (the architect selects a few features out of the many)?
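To make that concrete, here is a minimal sketch of such a clustering pass in Python with scikit-learn. The file name and column names are invented for illustration; a real dataset would use whatever features a firm actually records.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

buildings = pd.read_csv("buildings.csv")          # hypothetical file name
features = ["height", "num_floors", "gross_floor_area",
            "floor_area_ratio", "facade_reflectivity"]

# normalise, then group buildings into a handful of clusters
X = StandardScaler().fit_transform(buildings[features])
buildings["cluster"] = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# "show me the buildings that landed in the same cluster as this one"
peers = buildings[buildings["cluster"] == buildings.loc[0, "cluster"]]
print(peers.head())
```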

The rows can also be fed into a regression model (or a simple neural network) to learn what the best features would be for a new site, based on everything the dataset knows about all existing sites.
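A hedged sketch of that regression idea, using a random forest as a stand-in for “a regression model (or a simple neural network)”. The site features and the target column are assumptions, reusing the hypothetical buildings.csv from the sketch above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

buildings = pd.read_csv("buildings.csv")   # same hypothetical table as above

site_features = ["latitude", "longitude", "plot_area", "annual_sunlight_hours"]
target = "floor_area_ratio"

X_train, X_test, y_train, y_test = train_test_split(
    buildings[site_features], buildings[target], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out buildings:", model.score(X_test, y_test))

# For a brand-new site, model.predict(new_site_features) suggests a value
# learned from every site the dataset has already seen.
```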

There are many more possibilities. They are an extremely attractive proposition: mouth-wateringly low-hanging fruit.

We just need a large enough dataset. Wouldn’t it be nice if architecture firms started sharing data with the world?


Designing a Prefab House System

Recently I came across an old script from last year, and thought it would be nice to see if I could still read it. (Disclaimer: I couldn’t. The definition was so atrocious that I got a headache just trying to understand what goes where.)

This idea was a long time coming. I had been mulling over building an entire building, down to every last detail, from scratch in Grasshopper, so that everything would be procedural (meaning: driven by numbers). This project was a first attempt at doing just that.
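(The original definition lives in Grasshopper, so the snippet below is just a back-of-the-envelope Python sketch of the ‘driven by numbers’ idea, with made-up module dimensions, not the actual script.)

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    width: float = 3.6           # module dimensions are made up, in metres
    depth: float = 3.6
    height: float = 2.7
    has_floor: bool = True       # "place a point" in the Grasshopper version
    has_stair: bool = False      # "place an arrow"
    has_skylight: bool = False   # "draw a rectangle"

@dataclass
class House:
    boxes: list = field(default_factory=list)

    def elements(self):
        """Derive every building element purely from the per-box inputs."""
        out = []
        for i, b in enumerate(self.boxes):
            if b.has_floor:
                out.append(("floor slab", i))
            if b.has_stair:
                out.append(("staircase", i))
            if b.has_skylight:
                out.append(("skylight", i))
        return out

house = House([Box(), Box(has_stair=True), Box(has_floor=False, has_skylight=True)])
print(house.elements())
```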

 

Step 1: Input how many boxes you want to use to make your house.

prefab6

Step 2: Place points in the boxes where you want a floor, place arrows in the boxes where you want a staircase, and draw rectangles in the boxes where you want a skylight.

Step 3: Get building.

 

prefab1

Optional Step 1: Draw some door details to tell it how you want them done.

prefab2

Optional Step 2: Change the angle of your roof and skylights to your liking.

prefab3

Optional Step 3: Change the column thicknesses to your liking. Boring stuff like how the ends meet at corners is solved for you.

prefab4


Kaggle datasets in Rhino

datasets

Melbourne housing dataset.csv from Kaggle.

This was using the older dataset (link to Kaggle) with 9 features. Coordinates were pulled from Google’s Geocoding API based on the address column in the dataset. I believe the updated dataset provides coordinates too, possibly obtained the same way.
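Roughly what that geocoding step looks like, sketched in Python against Google’s Geocoding API (the address suffix and the API key handling are assumptions; the real lookup was driven from inside Grasshopper):

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"   # assumed; any Geocoding-enabled key works

def geocode(address):
    """Return (lat, lng) for an address from the Melbourne dataset, or None."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": f"{address}, Melbourne, Australia", "key": API_KEY},
    )
    results = resp.json().get("results", [])
    if not results:
        return None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# e.g. coords = [geocode(a) for a in addresses]   # one call per row in the CSV
```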

number of rooms

showing different ranges of number of rooms per unit

kopt

‘If I’m looking for houses with 5 to 7 rooms, the newest ones are the light yellow ones on the northeast edge of the city.’

other details that come for free once the spreadsheet is linked to coordinates in the real world:

contour

height of property, contour of surrounding land

watershed

flow of water through the property and general direction of runoff (flash floods, landslides possibly?)

street view

pictures around the site (from Google Street View)
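(For what it’s worth, the same coordinates can be fed straight to the Street View Static API; a rough sketch, assuming an API key with that service enabled:)

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"   # assumed

def street_view(lat, lng, path="site.jpg"):
    """Save a single Street View frame looking at the given coordinates."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/streetview",
        params={"size": "640x400", "location": f"{lat},{lng}", "key": API_KEY},
    )
    with open(path, "wb") as f:
        f.write(resp.content)
```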

google directions

of course, different methods of transport to and from the city

clouds

weather data at a given time (historical data is paywalled, so I couldn’t access it)

soil_cropped.gif

soil conditions and suitability for certain forms of construction

aussie

…and some really zoomed-out, GIS-level datasets (I’m still wondering what to do with them).

Owl in Galapagos

TL;DR: predicting rectangles with two neural networks and Galapagos.

 

galapagos on owl 3

galapagos used to discover the best shapes for two time series neural networks

Just realised that Galapagos would potentially be very useful (or actually, another backprop NN might be even faster) for testing out optimal ‘window’ sizes for a time-series neural network (the ‘view’ that the neural network sees when it learns to predict a number series).

When predicting with a time-series neural network, one of the problems that has bugged me is that we don’t know the best window size for prediction (called look_back in this tutorial). Too small, and the NN learns that it should only go up or down; too large, and it misses out on too many details.
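For concreteness, the window idea boils down to something like this (a throwaway sketch; the sine series is just stand-in data):

```python
import numpy as np

def make_windows(series, look_back):
    """Slice a 1D series into (window, next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])   # what the network "sees"
        y.append(series[i + look_back])     # what it should predict next
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 200))    # stand-in data
X, y = make_windows(series, look_back=5)    # look_back is the window size in question
print(X.shape, y.shape)                     # (195, 5) (195,)
```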

This is where Galapagos (or any other appropriate optimiser) comes in, to help find the range of window sizes within which the best predictions happen.

Galapagos was used to test 7 parameters that directly affected the neural network shape and learning rate (1 for the window size, plus 3 for each NN: number of hidden neurons, learning rate, and steepness of the sigmoid activation function).
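Galapagos itself is a Grasshopper solver, so as a rough stand-in, here is what a search over those same seven parameters could look like in Python. The ranges and the `fitness` function (train both networks, return validation error) are assumptions, not the values used in the actual definition.

```python
import random

def sample_params():
    """One candidate: window size plus (hidden, learning rate, alpha) per NN."""
    def nn():
        return {"hidden": random.randint(2, 10),
                "learning_rate": random.uniform(0.05, 1.0),
                "alpha": random.uniform(0.5, 2.0)}   # sigmoid steepness
    return {"look_back": random.randint(2, 20), "nn1": nn(), "nn2": nn()}

def search(fitness, n_trials=200):
    """fitness(params) is assumed to train both networks and return validation error."""
    best, best_err = None, float("inf")
    for _ in range(n_trials):
        params = sample_params()
        err = fitness(params)
        if err < best_err:
            best, best_err = params, err
    return best, best_err
```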

param1

After 15 minutes or so, it gave me some pretty decent answers for the parameters required to learn the two separate number series.

It was quite interesting to see that the learning rate varied quite a bit between the two (one was at 0.21, and the other at 0.62), and alpha (used to define the steepness of the sigmoid activation function) was at 1.344 and 0.887 respectively (and then I realised that the learning rate was, in fact, roughly inversely proportional to alpha).

The number of hidden neurons stayed relatively similar, at 4 and 5 neurons respectively. Then again, I wouldn’t have guessed those values if I had just picked a middling number between the input and output counts.

param2

The result was a prediction of a series of two parameters that define a rectangle.

Ground truth dataset in Grey, predicted dataset in Yellow.

galapagos on owl 4

the accuracy falloff after training

galapagos on owl 5 initial

before training

galapagos on owl 5 initial2

initial hand-tweaking of parameters (didn’t know which ones were best to tweak)

galapagos on owl 5 learnt

so I machine-learned those parameters, and it got some pretty decent predictions

galapagos on owl 5 shifted

then shifted some starting rectangles and realised it predicts up to about 10 rectangles reliably before doing some crazy things.

plugins used: OWL, Galapagos

Panel Rationalization (OWL)

Never has panel rationalization been so straightforward! K-means clustering to sort similar panels!

panels are sorted by two parameters:

  • panel area
  • surface normals

and then replaced with set panel dimensions (the average of each cluster) plus a 20 mm offset. The results are pretty decent, with minimal overlap even at the steep bits of the surface.
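Outside Grasshopper, the same clustering step could be sketched with scikit-learn like this (panel data is randomly generated here, and deriving an edge length from the mean area is just one way to read ‘set panel dimensions’):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
areas = rng.uniform(0.5, 2.0, size=(500, 1))            # panel areas in m^2 (made up)
normals = rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

features = np.hstack([areas, normals])                   # area + unit surface normal
k = 25
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)

# one set dimension per cluster: edge length from the mean area, plus the 20 mm offset
panel_edge = np.sqrt([areas[labels == i].mean() for i in range(k)]) + 0.020
print(panel_edge)
```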

panel types

number of clusters (types of rectangles) from 2–50, iterations = 3

clustering iterations

k (number of clusters) = 25, iterations running from 1–30

EDIT: a little extra definition showing colour clustering to reduce the number of colour variations needed from 1124 down to 10–50.
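The colour version is the same trick applied to RGB values; a minimal sketch, with random colours standing in for the real 1124:

```python
import numpy as np
from sklearn.cluster import KMeans

colours = np.random.default_rng(1).uniform(0, 1, size=(1124, 3))   # stand-in RGB values
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(colours)

palette = km.cluster_centers_    # the reduced set of colours
reduced = palette[km.labels_]    # every panel mapped to its nearest palette colour
```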

18217830_10154861225159064_1385605154_n

colour variations from 10-30, 30 iterations

18197623_10154861722884064_1987930800_n

10 colours, iterations from 10–30, showing how the clustering works in real time

plugins used: OWL