Video Saliency with VGG!

Came across this really cool paper (from December 2017) that does saliency detection on videos! Here are the paper and the GitHub repo.



How to tokenize Japanese words in Python

So, you’re working in Japan, you’re an English-speaking man, and you’ve found out that tokenizing words with your NLTK library isn’t cutting it in Japanese.

Tokenizing in Japanese is quite a different ball of nettles compared to tokenizing in English. In English, you split stuff on whitespace and call it a day (more or less). In Japanese, however, well, you can’t: words are written one after another with no spaces between them.
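To see the difference concretely, here is a small plain-Python sketch. Splitting on whitespace returns a whole Japanese sentence as one token. The toy greedy longest-match segmenter below uses a tiny hand-made dictionary, which is an assumption for illustration only; it is not how MeCab works internally (MeCab builds a lattice of candidate words from a full dictionary and picks the lowest-cost path), it just shows why a dictionary is needed at all.

```python
def whitespace_tokenize(text):
    # What works (more or less) for English
    return text.split()


def longest_match_tokenize(text, dictionary, max_len=4):
    """Toy greedy left-to-right longest-match segmentation."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until we hit a dictionary word
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                # Unknown characters fall back to single-character tokens
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical mini-dictionary, just enough for the example sentence
TOY_DICTIONARY = {"私", "は", "猫", "が", "好き", "です"}

sentence = "私は猫が好きです"  # "I like cats"
print(whitespace_tokenize(sentence))                     # ['私は猫が好きです']
print(longest_match_tokenize(sentence, TOY_DICTIONARY))  # ['私', 'は', '猫', 'が', '好き', 'です']
```

Greedy matching breaks down quickly on real text, which is exactly why the steps below install MeCab and a proper dictionary instead.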

Here, then, is the easiest way (for a Mac user) to get yourself unstuck from your tokenizing pothole. There are other ways, of course, like natto-py.

  1. pip install JapaneseTokenizer
  2. brew install mecab
  3. brew install mecab-ipadic
  4. After that, mecab-ipadic will tell you to make sure MeCab knows where its dictionary is. Make sure it does.
  5. Try this gist. It’s a bunch of stuff strung together from a few tutorials and some hunting on Stack Overflow.

If Python isn’t happy, try these steps:

  1. In case Python doesn’t find MeCab, follow this.
  2. In case of other random errors, check out natto-py’s installation procedure (you might need something called cffi, which lets Python call C code).


Once you can tokenize Japanese words, you can properly do doc2vec.

Here’s an alternative instruction set. Here’s another.

EDIT: since posting, I’ve been trying to push a Flask app with MeCab dependencies onto containers, and this helped a lot. In short, have your base image install MeCab before you install mecab-python through pip; otherwise pip won’t find MeCab inside the Docker container.

EDIT2: and to stop apt-get (Linux) from being aborted inside Docker whenever it asks for confirmation:

Do you want to continue [Y/n]?

Follow this. (Passing -y to apt-get install answers that prompt with yes automatically.)

Finding correlations in building data

an urban catalogue (working title)

– identifying a building’s ‘character’


So now we know how machine learning can change every industry it touches. What can it do for architecture? Can it learn relationships between buildings? What would that actually do for us?

Well, let’s consider this: buildings are rows in a dataset, and their corresponding features are the columns describing them.

What the dataset can look like

These columns are descriptions of each building, ranging from:

  • location
  • height
  • number of floors, basements
  • number of apartments and offices (this small study was previously done based on mixed use developments)
  • gross floor area
  • number of parking spaces
  • floor area ratio

down to perhaps even more subtle features such as:

  • percentage of floor area that receives direct sunlight
  • reflectivity of facade
  • contribution to heat island effect
  • building energy consumption footprint
  • weather attributes where the building is located
  • other correlated data that comes from having the building sitting in the same place, exposed to changing weather, land conditions, and socioeconomic trends.

The possibilities for hand-describing features are in fact quite endless, and similar in character to the features used to tag songs (à la The Music Genome Project).

Right now, one way to make sense of a building seems to be hand-labelling these features, based on traits architects already use to describe buildings.

Perhaps in time, with a dataset large enough to zoom out and get a feel for ‘what describes a building’, we can move on to other methods of gleaning features from raw data (photographs of the building, or raw sensor data from future buildings fully connected to the cloud).

What Can the Dataset Do?

Initially, it can be clustered. Many clustering algorithms can be applied to a clean dataset to provide insights that come from merely referencing related buildings.
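As one example of what "clustered" could mean here, below is a minimal k-means sketch in NumPy. It assumes the dataset has already been cleaned into numeric columns; the three columns and all values are made up, and k-means is just one of the many clustering algorithms you could pick.

```python
import numpy as np


def standardize(X):
    # Put every column on a comparable scale (metres vs. square metres)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct buildings chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each building to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the buildings assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up buildings: [height (m), floors, gross floor area (m²)]
buildings = np.array([
    [12.0, 3, 1500.0],
    [15.0, 4, 1800.0],
    [14.0, 4, 1700.0],
    [95.0, 28, 42000.0],
    [110.0, 32, 48000.0],
    [102.0, 30, 45000.0],
])

labels, _ = kmeans(standardize(buildings), k=2)
print(labels)  # the three low-rise and three high-rise buildings land in separate clusters
```

Standardizing first matters: without it, gross floor area (tens of thousands) would swamp height and floor count in the distance calculation.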

e.g. what sort of facades work well in this climate? Oh, here are some similar buildings from similar climates that use the same window detail and product supplier.

e.g. what are the most similar buildings in terms of (architect selects a few features out of the many)
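The "architect selects a few features" query can be sketched as a nearest-neighbour search restricted to the chosen columns. The column names and rows below are hypothetical, and plain Euclidean distance on standardized columns is only one reasonable choice of similarity.

```python
import numpy as np

FEATURES = ["height_m", "floors", "gfa_m2", "far", "parking_spaces"]  # hypothetical column names


def most_similar(buildings, query_index, selected, top_n=3):
    """Rank buildings by distance to one building, using only the selected features."""
    cols = [FEATURES.index(name) for name in selected]
    X = buildings[:, cols].astype(float)
    # Standardize the chosen columns so no single unit dominates the distance
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std
    distances = np.linalg.norm(X - X[query_index], axis=1)
    order = np.argsort(distances)
    # Drop the query building itself from its own result list
    return [int(i) for i in order if i != query_index][:top_n]


# Made-up rows: [height (m), floors, GFA (m²), floor area ratio, parking spaces]
buildings = np.array([
    [12.0, 3, 1500.0, 1.2, 10],
    [15.0, 4, 1800.0, 1.5, 12],
    [95.0, 28, 42000.0, 8.0, 300],
    [40.0, 10, 9000.0, 3.5, 60],
    [100.0, 30, 45000.0, 8.5, 320],
])

# "Which buildings are most similar to building 2 in terms of height and floor area ratio?"
print(most_similar(buildings, query_index=2, selected=["height_m", "far"], top_n=2))  # [4, 3]
```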

The dataset can also be used to train a regression model (or a simple neural network) that learns what the best features are for a new site, based on everything the dataset knows about all sites.
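The simplest possible version of that idea is ordinary least-squares regression, sketched below in NumPy. Everything here is an assumption for illustration: the two site features, the choice of floor area ratio as the target, and the made-up numbers. A small neural network would replace `fit_linear` when relationships are not linear.

```python
import numpy as np


def fit_linear(X, y):
    # Prepend a column of ones so the model also learns an intercept
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    X = np.atleast_2d(X)
    return coef[0] + X @ coef[1:]


# Made-up site features: [site area (m²), distance to nearest station (m)]
sites = np.array([[1000, 200], [2000, 800], [1500, 500], [3000, 300], [2500, 1000]], dtype=float)
# Made-up target: floor area ratio achieved on each site
far = np.array([4.8, 6.2, 5.5, 8.7, 7.0])

coef = fit_linear(sites, far)
print(predict(coef, [2200, 400]))  # estimated floor area ratio for a new, unseen site
```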

There are many more uses. They are an extremely attractive proposition: mouth-wateringly low-hanging fruit.

We just need a large enough dataset. Wouldn’t it be nice if architecture firms started sharing data with the world?


Some great fun with convolutions

Convolutions, I realised recently, loosely mean ‘slide a small matrix over an entire image, multiplying and summing as you go’, which hit me like a soft plastic hammer as I muttered a small ‘oh’ in the corner of a nearby coffee shop.

The cool thing, it seems to me, is that the ‘image’ is an image as understood by my ageing laptop (and anyone else’s, really): a bunch of numbers in a 2D array (or a 3D array, if it is in colour). So convolutions are.. multiply-and-sum operations sliding over a 2D array. Hmm.. I guess that means convolutions also happen in 1D and 3D and so on… (what happens when you convolve an ASCII grid dataset of the world?) Anyway, here’s a cat:
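Here is that slide-multiply-sum idea written out as plain NumPy loops for a single-channel (greyscale) image. Strictly speaking this computes cross-correlation (the kernel is not flipped), which is what most deep-learning libraries call convolution anyway; for symmetric kernels such as a box blur the two are identical. Only "valid" positions are computed, so the output shrinks by the kernel size minus one.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide `kernel` over a 2D `image`, multiplying and summing at each position.

    'Valid' mode: the kernel never hangs off the edge, so the output is
    (H - kh + 1) x (W - kw + 1). The kernel is not flipped.
    """
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out


image = np.arange(16).reshape(4, 4)  # a tiny 4x4 "image"
box_blur = np.ones((3, 3)) / 9.0     # average of each 3x3 neighbourhood
print(convolve2d(image, box_blur))   # a 2x2 result: the neighbourhood means 5, 6, 9, 10
```

For a colour image (a 3D array), the same function can be applied to each channel separately.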


Google Image Search wizardry has decided that she is the world’s cutest kitten. (source)

So what sort of things do you do to the kitten? In fact, it seems a lot of the filters we use in Photoshop are convolutions. So it’s a filter? Yes, and it’s also called a kernel. Google also says a kernel is a small matrix that you plaster all over the kitty to make it different (no, not better, just.. different).

Try this tutorial that does kitty transformation in TensorFlow (instead of writing your own loops; it also reads better as a NumPy array than as a nested list). Or go change the numbers in this one. Below is what I got from the TF tutorial:


blur kitty


sharp kitty

random effect kitty

Some randomly generated kernels. Yes, they’re kinda creepy. Sorry, kitty.
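Blur, sharpen and random kernels like the ones behind the kitties above could look something like this. The exact values the tutorial used aren't reproduced here; these are common textbook kernels, and a random array stands in for the greyscale kitten. `sliding_window_view` (NumPy 1.20+) replaces the explicit loops.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def convolve2d(image, kernel):
    # Every kernel-sized window of the image, multiplied by the kernel and summed ('valid' mode)
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


KERNELS = {
    # Blur: each pixel becomes the average of its 3x3 neighbourhood
    "blur": np.ones((3, 3)) / 9.0,
    # Sharpen: boost the centre pixel and subtract its four neighbours (weights sum to 1)
    "sharpen": np.array([[0, -1, 0],
                         [-1, 5, -1],
                         [0, -1, 0]], dtype=float),
    # Random: no meaning at all, hence the creepy kitty
    "random": np.random.default_rng(42).normal(size=(3, 3)),
}

# Stand-in for the greyscale kitten: any 2D float array in [0, 1] works
kitten = np.random.default_rng(0).random((64, 64))

results = {name: convolve2d(kitten, k) for name, k in KERNELS.items()}
for name, img in results.items():
    print(name, img.shape)  # (62, 62) for a 3x3 kernel in 'valid' mode
```

Because the blur and sharpen weights each sum to 1, a flat grey area comes out unchanged; only edges and texture are affected.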



What are Convolutional Neural Networks then?

They’re kernels that learn to write themselves.

This one hit me like an iron hammer, and I was happy for quite a while after that.


What? Why would neural networks want to write their own blur filters? Oh no, no, it’s not a blur filter to the network, because of the way you train it. I ignominiously coin them Intelligence filters. Haha. Filters in Convolutional Neural Networks learn the best way to ‘see’ something so that the network can make a good guess at what it is.

In a Convolutional Neural Network, you stack layers of filters (actually, neurons arranged like filters), each layer seeing a larger portion of the image, in front of your regression/classification network, and train the whole thing with backpropagation (very loosely speaking, of course).



So if you look at the above image, what is going on is:

input image > squeeze > squeeze > squeeze > squeeze > connect them all together > classify
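To make the squeeze pipeline concrete, here is a forward pass only (no training) in NumPy. The architecture is a deliberately tiny assumption: one 3x3 filter per layer (real CNNs use many), a ReLU, and 2x2 max pooling as each "squeeze", followed by a dense layer with softmax over 3 hypothetical classes. All weights are random.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)


def conv_relu(x, kernel):
    # 'Valid' convolution followed by ReLU (negative responses are zeroed)
    out = np.einsum("ijkl,kl->ij", sliding_window_view(x, kernel.shape), kernel)
    return np.maximum(out, 0.0)


def max_pool(x, size=2):
    # Keep the strongest response in each size x size block (odd edges are dropped)
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def forward(image, kernels, weights, bias):
    x = image
    for kernel in kernels:  # "squeeze" four times
        x = max_pool(conv_relu(x, kernel))
        print("after squeeze:", x.shape)
    features = x.ravel()    # "connect them all together"
    return softmax(weights @ features + bias)  # "classify"


image = rng.random((64, 64))
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]
# 64 -> 62 -> 31 -> 29 -> 14 -> 12 -> 6 -> 4 -> 2, so 2 * 2 = 4 features remain
weights = rng.normal(size=(3, 4))  # 3 hypothetical classes
bias = np.zeros(3)
probs = forward(image, kernels, weights, bias)
print(probs, probs.sum())  # one probability per class, summing to 1
```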

Backpropagation goes all the way through, so the filters that do the squeezing (extracting the ‘essence’ of a small part of the image) learn where they went wrong and how to change to be less wrong.
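The "kernels that learn to write themselves" idea can be shown in miniature: start from an all-zero 3x3 kernel and use gradient descent on the mean squared error to recover a hidden blur kernel from example input/output pairs. There is no network around it, and the target kernel, image sizes, learning rate and step count are arbitrary choices for this sketch; in a real CNN the same kind of gradient arrives from the classification loss via backpropagation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)


def convolve2d(image, kernel):
    return np.einsum("ijkl,kl->ij", sliding_window_view(image, kernel.shape), kernel)


# The "answer" the kernel should discover on its own: a 3x3 box blur
target_kernel = np.ones((3, 3)) / 9.0
images = [rng.random((16, 16)) for _ in range(20)]
targets = [convolve2d(img, target_kernel) for img in images]

kernel = np.zeros((3, 3))  # start knowing nothing
learning_rate = 0.1

for step in range(400):
    grad = np.zeros_like(kernel)
    for img, y in zip(images, targets):
        windows = sliding_window_view(img, kernel.shape)
        error = np.einsum("ijkl,kl->ij", windows, kernel) - y
        # Gradient of the mean squared error: each kernel weight is blamed in
        # proportion to the pixels it was multiplied with
        grad += 2 * np.einsum("ijkl,ij->kl", windows, error) / error.size
    kernel -= learning_rate * grad / len(images)

print(np.round(kernel, 3))  # close to 1/9 ≈ 0.111 in every cell
```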

As a result, after training, each filter has learned which bits are important (almost white, because white means 1 here, which means activated). And because the neurons are laid out in 2D and are learning about images, we humans get to see which features of the image each neuron is interested in.

I haven’t actually trained my own CNN yet, because I like knowing things in as much detail as I can before doing them, but when I do I’ll update this post!

Follow along with this tutorial; it’s really quite fun and illuminating. This one too.

Owl > Tensorflow > Owl!


With the help of Mateusz (creator of Owl), there’s now a live-updating window showing TensorFlow training in Grasshopper! The graph at the end is a matplotlib plot.


Close-up. It basically shows the terminal as the script runs. My earlier homemade Python version also ran the script from the command line, but it sort of just froze the Grasshopper canvas while it trained. Wasn’t very exciting.
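The difference between a frozen canvas and a live window mostly comes down to reading the training script's output line by line as it is produced, instead of waiting for the process to finish. Below is a minimal sketch of that pattern with Python's standard `subprocess` module, assuming CPython 3. It is not how Owl implements its window, the fake training script is a placeholder, and it does not cover getting the lines into the Grasshopper canvas.

```python
import subprocess
import sys


def run_and_stream(cmd, on_line):
    """Run a command and hand each line of its output to `on_line` as soon as it appears."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge errors into the same stream
        text=True,
        bufsize=1,                 # line-buffered on our side
    )
    for line in proc.stdout:       # yields lines as the child flushes them
        on_line(line.rstrip("\n"))
    return proc.wait()


# Placeholder for a training script that reports one epoch at a time.
# "-u" keeps the child Python process unbuffered so each line arrives immediately.
fake_training = (
    "import time\n"
    "for epoch in range(3):\n"
    "    print(f'epoch {epoch} loss {1.0 / (epoch + 1):.3f}')\n"
    "    time.sleep(0.1)\n"
)
lines = []
code = run_and_stream([sys.executable, "-u", "-c", fake_training], lines.append)
print(code, lines)
```

Swapping `lines.append` for a function that updates a plot or a UI element is what turns this into a live view.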


An excerpt from the Python script running behind this (only showing the fun part). It runs on Keras, with hyperparameters passed in from Grasshopper.
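The excerpt itself was an image and isn't reproduced here. As a rough idea of how a script can receive hyperparameters from Grasshopper on the command line, here is a minimal `argparse` sketch; the flag names and defaults are hypothetical, and the parsed values would then go into your Keras calls (for example `model.fit(..., epochs=args.epochs, batch_size=args.batch_size)`).

```python
import argparse


def parse_hyperparameters(argv=None):
    # Flag names and defaults are hypothetical; match them to whatever Grasshopper sends
    parser = argparse.ArgumentParser(description="Train a model with hyperparameters from Grasshopper")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--hidden", type=int, nargs="+", default=[64],
                        help="neurons per hidden layer, e.g. --hidden 64 32")
    return parser.parse_args(argv)


# In the real script you would call parse_hyperparameters() with no arguments to read sys.argv
args = parse_hyperparameters(["--epochs", "5", "--hidden", "16", "8"])
print(args.epochs, args.batch_size, args.learning_rate, args.hidden)  # 5 32 0.001 [16, 8]
```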

Kaggle datasets in Rhino


Melbourne housing dataset (.csv) from Kaggle.

This was using the older version of the dataset on Kaggle, with 9 features. Coordinates were queried from Google’s Geocoding API using each address in the dataset. I believe the updated dataset provides coordinates too, possibly obtained the same way.
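A sketch of one way to make that lookup with Python's standard library. The post doesn't say exactly how the requests were made; the endpoint and the `results[0].geometry.location` response shape follow Google's Geocoding API, you need your own API key, and the address below is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response):
    """Pull (lat, lng) of the first match out of a Geocoding API JSON response."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key):
    # Network call: needs a valid API key and is subject to Google's quotas and billing
    with urlopen(build_geocode_url(address, api_key)) as resp:
        return extract_lat_lng(json.load(resp))


print(build_geocode_url("1 Example St, Melbourne VIC", "YOUR_API_KEY"))
```

Separating URL building and response parsing from the network call makes it easy to test the parsing offline and to cache responses, so you don't pay for the same address twice.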

number of rooms

showing different ranges of number of rooms per unit


‘If I’m looking for houses with 5 to 7 rooms, the newest ones are the light yellow ones on the northeast edge of the city.’
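The query in that quote boils down to a filter plus a sort. Below is a minimal sketch with Python's `csv` module. "Rooms" and "YearBuilt" are assumed column names (check them against the CSV you actually download), and the rows are made up. The 0-to-1 value is what you would feed a colour gradient to get the light-yellow-for-newest colouring.

```python
import csv
import io


def houses_in_room_range(rows, min_rooms, max_rooms):
    """Rows with min_rooms..max_rooms rooms (inclusive) and a known build year, newest first."""
    selected = [r for r in rows
                if min_rooms <= int(r["Rooms"]) <= max_rooms and r["YearBuilt"]]
    return sorted(selected, key=lambda r: float(r["YearBuilt"]), reverse=True)


def year_to_ramp(year, oldest, newest):
    # 0.0 = oldest, 1.0 = newest; map this onto a colour gradient
    if newest == oldest:
        return 1.0
    return (year - oldest) / (newest - oldest)


# A few made-up rows; the last one has no build year and is skipped
sample = io.StringIO(
    "Address,Rooms,YearBuilt\n"
    "1 A St,3,1990\n"
    "2 B St,5,1965\n"
    "3 C St,6,2012\n"
    "4 D St,7,1988\n"
    "5 E St,8,2015\n"
    "6 F St,6,\n"
)
rows = list(csv.DictReader(sample))
matches = houses_in_room_range(rows, 5, 7)
years = [float(r["YearBuilt"]) for r in matches]
for r in matches:
    shade = year_to_ramp(float(r["YearBuilt"]), min(years), max(years))
    print(r["Address"], r["YearBuilt"], round(shade, 2))
```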

Other details that come along once you link an Excel datasheet to real-world coordinates:


height of property, contour of surrounding land


flow of water through property and general direction of water flow. (flash floods, landslides possibly?)

street view

pictures around the site (from Google Street View)

google directions

of course, different methods of transport to and from the city


weather data at a given time (historical data is paywalled, so I couldn’t access it)


soil conditions and suitability for certain forms of construction


…and some really zoomed-out, GIS-level datasets (I’m still wondering what to do with them).

Owl in Galapagos

TLDR: predicting rectangles with two neural networks and galapagos.



galapagos used to discover the best shapes for two time series neural networks

Just realised that Galapagos could be very useful (or actually, another backprop NN might be even faster) for testing out optimal ‘window’ sizes for a time series neural network (the ‘view’ the network sees when it learns to predict a number series).

When predicting with a time series neural network, one problem that has bugged me is that we don’t know the best window size for prediction (called look_back in this tutorial). Too small, and the NN learns that it should only go up or down; too large, and it misses out on too many details.
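What the window size actually controls is how a series gets cut into training examples. A minimal NumPy sketch, written from scratch here rather than copied from the tutorial, so names and edge handling may differ from its version:

```python
import numpy as np


def make_windows(series, look_back):
    """Turn a 1D series into (inputs, targets) for a time series network.

    Each input is `look_back` consecutive values; its target is the value right after them.
    """
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + look_back] for i in range(len(series) - look_back)])
    y = series[look_back:]
    return X, y


X, y = make_windows([10, 20, 30, 40, 50, 60], look_back=3)
print(X)  # rows: [10, 20, 30], [20, 30, 40], [30, 40, 50]
print(y)  # [40, 50, 60]
```

A larger `look_back` gives each example more context but leaves fewer examples to train on, which is part of why the best value has to be searched for.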

This is where galapagos (or any other appropriate learner) comes in to help find the optimal range within which the best predictions can happen.

Galapagos was used to test 7 parameters that directly affected the neural networks’ shape and learning: 1 for the window size, plus 3 for each NN (number of hidden neurons, learning rate, and steepness of the sigmoid activation function).
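Galapagos is Grasshopper's evolutionary solver. Outside Grasshopper, the same idea can be sketched as a bare-bones evolutionary loop: keep the best genomes, mutate copies of them, repeat. This is not Galapagos' actual algorithm. The genome mirrors the 7 parameters just described; the bounds and the toy fitness function (with an arbitrary, known optimum) are hypothetical.

```python
import random

# One gene per tuned parameter: 1 shared window size + 3 for each of the two networks.
# The bounds are hypothetical.
GENES = [
    ("window", 2, 20, int),
    ("hidden_a", 1, 10, int), ("lr_a", 0.01, 1.0, float), ("alpha_a", 0.1, 3.0, float),
    ("hidden_b", 1, 10, int), ("lr_b", 0.01, 1.0, float), ("alpha_b", 0.1, 3.0, float),
]


def random_genome(rng):
    return [rng.uniform(lo, hi) for _, lo, hi, _ in GENES]


def decode(genome):
    # Integer genes (window size, hidden neurons) are rounded; the rest stay floats
    return {name: round(v) if kind is int else v
            for (name, _, _, kind), v in zip(GENES, genome)}


def mutate(genome, rng, rate=0.3):
    child = list(genome)
    for i, (_, lo, hi, _) in enumerate(GENES):
        if rng.random() < rate:
            # Gaussian nudge (10% of the gene's range), clamped to the bounds
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 0.1 * (hi - lo))))
    return child


def evolve(fitness, generations=60, population=20, elite=5, seed=0):
    """Minimise fitness(params) by keeping the best genomes and mutating copies of them."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(decode(g)))
        parents = pop[:elite]
        children = [mutate(rng.choice(parents), rng) for _ in range(population - elite)]
        pop = parents + children
    return decode(min(pop, key=lambda g: fitness(decode(g))))


def toy_fitness(params):
    # Placeholder: a real fitness would train both networks with `params` and
    # return their combined prediction error. This one has a known optimum.
    return abs(params["window"] - 8) + abs(params["lr_a"] - 0.2) + abs(params["lr_b"] - 0.6)


best = evolve(toy_fitness)
print(best)
```

Each fitness evaluation in the real setup means training two networks, which is why a run like the one described takes minutes rather than milliseconds.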


After 15 minutes or so, it gave me some pretty decent values for both networks, each learning a separate list of parameters.

It was quite interesting to see that the learning rate varied quite a bit between the two (one at 0.21, the other at 0.62), and alpha (which sets the steepness of the sigmoid activation function) was at 1.344 and 0.887 respectively. Then I realised that learning rate and alpha pull against each other: the network with the steeper sigmoid settled on the smaller learning rate.
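A rough way to see why the two interact, assuming the common parametrised sigmoid with steepness α (the post doesn't show the exact form Owl uses). For a neuron with pre-activation z = wx, one gradient-descent step on the weight w with learning rate η is:

```latex
\sigma_\alpha(z) = \frac{1}{1 + e^{-\alpha z}}, \qquad
\frac{d\sigma_\alpha}{dz} = \alpha\,\sigma_\alpha(z)\bigl(1 - \sigma_\alpha(z)\bigr)

\Delta w = -\eta\,\frac{\partial L}{\partial w}
         = -\eta\,\alpha\,\frac{\partial L}{\partial \sigma_\alpha}\,\sigma_\alpha(z)\bigl(1 - \sigma_\alpha(z)\bigr)\,x
```

The step size carries the product ηα, so a steeper sigmoid can get away with a smaller learning rate. It is only a rough trade-off rather than strict inverse proportionality, since α also changes where the sigmoid saturates; the two products here are 0.21 × 1.344 ≈ 0.28 and 0.62 × 0.887 ≈ 0.55.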

The number of hidden neurons stayed relatively similar, at 4 and 5 neurons respectively. Then again, I wouldn’t have guessed those if I had just used a random middling number between the numbers of inputs and outputs.


The result was a predicted series of the two parameters that define a rectangle.

Ground truth dataset in Grey, predicted dataset in Yellow.


the accuracy falloff after training


before training


initial hand-tweaking of parameters (I didn’t know which ones were best to tweak)


so I machine-learned those parameters, and it got some pretty decent predictions


Then I shifted some starting rectangles and realised it predicts up to about 10 rectangles reliably enough before doing some crazy things.
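That "reliable for a while, then crazy" behaviour is typical of recursive forecasting, where each prediction becomes part of the next input window, so small errors compound. Below is a toy sketch: `slightly_wrong_model` is a hypothetical stand-in for the trained networks (it extrapolates the recent trend but overestimates it by 5%), and the ground-truth series is a made-up linear one.

```python
import numpy as np


def rollout(history, predict_next, look_back, steps):
    """Forecast `steps` values ahead by feeding each prediction back in as input."""
    window = list(history[-look_back:])
    predictions = []
    for _ in range(steps):
        nxt = predict_next(np.array(window))
        predictions.append(nxt)
        # Slide the window: drop the oldest value, append our own guess
        window = window[1:] + [nxt]
    return predictions


def slightly_wrong_model(window):
    # Hypothetical stand-in for a trained network: follows the recent trend,
    # but overestimates it by 5%, the kind of small bias any real model has
    trend = (window[-1] - window[0]) / (len(window) - 1)
    return float(window[-1] + 1.05 * trend)


truth = [10.0 + 2.0 * t for t in range(30)]  # e.g. widths of successive rectangles
preds = rollout(truth[:10], slightly_wrong_model, look_back=4, steps=20)
errors = [abs(p - t) for p, t in zip(preds, truth[10:])]
print([round(e, 2) for e in errors])  # the error grows with every prediction fed back in
```

Retraining with longer prediction horizons, or periodically re-anchoring the window to real measurements, are the usual ways to push that breaking point further out.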

plugins used: OWL, galapagos