Identifying the colour of bacterial colonies using DBSCAN in OpenCFU

In my PhD work, I’ve had to count a lot of dots on Petri dishes. It is a pretty mundane task but still very important. The majority of dishes I see look something like this:


Each dot is a colony of E. coli bacteria, and there are two strains grown there in competition with each other. At the end of it all, my colleagues and I have had to count how many dark red dots there are compared to light red dots. Fine for the first few plates, but after a hundred plates each containing a hundred dots, it gets repetitive. Moreover, any lapse in concentration may let errors slip into the experiment, adding possibly undetectable experimental biases.

It would be nice if there were an automatic way to do this. For counting colonies, I’m a fan of the OpenCFU colony counter (Github, paper). It’s open source, the main developer is open to new contributions, the code is easy to extend, and maybe most importantly, it works really well. Running the above image through OpenCFU identifies 108 colonies, highlighting them as below.


Of course, what we want is to be able to say that there are X dark red colonies and Y light red colonies. That involves clustering, and for that I used DBSCAN (Wikipedia reference). DBSCAN is a pretty robust algorithm that can identify an arbitrary number of clusters. One needs to specify only two parameters to the algorithm: a threshold distance between points, and the number of points a point should have nearby for it to be considered part of the cluster*. Typically, the number of nearby points should be at least the number of dimensions in the search space plus one. Finding a distance, however, is a little tricky. After all, what is the distance between two colours?
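To make the two parameters concrete, here is a minimal pure-Python sketch of DBSCAN. It is not the code used in OpenCFU; the names `dbscan`, `eps` and `min_pts` are my own, and I am assuming the common convention that a point counts as one of its own neighbours.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts, dist=math.dist):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed by a cluster later
            continue
        cluster += 1  # point i is dense enough to start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable but not dense: joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is dense too, so keep growing from it
    return labels

# Two tight groups of four points and one stray point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters is never passed in; it falls out of the threshold distance and the neighbour count. The `dist` argument is left pluggable because the choice of distance is the hard part when the points are colours.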

The distance between two colours

Before we get into detail here, it’s worth first considering how colours are treated by your computer and camera. Simplifying a little, a pixel on the CCD in a camera sees a colour, say orange, and turns that into a measurement of three colours: red, green and blue. It gives a measurement of the colour intensity in each channel between 0 and 255. So a really vibrant orange might be red=255, green=153 and blue=0, or (255, 153, 0) for short.

In a perfect world, when you look at the photo on a screen, the screen will set its red, green and blue pixels to (255, 153, 0) and the exact same colour will be reproduced. However, different manufacturers use slightly different components, which can lead to tiny differences between what two devices call the same colour. So to avoid this the camera converts every colour to a standardised colour space called sRGB which applies a slight calibration to each colour to account for the idiosyncrasies of the camera (and similarly, any display can account for its own characteristics by performing a conversion from sRGB when it reads a file).

Naively then, we should be able to use these sRGB co-ordinates from each pixel to generate a distance between two colours. Each colour can be thought of as a point in a 3-dimensional space, so the straight line distance between the points could serve as a measurement of colour distance. To see how well this works, look at the image below. Three sets of two colours are shown next to each other, with their distance according to this metric shown.
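As a rough sketch, this naive straight-line distance between two (red, green, blue) triples is a one-liner in Python. The function name `rgb_distance` is just for illustration, and the second colour is an arbitrary pure red chosen to keep the arithmetic easy.

```python
import math

def rgb_distance(c1, c2):
    """Straight-line (Euclidean) distance between two (r, g, b) triples."""
    return math.dist(c1, c2)

orange = (255, 153, 0)
red = (255, 0, 0)
print(rgb_distance(orange, red))  # 153.0
```

The largest value this can take is the distance from black (0, 0, 0) to white (255, 255, 255), which is 255 times the square root of 3, or about 441.7.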


Depending on your eyesight, you may see differences more clearly than me, but for me, the distance between colours I have shown doesn’t correspond at all well to how discernible they are. And while I can discern each of the first two sets of colours easily, the third lot is almost impossible. Similar to how the cameras and monitors perform a conversion between their colour measurements and sRGB to account for their characteristics, we need to perform some kind of conversion that accounts for how the human eye perceives differences between colours. Fortunately, the hard work is already done for us through the L*a*b* colour space. This colour space was designed, amongst other things, so that differences between colours are more uniform. Returning to the previous example in the L*a*b* domain, we find the distances between colours much better match our perception of colour difference.


Distances here are calculated using the straight line distance between the three L*a*b* co-ordinates, but there exist many more colour distance metrics for higher levels of precision.
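For the curious, here is a sketch of how that calculation can be done by hand: convert each sRGB colour to L*a*b* and then take the straight-line distance (this simple metric is usually called CIE76 ΔE). I am assuming the standard sRGB transfer function and a D65 reference white. The function names are mine, and this is not the code inside OpenCFU.

```python
import math

# D65 reference white (assumed; the usual choice for sRGB).
XN, YN, ZN = 0.95047, 1.0, 1.08883

def _linearise(channel):
    # Undo the sRGB gamma curve for one 0-255 channel value.
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t):
    # Cube root with a linear segment near zero, as in the CIE definition.
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def srgb_to_lab(rgb):
    """Convert an (r, g, b) triple in 0-255 to an (L*, a*, b*) triple."""
    r, g, b = (_linearise(v) for v in rgb)
    # Linear RGB to CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def lab_distance(rgb1, rgb2):
    """Straight-line distance between two sRGB colours in L*a*b* space."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))

print(srgb_to_lab((255, 153, 0)))
print(lab_distance((255, 153, 0), (255, 0, 0)))
```

A quick sanity check is that white should come out with L* close to 100 and black with L* close to 0, with a* and b* near zero in both cases.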

Onwards to clustering

Using this colour space, we now have a measurement of distance that works in a domain similar to what the human eye sees. Given we design experiments typically based on our perception of colour, this will generally allow colours that a human would separate to be distinguished by a computer. Within OpenCFU, I added a new menu item to allow the threshold colour distance between two colony types to be varied, and added a post-processing stage that clusters the colonies. An example is below:


Here 74 dark red colonies are found and 32 light red colonies. A few colonies found when no clustering was applied were lost, as their colours didn’t correspond to the main groups. This can be good occasionally, as it acts to remove incorrectly identified colonies. In both cases, some colonies are missed, though a manual verification step can correct these mistakes if a higher level of accuracy is needed. Or make a lightbox for more consistently illuminated photos.

*Technically that isn’t entirely true: a point needs to have a minimum number of neighbours only if it is to be considered part of the ‘core’ of the cluster. A cluster also includes edge points, which are joined to a core point but have fewer than the minimum number of neighbours.

