Deep dive into kNN’s distance metrics
Hello fellow machine learners,
In last week’s article, we discussed how the kNN algorithm works, the underpinnings of which lent themselves quite nicely to visual demonstration. In that piece, I briefly mentioned the Euclidean distance, which is the ‘default’ distance metric that we all use without realising. But what even is a metric? And what relevance does it have to kNN?
A primer on metrics
The k-Nearest Neighbours algorithm (kNN for short) takes a point, figures out which k points are ‘closest’ to it, and makes a classification based on the most common label among those k neighbours. In order to determine how ‘close’ two points are, we require some notion of distance. The distance between two points depends on the metric we’re working with.
Put plainly, a metric is a non-negative, symmetric function d such that, for any data points x, y and z, the so-called triangle inequality is satisfied:

$$d(x, z) \leq d(x, y) + d(y, z)$$
Here is one way to interpret the inequality:
Imagine walking along two sides of a triangle instead of cutting straight across the third side. The “shortcut” (straight across) is always shorter than, or equal in length to, the walk along the other two sides combined.
We have already discussed the Euclidean distance, which is the standard distance metric we use in real life without explicitly mentioning it.
As a reminder: for the pair of n-dimensional data points

$$x = (x_1, x_2, \dots, x_n), \qquad y = (y_1, y_2, \dots, y_n),$$

the Euclidean distance between them is defined as

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
I will leave you to think about why this function is both non-negative and symmetric. Proving that this metric satisfies the triangle inequality is somewhat involved, but if you’re interested, you should check out the first few pages of these lecture notes.
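If you’d like to see these properties in action numerically, here’s a minimal sketch in NumPy (the function name euclidean is my own). A few random points are no substitute for the proof in those lecture notes, but they make the axioms tangible:

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two n-dimensional points."""
    return np.sqrt(np.sum((x - y) ** 2))

rng = np.random.default_rng(seed=42)
x, y, z = rng.normal(size=(3, 5))  # three random 5-dimensional points

# Non-negativity and symmetry
assert euclidean(x, y) >= 0
assert np.isclose(euclidean(x, y), euclidean(y, x))

# Triangle inequality: going via z can never be shorter than the direct route
assert euclidean(x, y) <= euclidean(x, z) + euclidean(z, y)
```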
A similar distance metric is what’s commonly known as the Manhattan distance. For points in 2D, we simply add up the horizontal and vertical distances. This is almost the same as the Euclidean distance, except there is no squaring or square-rooting:

$$d(x, y) = |x_1 - y_1| + |x_2 - y_2|$$
This metric is called the Manhattan distance because it mirrors how a taxi travels through Manhattan’s grid of streets: vehicles can only move along roads, which adhere to a grid-like structure, allowing for either horizontal or vertical traversal.
The animation below depicts the Euclidean distance (orange) and a few ways we can think of the Manhattan distance (yellow) between a pair of points. Starting from the point in the bottom left, we can only move up or right; each yellow path illustrates the same distance once its horizontal and vertical components are added together.
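Here’s a tiny sketch of the same idea in code (the sample points are made up, and the function name is mine). Notice that the Manhattan distance is never shorter than the Euclidean shortcut:

```python
import numpy as np

def manhattan(x, y):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(x - y))

a = np.array([0.0, 0.0])  # the bottom-left point, as in the animation
b = np.array([3.0, 4.0])

print(manhattan(a, b))                # 3 + 4 = 7.0
print(np.sqrt(np.sum((a - b) ** 2)))  # Euclidean shortcut: 5.0
```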
What’s the connection?
For the pair of n-dimensional data points x and y, the formula for the Minkowski distance metric is given by

$$d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
If we set p=1 in the Minkowski distance formula, we arrive at the Manhattan distance; setting p=2 recovers the Euclidean distance.
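Here’s a minimal sketch of the Minkowski formula (function name and sample points are mine), confirming that p=1 and p=2 reproduce the two metrics we’ve already met:

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance: the p-th root of the sum of |x_i - y_i|^p."""
    return np.sum(np.abs(x - y) ** p) ** (1 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

print(minkowski(a, b, p=1))  # 7.0 (the Manhattan distance)
print(minkowski(a, b, p=2))  # 5.0 (the Euclidean distance)
```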
Here’s a cool experiment to help us get our heads around these different metrics in a visual way: consider all the points in coordinate space that are exactly one unit away from the origin. In n-dimensional space, this set of points can be written as:

$$S = \{ x \in \mathbb{R}^n : d(x, \mathbf{0}) = 1 \}$$
In n=2 dimensions, we instead consider data points in 2D and an origin of (0,0), in which case this set can be written as

$$S = \{ (x_1, x_2) \in \mathbb{R}^2 : d((x_1, x_2), (0, 0)) = 1 \}$$
Before reading further, have a think about what points might belong to this set. It may help to try some sketches on paper. Which points on a 2D grid are exactly one unit away from (0,0)?
Did you have a go? Promise? All right then…
In 2D, this set is the unit circle, the equation of which is

$$x_1^2 + x_2^2 = 1$$
In full generality, mathematicians refer to this set as the unit sphere. The reason we get a circle in 2D like this is that we’re implicitly using the Euclidean distance metric. So if we wanted to be super pedantic about our notation, we would write

$$S_2 = \{ x \in \mathbb{R}^n : d_2(x, \mathbf{0}) = 1 \}$$

In the above, we can change the subscript on the left-hand side to represent the intended metric.
Boundary of the unit sphere for different values of p
What happens if we try using different distance metrics? Will the unit sphere still be, er, spherical?
In the 2D case, we are looking for points that satisfy

$$d_p((x_1, x_2), (0, 0)) = 1$$
Since we only care about the distance of these points from the origin, the second point in the Minkowski formula has all of its y-coordinates equal to 0, which leaves us with

$$\left( |x_1|^p + |x_2|^p \right)^{1/p} = 1, \quad \text{or equivalently} \quad |x_1|^p + |x_2|^p = 1$$
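If you’d rather test candidate points numerically than sketch them, here’s a small sketch (the sample points and function name are my own):

```python
import numpy as np

def on_unit_sphere(x1, x2, p):
    """True if |x1|^p + |x2|^p = 1, up to floating-point rounding."""
    return np.isclose(abs(x1) ** p + abs(x2) ** p, 1.0)

print(on_unit_sphere(0.6, 0.8, p=2))  # True: 0.36 + 0.64 = 1 (on the circle)
print(on_unit_sphere(0.5, 0.5, p=1))  # True: on the p=1 diamond
print(on_unit_sphere(0.5, 0.5, p=2))  # False: strictly inside the circle
```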
In the below animation, we plot the unit sphere for the Minkowski distance metric for different values of p. It’s probably worth pausing the video for each value of p and pondering why the corresponding unit sphere takes the form that it does:
The unit sphere of the Manhattan distance metric (i.e. p=1) is a diamond: a square rotated through 45°, with corners at (±1, 0) and (0, ±1).
For the Euclidean distance metric (p=2), we get the circle we spoke about before.
For p=0.5, the shape is pretty bizarre, I must admit. To verify this shape, convince yourself that the following equivalence holds (the positive solution gives the top half of the unit sphere, and the negative solution gives the bottom half):

$$|x_1|^{1/2} + |x_2|^{1/2} = 1 \iff x_2 = \pm\left(1 - \sqrt{|x_1|}\right)^2$$
As the value of p increases to infinity, the unit sphere converges to a square of side length 2.
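If you want to recreate a static version of the animation yourself, here’s a sketch using NumPy and Matplotlib, tracing the top and bottom halves of the unit sphere for a few values of p (the plotting choices are mine):

```python
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(-1, 1, 1001)

for p in [0.5, 1, 2, 4]:
    # Top half of |x1|^p + |x2|^p = 1, then mirror it for the bottom half
    x2 = (1 - np.abs(x1) ** p) ** (1 / p)
    (line,) = plt.plot(x1, x2, label=f"p = {p}")
    plt.plot(x1, -x2, color=line.get_color())

plt.gca().set_aspect("equal")  # keep circles looking like circles
plt.legend()
plt.title("Minkowski unit spheres in 2D")
plt.show()
```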
Important note: the Minkowski formula is not a distance metric when p < 1, because the triangle inequality will no longer be satisfied. For example, with p = 0.5 in 2D, the ‘distance’ from (0,0) to (1,1) works out as (1 + 1)² = 4, yet travelling via (1,0) gives two legs of length 1 each: the detour is shorter than the direct route!
Implementation with kNN
As we delve into more complicated ML algorithms, we will come across models with settings that we choose ourselves before training, rather than values the model learns from the data. We call these the hyperparameters of the model.
The scikit-learn implementation of kNN has its own set of hyperparameters. And wouldn’t you know it, one of them is called `metric`, the default of which is Minkowski with p=2, i.e. the Euclidean distance metric!
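As a quick illustration, here’s a minimal sketch comparing two values of p on the classic Iris dataset (the dataset and hyperparameter choices are mine, purely for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# p=1 is the Manhattan distance, p=2 the Euclidean (scikit-learn's default)
for p in [1, 2]:
    knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=p)
    knn.fit(X_train, y_train)
    print(f"p = {p}: test accuracy = {knn.score(X_test, y_test):.3f}")
```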
Packing it all up
Today’s article was a bit more on the abstract side, but don’t forget that machine learning is underpinned entirely by maths. Keep your numeric skills sharp and you’ll save yourself a lot of headache down the line.
Here are the key things to remember:
- A distance metric is a non-negative, symmetric function that satisfies the triangle inequality.
- The Minkowski distance metric allows us to generalise the notion of distance by adjusting the value of the parameter p.
- The unit sphere takes on a different form depending on the distance metric you’re working with.
- The kNN algorithm provided in the scikit-learn Python package is equipped to deal with a whole host of distance metrics. When building a kNN classifier, have a play around with the options and see which gives the best performance!
I really hope you enjoyed reading!
Do leave a comment if you’re unsure about anything, if you think I’ve made a mistake somewhere, or if you have a suggestion for what we should learn about next 😎
Until next Sunday,
Ameer
Originally published at https://ameersaleem.substack.com.