  # Inside the Hungarian Algorithm: A Self-Driving Car Example

artificial intelligence Nov 10, 2021

The Hungarian Algorithm (or Kuhn Munkres algorithm) is one of the algorithms I've used the most in self-driving cars. In my early days as a self-driving car engineer, I was tasked to match bounding boxes from frame to frame, to do something called Multi-Object Tracking. Later, I was given bounding boxes in 3D from a LiDAR, and detections in 2D from a camera; and I was tasked to fuse these two. Once again, the Hungarian Algorithm was the way to do it.

To describe better how this algorithm can come handy, let me give you a very simple image.

Imagine we are tracking a yellow taxi in an image... We could use an object detection algorithm, and it would return a bounding box on every frame.

But now, what if in the second image, a second object was also detected? How can we assign IDs to the right obstacles?

In a more concrete example, this is the scene we could have:

Now, imagine we’re tracking 20 objects across several frames, all occluding with each other, sometimes disappearing in frames, sometimes appearing as ghosts for no reasons, how could we solve this nightmare?

In the Sensor Fusion example I introduced earlier, the Boxes from the camera should be matched with the boxes from the LiDAR.

But in this specific case, the camera produces 2D boxes, and can sometimes miss. While the LiDAR produces 3D boxes, and can have more boxes than the reality. Understanding and mastering how the hungarian algorithm works is the solution that can help you solve these edge case you might face.

If you're reading this article, it's likely that you need to understand how the Hungarian Algorithm works. In this article, we’ll take an in-depth look at the algorithm applied to self-driving cars. We'll see how it works using a very basic example, and we'll then cover some edge cases you might have such as unequal graphs, or maximization problems.

This algorithm is at the heart of object tracking, but please note that I have a separate article just on object tracking, and that this one is specifically focused on how the algorithm works.

Before we start, I invite you to receive my daily emails to get cutting-edge content on self-driving cars, computer vision, and robotics.

Let's begin with a definition...

## What is the Hungarian Algorithm (Kuhn-Munkres Algorithm)?

### Bipartite Matching and Graphs

The Hungarian Algorithm is also named differently: Bipartite Graph Matching. The idea of Bipartite Graph Matching is to build a graph with distances, and to assign nodes from one side of the graph to the other.

In the following example, we’ve detected objects, and we can put these in the following graph.B ased on the distance, we can decide to assign the objects that match with each other.

Now that you have a good idea of what the algorithm does, let’s see how it works to solve the optimal assignment problem.

### Job vs Workers Example — Solving the Optimal Assignment Problem

To explain how the Hungarian Algorithm works, I'll take an assignment from another example: Assigning Jobs to Workers based on the distance to the office. In this example, there is no remote work and everybody still needs to get to work! So how can we send workers to the jobs based on the shortest distance?

So here's a Matrix that translates our example.

Pop Quiz — Intuitively, which job would you assign to which worker? It seems like all workers should go to job 3! So who should be satisfied?

Here’s how the algorithm solves the problem:

1. Subtract the minimum from each row
2. Subtract the minimum from each column
3. Cross the 0s with the minimum number of lines. If you need less lines than n, the number of rows, go to step 4. Otherwise, go directly to step 5.
4. Subtract the smallest element of the entire matrix. If an element has been crossed twice, add it to the place where it’s double crossed. Then, go back to Step 3.
5. Assign Jobs to Workers starting with the line with only one zero! Every time we assign a job with a worker, cross its row and column to make it unavailable.

Okay, it looks completely crazy. In the following video, I apply the 5 steps to solve the problem.

Here’s a short recap of what we’ve done: Let’s apply the specific steps to our Job vs Worker example.

Step 1 — Subtract the minimum from each row

In our original matrix on the left, we can see that 9, 5, and 3 are the numbers to remove. We’re going to subtract 9 from the first row, 5 from the second row, and 3 from the third row.

Step 2 — Subtract the minimum from each column

Second, we’ll do a similar job on the columns.

In our new matrix, we’ll subtract 1 from column 1, 6 from column 2, and 0 from column 3.

Step 3 — Cross 0s with the minimum number of lines, then check the number of lines needed.

This third step can be a bit tricky to understand. We want to cross 0s with the minimum number of lines we can do. Every time you see a 0, draw a line on either the row or the column.

In the following example, the left crossing is wrong, because it requires 5 lines to cover all 0s while we could do it with only 2 lines, as on the right example.

If you’re a developer, you probably start to think about how you would code this? Any ideas in mind?

The third step has a second part “then check the number of lines needed”. If that number is < n, the number of rows and columns in the matrix, reduce the matrix again.

In our example, we have 2 lines, but we have a 3x3 matrix, so we’ll reduce again. If we didn’t have this case, we’d directly jump to step 5.

Step 4 —Subtract the Smallest Element of the Entire Matrix. If an element has been crossed twice, add it to the place where it’s double crossed.

Okay, the inventor of the algorithm had to be drunk to get that step, or any other step.

Let’s take a look at the matrix now and see how we subtract 2 to the entire matrix, and then add it to the top right element, crossed twice..

You’re probably wondering what number 5 is? Well, we’re almost done! Right before that, we’ll go back to step 3: cross the 0s!

And at this moment, we have 3 lines, for 3 rows, so we pass the condition to pair workers and jobs!

Step 5 — Assign Jobs and Workers starting with the line with only one zero! Every time we assign a job with a worker, cross its row and column to make it unavailable.

In the example, we begin with Worker B, as this is the only row with one zero. And if you’re wondering, yes, there will always be only one line with one zero, because we’ve reduced it to make it this way!

Now, we have what we call an optimal assignment! This is the best way to make everybody happy!

As you’ve noticed, we had to follow the algorithm scrupulously. And it worked because we had a perfect nxn matrix. But in many situations, this won’t be the case.

For example, what if we have 5 boxes at step 1, and 6 at step 2? Then, we’d have a particular problem with a new detection! And if at step 2 we’d have only 4, we’d have a lost detection!

The same problem can happen in Sensor Fusion, where a LiDAR can output 7 boxes with lots of noise, while a camera would only output 3.

In this example, I also showed the case where we have a minimization problem. We want to match workers with jobs based on the lowest path possible. But in self-driving cars, what would be the costs?

BONUS: Join my Daily Emails and Receive the Deep Learning Engineer MindMap

## How to fill the Cost Matrix?

Let’s take a look at a few possible costs when dealing with bounding box matching:

### Euclidean Distance

First, we could assign bounding boxes to their closest centers by calculating the euclidean distance. Something like: d = √[(x2 – x1)2 + (y2 – y1)2]

It would perfectly solve the problem for simple use cases, but couldn’t deal with overlapping obstacles, or with changing shapes.

### Intersection Over Union (IOU)

In the second suggestion, we could apply the Intersection Over Union. It’s a metric used to calculate how much overlap there are between two bounding boxes. Therefore, a cyclist wouldn’t match a car!

The difference here is that we’ll need to change our problem to a maximization problem: this time, we want to match the highest IOU scores and not the shortest distances!

### Convolutional Cost (Visual Similarity / Dissimilarity)

That second suggestion works most of the time, but has issues with overlapping objects, occlusions, and with matching objects where the boxes fit, but not the image inside! This is why we could run what’s inside the objects through a convolutional neural network, extract the features, and then calculate a cost based on how similar they look.

This can be done using Siamese Networks, calculating the distances between two patchs of images.

In this case, we’re solving a problem called Deep SORT — Adding deep features to our original tracking algorithm.

Here's an example of Deep SORT running:

### A custom Cost Function

Depending on the problem, you want to be inventive with the matrix you have, and any suggestion that makes sense can help. The goal is to fill a matrix with costs. What's something else you could do? combine all costs! For example, combining IOU with Convolutional Costs! Or add other costs based on the bounding box shape together. You could add the class of the elements to the costs. In other words:

You can design your own cost function, and convert that into a bipartite graph, and then use the Hungarian Algorithm on your problem.

## How to deal with the "Edge Cases"?

Now, another problem remains to be solved: How do we deal with edge cases? What if I have an NxM matrix? Or what if I have a maximization problem and want to match the highest IOUs? Let's take a look:

### From NxN to NxM

Earlier, I also raised a question about what happens when we have 3 obstacles in image T, and 4 in image T+1? The trick here is conversion: take the maximum of the entire graph, and add a new column full of this value. If your maximum is 90, add a column full of 90. Put differently, we want to add new edges to our graph.

Let’s say we’re tracking obstacles and using IOU as a metric. On the detection side (time t), we have 4 obstacles. But on the tracking side (time t-1), we only had 3: a new detection has arrived.

Applying the conversion trick would give us the following:

### From Maximization to Minimization

The second problem is maximization. If you're having IOU values, you want the highest values to match. If you have similarity costs, you want the most similar examples to match. Here, the trick is to take the maximum value, and then remove the value of each cell to that maximum.

In the following example, the value of every cell on the left has been subtracted to 90, the maximum.

If you take the Detection 4, Tracking 1 — The cost is 10 —— the IOU is low. To convert that into a big distance, we’ll remove 10 from 90; and the value becomes 80. The big values such as 90 now become 0, and 0s become 90s. We can then run the optimization problem as before.

## Conclusion

As you can see, the Hungarian Algorithm is a powerful optimization algorithm. We can use it for simple optimization cases such as sending workers to jobs, optimizing graphs, or for more advanced cases, such as self-driving cars.

Here's a short recap of what we've said, and takeaways from this article:

• The Hungarian algorithm is helping you match two sets of elements linked by a cost metric.
• When tracking bounding boxes, the cost can be the IOU, the Euclidean Distance, the Convolutional Cost, or your own cost function.
• The algorithm works in 5 steps: we first reduce the matrix, then cross 0s, and finally reduce again until we can pair elements.
• If you have a maximization problem, such as IOU, you can always turn it into a minimization problem.
• If you have an NxM Matrix, you can make it an NxN matrix by adding columns with the maximum value.
• When coding, you can call the function linear_assignment() from sklearn with a matrix to directly get the algorithm's output.

If you've been reading this far, it's very likely that you're interested in Bounding Box tracking, or Sensor Fusion. And it's also very likely that you're interested in the Hungarian Algorithm, and making computer vision for than just detecting 2D objects.

If you already follow my emails and articles, you know what I think of 2D Object Detection: if there's nothing after this, it's completely useless. Nobody needs to know how many cars we have in an image, but this can be used as a black box to then understand where these cars are moving, and how.

As a Computer Vision Engineer, understanding how to use 2D Bounding Boxes as a black box is a must-have when you want to stand out from the crowd. This is why I have compiled a course where I teach all these concepts.

We start by reviewing object detection, and we then dive into the Hungarian Algorithm. We study how it works in detail, and even with specific cases like with the Deep SORT example I showed earlier. Finally, you'll build your own Deep SORT algorithm, and get to see how we implement everything we've talked about in this article. You'll also get to improve the algorithm in specific cases such as occlusions or missed tracks.

It's one of the most popular courses I've ever released, it's called LEARN OBSTACLE TRACKING: The Master Key to Become a Self-Driving Car Professional, you should check it out.