How LiDAR Object Detection Works
LiDAR is a sensor currently changing the world.
It is integrated in self-driving vehicles, autonomous drones, robots, satellites, rockets, and many more.
This sensor perceives the world in 3D using laser beams, measuring the time it takes for each laser to hit a target and come back. The output is a point cloud. Using this information, we can perform LiDAR detection and find obstacles directly from the point clouds in 3D.
It’s a technology harder to understand than the camera because fewer people have access to it. I’ve had that access, and today I’ll help you understand how LiDAR detection works for self-driving cars.
LiDAR’s underlying technology could power a wide array of use cases, but currently, self-driving vehicles offer us the best method of exploring this technology.
Today, I’ll provide an introduction to LiDAR and explain how it works from the self-driving car perspective. I will then explore how to process the point cloud, detect obstacles using 3D bounding boxes, and segment the drivable area in real-time.
The goal is something similar to this picture.
📩 Before we start, I invite you to join the mailing list by leaving your email! This is the most efficient way to understand autonomous tech in depth and join the industry faster than anyone else.
LiDAR — A 3D Light Sensor
A LiDAR sensor works with light. LiDAR stands for Light Detection And Ranging. They can detect obstacles up to 300 meters and accurately estimate their positions. In Self-Driving Cars, this is the most accurate sensor for position estimation.
A LiDAR sensor is composed of two parts: laser emission (top) and laser reception (bottom). The emission system works by leveraging layers of laser beams: the more layers, the more accurate the LiDAR. In the image above, you can see that more layers also means a bigger sensor.
Lasers are sent to obstacles and reflect
When these lasers hit an obstacle, they create a set of point clouds. The sensor works with Time of Flight (TOF). Essentially, it measures the time it takes for every laser beam to reflect and come back.
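The time-of-flight principle can be sketched in a few lines of Python (the 667 ns example value is my own, chosen to illustrate a roughly 100 m return):

```python
# Time-of-flight ranging: the laser travels to the target and back,
# so distance = (speed of light * round-trip time) / 2.
C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_s: float) -> float:
    """Distance to the target given the beam's round-trip time in seconds."""
    return C * round_trip_s / 2.0

# A return after ~667 nanoseconds corresponds to roughly 100 meters.
print(round(tof_distance(667e-9)))  # → 100
```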
At very high quality (and price), LiDARs can create rich 3D scenes of an environment, emitting up to 2 million points every second.
Point clouds represent the world in 3D. The LiDAR sensor gets the exact (X,Y,Z) position of every impact point.
A LiDAR sensor can be solid-state or rotating
In the first case, it will focus its detection on a position and offer a coverage range (90° for example). In the latter case, it will rotate around itself and offer a 360° view. In this case, we want to place it on the roof for better visibility.
LiDARs are rarely used as standalone sensors. They’re often coupled with a camera or a RADAR, in a process called Sensor Fusion.
Please find my RADAR/LiDAR Sensor Fusion article and my LiDAR/Camera Fusion article here.
What are the disadvantages of LiDAR sensors?
LiDARs cannot directly estimate velocities. They need to compute the difference between two consecutive measurements to do so. RADARs, on the other hand, can estimate the velocity thanks to the Doppler effect.
LiDARs don’t work well in bad weather conditions. In fog, the lasers can hit the droplets and muddle the scene; the same goes for raindrops or dirt.
LiDARs are cumbersome in terms of size—they can’t be hidden like a camera or a RADAR.
The price of LiDAR, even though it's dropping, is still very high.
What are the advantages of LiDARs?
LiDARs can accurately estimate the position of obstacles. So far, we don’t have more accurate means to do this.
LiDARs work with point clouds. If we see point clouds in front of a vehicle, we can stop even if the obstacle detection system didn’t detect anything. It is a huge reassurance for any customer to know that the vehicle won’t rely only on neural networks and probabilities.
There are two main ways to do “Object Detection” using a LiDAR:
Using Unsupervised Machine Learning techniques
Using 3D Deep Learning
I have courses covering both, but in this article, we’ll take a look at the first approach. In this case, the obstacle detection process generally happens in 4 steps:
- Point cloud processing
- Point cloud Segmentation
- Obstacle clustering
- Bounding box fitting.
Point Cloud Processing — Voxel Grid
To process point clouds, we can use the most popular library, called PCL (Point Cloud Library). It’s available in Python, but it makes more sense to use it in C++, as the language is more suited to robotics. It’s also compliant with ROS (Robot Operating System).
The PCL library can do most of the computations necessary to detect obstacles, from loading the points to executing algorithms. It is to point clouds what OpenCV is to computer vision.
Since the output of LiDAR can easily be 100,000 points per second, we need to use something called a voxel grid to downsample the results.
What is a Voxel Grid?
A voxel grid is a 3D grid of cubes (voxels) that filters the point cloud by keeping only one point per cube.
The bigger the cube, the lower the final resolution of the point cloud.
In the end, we can downsample our cloud from 100,000 points to only a few thousand.
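As a rough sketch of the idea, here is a voxel grid filter in NumPy rather than PCL (keeping the centroid of each occupied voxel; the cloud and voxel size are synthetic example values):

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one representative point (the voxel centroid) per occupied voxel."""
    # Assign each point to an integer voxel index.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points that fall in the same voxel and average them.
    _, inverse, counts = np.unique(voxel_idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((len(counts), points.shape[1]))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# Synthetic cloud: 100,000 random points in a 100 m cube.
cloud = np.random.default_rng(0).uniform(0.0, 100.0, (100_000, 3))
down = voxel_downsample(cloud, voxel_size=5.0)  # at most 20^3 = 8,000 points left
```

The bigger the voxel size, the more aggressive the downsampling, mirroring the resolution trade-off described above.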
ROI (Region of Interest)
The second operation we can perform is cropping to an ROI (region of interest). We simply remove every point that isn’t part of a specific region, for example keeping only 10 meters on each side and 100 meters ahead.
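An ROI crop is just a boolean mask over the coordinates. A minimal sketch, where the specific ranges are example values of my own choosing:

```python
import numpy as np

def crop_roi(points: np.ndarray,
             x_range=(0.0, 100.0), y_range=(-10.0, 10.0)) -> np.ndarray:
    """Keep only the points inside the region of interest."""
    x, y = points[:, 0], points[:, 1]
    mask = (x >= x_range[0]) & (x <= x_range[1]) & \
           (y >= y_range[0]) & (y <= y_range[1])
    return points[mask]

pts = np.array([[5.0, 0.0, 0.2],     # 5 m ahead -> kept
                [150.0, 0.0, 0.2],   # 150 m ahead -> dropped
                [20.0, 30.0, 0.2]])  # 30 m to the side -> dropped
kept = crop_roi(pts)
```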
The first task is to load the point cloud and downsample it. Now that we have our point cloud, we can proceed to segmentation, clustering, and bounding box implementation.
3D Segmentation — RANSAC
The segmentation task at hand is to separate the scene from the obstacles in it.
A very popular method used for segmentation is called RANSAC (RANdom SAmple Consensus). The goal of the algorithm is to fit a model to a set of points while identifying and ignoring the outliers.
The point cloud generally represents shapes. Some shapes are obstacles, and some are simply reflections of the ground. RANSAC’s goal is to identify the ground points and separate them from the others by fitting a plane or a line.
To fit a line, we could think of a linear regression. But with so many outliers, the regression would try to average the results and miss the line. Unlike a linear regression, RANSAC identifies these outliers and doesn’t fit to them.
We can consider this line to be the scene’s target path (i.e. a road), and the outliers to be the obstacles.
How does it work?
The process is as follows:
- Pick 2 points at random
- Fit a linear model to these points
- Calculate the distance from every other point to the fitted line. If the distance is within a defined distance tolerance, we add the point to the list of inliers.
It therefore takes a parameter: distance tolerance.
In the end, the iteration with the most inliers is selected as the model; the rest are outliers. This way, we can consider every inlier to be part of the road, and every outlier to be part of an obstacle.
RANSAC also works in 3D. In this case, a plane between 3 points is at the base of the algorithm. The distance from a point to the plane is then calculated.
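The 3D version of the steps above can be sketched as a minimal NumPy implementation (run here on a synthetic scene; the tolerance, iteration count, and scene geometry are illustrative assumptions, not values from the article):

```python
import numpy as np

def ransac_plane(points, distance_tol=0.2, iterations=50, seed=None):
    """Return a boolean mask of plane inliers (the ground)."""
    rng = np.random.default_rng(seed)
    best_mask, best_count = None, -1
    for _ in range(iterations):
        # 1. Pick 3 points at random and build the plane through them.
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample, try again
        normal /= norm
        # 2. Distance from every point to the candidate plane.
        dist = np.abs((points - p1) @ normal)
        # 3. Keep the plane with the most inliers.
        mask = dist < distance_tol
        if mask.sum() > best_count:
            best_count, best_mask = mask.sum(), mask
    return best_mask

# Synthetic scene: a nearly flat ground plus a box-shaped obstacle above it.
rng = np.random.default_rng(0)
ground = np.c_[rng.uniform(0, 50, 500), rng.uniform(-10, 10, 500),
               rng.normal(0.0, 0.05, 500)]
obstacle = np.c_[rng.uniform(10, 12, 100), rng.uniform(-1, 1, 100),
                 rng.uniform(0.5, 2.0, 100)]
cloud = np.vstack([ground, obstacle])
inliers = ransac_plane(cloud, distance_tol=0.2, seed=1)  # True = ground point
```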
Here is the result of a RANSAC algorithm on the point cloud. The red region represents the vehicles, while the ground is shown in green.
RANSAC is a very powerful and simple algorithm for point cloud segmentation. It tries to find the points that belong to the same shape, and the points that don’t. Then, it can separate the cloud.
To learn how RANSAC works, and to apply it to real LiDAR data, I invite you to take my Point Clouds Fast Course.
Clustering — Euclidean & KD Tree
The output of RANSAC is a list of obstacle points and a list of road points. From that, we can define independent clusters for each obstacle.
Clustering is a technique that separates groups of points by their distances. Consider the image above, where several obstacles are visible: we need the algorithm to understand by itself that there are in fact multiple cars, and to assign one color per obstacle.
How does it work?
Clustering is a family of machine learning algorithms that includes k-means (the most popular), DBSCAN, HDBSCAN, and more.
We can simply go with Euclidean clustering and calculate the Euclidean distance between points.
The process is as follows:
- Pick 2 points, a target and a current point
- If the distance between the target and the current point is within a distance tolerance, add the current point to the cluster.
- If not, pick another current point and repeat.
A clustering algorithm takes as input a distance tolerance, a minimum cluster size, and a maximum cluster size. This way, we can filter out “ghost obstacles” (a single stray point in space) and handle obstacles that are close to each other.
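Here is a naive sketch of this clustering loop in NumPy (brute-force distances, no KD tree yet; the tolerance, size limits, and synthetic clusters are example values of my own):

```python
import numpy as np

def euclidean_cluster(points, distance_tol=1.0, min_size=3, max_size=500):
    """Group points that are chained together within distance_tol."""
    visited = np.zeros(len(points), dtype=bool)
    clusters = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        # Flood-fill: grow the cluster outward from the seed point.
        queue, cluster = [seed], []
        visited[seed] = True
        while queue:
            i = queue.pop()
            cluster.append(i)
            dists = np.linalg.norm(points - points[i], axis=1)
            neighbours = np.where((dists < distance_tol) & ~visited)[0]
            visited[neighbours] = True
            queue.extend(neighbours.tolist())
        # Size limits filter out "ghost obstacles" (stray single points).
        if min_size <= len(cluster) <= max_size:
            clusters.append(points[cluster])
    return clusters

# Two tight groups of points plus one lone "ghost" point far away.
rng = np.random.default_rng(0)
car_a = rng.normal([0.0, 0.0, 0.0], 0.1, (20, 3))
car_b = rng.normal([5.0, 0.0, 0.0], 0.1, (20, 3))
ghost = np.array([[50.0, 50.0, 50.0]])
clusters = euclidean_cluster(np.vstack([car_a, car_b, ghost]))
print(len(clusters))  # → 2 (the ghost point is filtered out)
```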
Computations & KD Tree
The problem with the above technique is that a LiDAR sensor can output 100,000 points. That would mean 100,000 Euclidean distance calculations for each point we want to cluster. To avoid computing distances to every single point, we can use a KD tree.
A KD tree is a search algorithm that will sort points by their XY positions in a tree. The general idea — if a point is not within a defined distance tolerance, then the points whose x or y are even bigger surely won’t be within this distance. This way, we don’t have to worry about computations for every single point.
Take this scenario above, where the orange point at the bottom isn’t below the distance tolerance threshold. We can remove every point on the right side of this orange point, because we’re sure they won’t be within the tolerance threshold either. Then we can take another point, calculate the distance, and repeat.
KD trees are great for computations: the number of Euclidean distance calculations is drastically reduced. Coupled with a clustering algorithm, they’re powerful tools for obtaining independent obstacles efficiently.
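In Python, rather than writing a KD tree by hand, we can sketch the idea with SciPy’s `cKDTree` (assuming SciPy is available): a radius query returns only the nearby neighbours without scanning the whole cloud.

```python
import numpy as np
from scipy.spatial import cKDTree

# Build the tree once; each radius query then only visits nearby branches
# instead of computing the distance to all 100,000 points.
points = np.random.default_rng(0).uniform(0.0, 100.0, (100_000, 2))
tree = cKDTree(points)

# Indices of every point within 1 m of a query location.
neighbours = tree.query_ball_point([50.0, 50.0], r=1.0)
```

Replacing the brute-force distance loop in the clustering step with such radius queries is what makes Euclidean clustering practical at LiDAR scale.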
Bounding Box Fitting — PCA
The final objective is to create a 3D bounding box around each cluster.
This part is not particularly difficult, but it makes assumptions about the obstacles’ sizes. Since we did not perform any classification, we must fit the bounding boxes to the points.
One algorithm that can help fit bounding boxes is Principal Component Analysis (PCA).
Using PCA, we can draw a bounding box that corresponds closely to the cluster’s points. This helps with parked cars, for example, where the detection is only partial.
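A minimal sketch of PCA-based box fitting in NumPy (the synthetic “car-sized” cluster and its rotation are my own example, not data from the article):

```python
import numpy as np

def pca_bounding_box(cluster):
    """Fit an oriented bounding box by aligning its axes with the
    cluster's principal components."""
    centroid = cluster.mean(axis=0)
    centered = cluster - centroid
    # Eigenvectors of the covariance matrix give the box orientation.
    _, axes = np.linalg.eigh(np.cov(centered.T))
    # Project onto the principal axes and take the min/max extents.
    projected = centered @ axes
    return centroid, axes, projected.min(axis=0), projected.max(axis=0)

# A car-sized cluster (4 x 2 x 1.5 m) rotated 30 degrees in the XY plane.
rng = np.random.default_rng(0)
local = rng.uniform([-2.0, -1.0, 0.0], [2.0, 1.0, 1.5], (200, 3))
theta = np.radians(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
cluster = local @ rot.T
centroid, axes, lo, hi = pca_bounding_box(cluster)
extent = hi - lo  # close to 4 x 2 x 1.5 despite the rotation
```

Because the box is expressed in the cluster’s own principal axes, it stays tight even when the vehicle is rotated relative to the sensor.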
By combining all 3 techniques discussed above, we have an obstacle detection algorithm built from LiDAR point clouds!
Here’s the result of the project I ran.
LiDAR is a very powerful and reliable sensor that’s used a lot in robotics. Today, we can go even further and fit point clouds to neural networks working in 3D that output the bounding boxes directly.
For a while, LiDAR technology has been criticised for its cumbersome size and its price, making it an elite sensor.
Recently, Apple announced a LiDAR sensor on its new iPad Pro , which significantly reduces the price barrier to under $1,000.
With the price dropping, it might become accessible even to independent developers.
With LiDAR availability, obtaining the skills to work with this sensor will become a real plus for an engineer! Sensor fusion is also a fascinating topic that only makes sense if you master LiDAR + camera or LiDAR + RADAR detection.
To go further, I developed an online course on Point Cloud and 3D Perception. This course will help you build an obstacle detection system from real-world data.
Join the course here and learn to build these projects and work on Machine Learning for Point Clouds!