<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[ADVANCED ARTICLES FOR CUTTING-EDGE ROBOTICS & AV ENGINEERS]]></title><description><![CDATA[BEGIN YOUR JOURNEY: Access my Daily Emails read by 10,000+ Engineers, and Learn daily how to become a cutting-edge engineer in Computer Vision, Robotics, LiDAR, Tracking and Advanced Deep Learning]]></description><link>https://www.thinkautonomous.ai/blog/</link><image><url>https://www.thinkautonomous.ai/blog/favicon.png</url><title>ADVANCED ARTICLES FOR CUTTING-EDGE ROBOTICS &amp; AV ENGINEERS</title><link>https://www.thinkautonomous.ai/blog/</link></image><generator>Ghost 5.85</generator><lastBuildDate>Mon, 08 Jun 2026 04:14:06 GMT</lastBuildDate><atom:link href="https://www.thinkautonomous.ai/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive]]></title><description><![CDATA[Autonomous agriculture, military, and sometimes self-driving cars use the concept of occupancy grids when they can't detect known objects. 
This article explores real-world 3D occupancy grid mapping, and its applications in traversability estimation and path planning.]]></description><link>https://www.thinkautonomous.ai/blog/occupancy-grid-mapping/</link><guid isPermaLink="false">699dceea4c5babc054c3890d</guid><category><![CDATA[robotics]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Thu, 04 Jun 2026 13:08:28 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/06/occupancy-grid-mapping.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/occupancy-grid-mapping.jpg" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive"><p><strong>Have you ever doubted your loving partner? </strong>In 1945, a Canadian spy named Max Vatan had just received intel that changed everything. His wife Marianne was suspected of being a German spy. His orders were now to find out if it was true, within 72 hours. If she was guilty, he then had to kill her. This is NOT a true story, but the story of the movie Allied, with Brad Pitt and Marion Cotillard.</p><p><strong>So how do you find out if your wife is a German spy? Max had easy options and complex ones. </strong>He could have her followed. He could interrogate her friends, search the house, pull her files... but that was complex. So instead, he made a simpler choice: let her overhear a phone conversation sharing false intelligence, and see if the enemy acted on it. Unfortunately for him, they did.</p><p><strong>This technique is called the canary trap</strong>. It&apos;s used to identify leaks by feeding slightly different versions of the same false information to different suspects. The suspect whose version shows up in the enemy&apos;s hands is the traitor. There is something I love about it: <u>it&apos;s extremely simple</u>. While many techniques are based on heavy profiling, and probabilities... 
this one relies on just one piece of information...</p><p><strong>That simplicity, I think Occupancy Grid Mapping in robotics shares it. </strong>You could build a world with a complex 3D representation, bounding boxes all over the place, HD Maps, and all the complexity... or you could build a 2D grid, and decide for each cell whether you can drive there or not. This is particularly useful in environments where objects aren&apos;t easily learned (military, agriculture, ...), and thus, we need new ways.</p><p>In this article, we&apos;re going to explore the topic of Occupancy Grid via 3 core points:</p><ol><li>What is an Occupancy Map?</li><li>How to Fill an Occupancy Grid Map?</li><li>How to use Occupancy Maps for tasks like Planning &amp; Traversability</li></ol><p>Let&apos;s begin:</p><h2 id="what-is-an-occupancy-map">What is an Occupancy Map?</h2><p>You see all these articles out there? They show you the same basic 2D grid drawn in 1995. Let&apos;s try something different, let&apos;s see...</p><h3 id="off-road-occupancy-the-orad-3d-dataset-mobile-robot-perception">Off-Road Occupancy: The ORAD-3D Dataset (Mobile Robot Perception)</h3><p><strong>I am a big fan of off-road autonomous driving</strong>; anything that doesn&apos;t have clear lane lines and traffic signs, but mud, trees, bushes, lakes, and so on... I&apos;ve worked with companies in the defense space on that topic, and I find it fascinating. One of the most used of all algorithms is Occupancy. So I did something for you, the reader:</p><p><strong>I found a dataset called </strong><a href="https://github.com/chaytonmin/ORAD-3D-Dataset-For-Off-Road-AD"><strong>ORAD-3D</strong></a><strong>, which contains labels for a task called &quot;Occupancy&quot;. </strong>This is RARE: most datasets, autonomous driving ones included, don&apos;t provide that, so I am thinking... 
wouldn&apos;t you like to understand Occupancy from a practical point of view, with a REAL example from ground robots, rather than reading another explanation of Probabilistic Robotics from Sebastian Thrun?</p><p>Yes! Let&apos;s do this.</p><p>So I downloaded the dataset, and I found that, for every frame, there&apos;s a NumPy file that, once decompressed, holds a <strong>[24919 x 4]</strong> array:</p><table>
<thead>
<tr>
<th>X index</th>
<th>Y index</th>
<th>Z index</th>
<th>Semantic label</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>9</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>9</td>
<td>11</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>9</td>
<td>12</td>
<td>0</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>70</td>
<td>13</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>70</td>
<td>14</td>
<td>9</td>
<td>0</td>
</tr>
<tr>
<td>70</td>
<td>14</td>
<td>10</td>
<td>0</td>
</tr>
</tbody>
</table>
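<p>If you want to poke at such an array yourself, here is a minimal sketch. It uses a tiny hand-made stand-in (the six rows printed above) instead of the real file, because the exact file name and loading call are not shown here; <code>occ_voxels</code> and <code>summarize</code> are names made up for the illustration.</p><pre><code class="language-python">import numpy as np

# Stand-in for one frame: the six rows printed in the table above.
# With the real dataset, the (24919, 4) array would come from the NumPy file instead.
occ_voxels = np.array([
    [0, 9, 10, 0],
    [0, 9, 11, 0],
    [0, 9, 12, 0],
    [70, 13, 10, 0],
    [70, 14, 9, 0],
    [70, 14, 10, 0],
])

def summarize(arr):
    # Shape plus per-column min and max: a quick way to guess what each column holds.
    return arr.shape, arr.min(axis=0).tolist(), arr.max(axis=0).tolist()

shape, col_min, col_max = summarize(occ_voxels)
print(shape)     # (6, 4)
print(col_min)   # [0, 9, 9, 0]
print(col_max)   # [70, 14, 12, 0]</code></pre><p>Small integer ranges in a column are a good hint that it holds indices or class IDs rather than metric coordinates.</p>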
<p><strong>How do you make sense of it? </strong>Let me first challenge your intuition; what do you think these 24,919 rows and 4 columns are for?</p><p>When you look at the ORAD-3D dataset paper, it provides an interesting comment; the occupancy grid map was built using the <strong>KISS-ICP algorithm</strong>, and when you dig into the GitHub issues, and try to really understand how it was built, you get that:</p><ul><li>Each row is one <strong>voxel</strong> captured by the LiDAR</li><li>The first <strong>3</strong> columns represent <strong>XYZ</strong> indices</li><li>Column <strong>4</strong> represents the <strong>semantic value</strong> (class/category).</li></ul><p>Have I lost you already? Look, if we read the first row, it looks like this:</p><table>
<thead>
<tr>
<th>X index</th>
<th>Y index</th>
<th>Z index</th>
<th>Semantic label</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>9</td>
<td>10</td>
<td>0</td>
</tr>
</tbody>
</table>
<p><u>At cell (0, 9, 10), we have an object of category 0.</u></p><p>If we then dig further, we&apos;ll see that:</p><ul><li><strong>Label 0 means non-drivable,</strong> we have 21,333 voxels (85.6%) with this category</li><li><strong>Label 1 means drivable</strong>, we have 3,233 voxels (13.0%) with this category</li><li><strong>Labels 4 and 5 represent non-drivable areas too,</strong> but different objects (trees? rocks? mud?)</li></ul><p>In reality, we can use semantic information there too. This is the first way to explain the occupancy grid; a 3D grid gives us XYZ information, and then the semantic value shows the class. In the visualization below, I show the grid in 2D.</p><p><strong>But XYZ are cell indices, <u>not real-world distances</u>.</strong> To turn that into meaningful information, we must use grid parameters:</p><ul><li><strong>Grid resolution:</strong> 0.5 m/cell (not super high-resolution maps, but that works)</li><li><strong>Grid Dimensions</strong>: 100 x 100 x 16 cells (x, y, z)</li><li><strong>XYZ Coverage:</strong> X +/-25 m, Y 0-50 m forward, Z -3 to +5 m</li></ul><p>Using THIS, you can turn a cell index into a real 3D position. On top of this, the orientation of each voxel is <strong>typically aligned</strong> with the axes of the coordinate system used. For example, in a 2D map, cells are oriented along the x and y axes.</p><p>So here is my function to visualize:</p><pre><code class="language-python">import numpy as np

def build_occupancy(occ_voxels):
    &quot;&quot;&quot;Collapse 3D occupancy voxels to a 2D BEV image (100x100 RGB).
    Iterates over voxels and colors each (x, y) cell: green=drivable, grey=non-drivable.&quot;&quot;&quot;
    occ_bev = np.zeros((100, 100, 3), np.uint8)   # black = unannotated
    for row in occ_voxels:
        x, y, label = int(row[0]), int(row[1]), int(row[3])  # skip Z
        if label == 1:
            occ_bev[y, x] = [30, 200, 80]       # green = drivable
        elif occ_bev[y, x, 1] &lt; 200:
            occ_bev[y, x] = [140, 140, 140]      # grey = non-drivable
    return occ_bev</code></pre><p>In the code above, we build a 100x100 image filled with black pixels. Then, we fill in the occupancy values row by row; at cell x = 0, y = 9, we put a grey cell. Do this for every voxel, and we get this view of the robot&apos;s environment:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/Screenshot-2026-06-04-at-09.21.06--1-.jpg" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="1482" height="654" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/Screenshot-2026-06-04-at-09.21.06--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/06/Screenshot-2026-06-04-at-09.21.06--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/06/Screenshot-2026-06-04-at-09.21.06--1-.jpg 1482w" sizes="(min-width: 720px) 720px"></figure><p><strong>That&apos;s occupancy:</strong> Green is Drivable, Grey is Non-Drivable, Black is Unknown.</p><p><strong>But wait. </strong>We had 24,919 voxels. Our grid is 100 x 100 x 16, which is 160,000 possible cells. What happened to the roughly 135,000 other cells of our grid? These are the black pixels. Only about 16% of our 100x100x16 grid is filled. 
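</p><p>To make these two ideas concrete (cell indices versus metres, and how empty the grid really is), here is a small sketch. It uses the grid parameters listed earlier and assumes that an index refers to the centre of its cell; the origin of each axis and the helper names <code>cell_to_metric</code> and <code>fill_ratio</code> are assumptions made for the illustration, not something taken from the dataset&apos;s own code.</p><pre><code class="language-python">import numpy as np

RESOLUTION = 0.5                         # metres per cell, from the grid parameters
GRID_SHAPE = (100, 100, 16)              # cells along x, y, z
ORIGIN = np.array([-25.0, 0.0, -3.0])    # assumed metric position of the corner of cell (0, 0, 0)

def cell_to_metric(ijk):
    # Assumption: an index refers to the centre of its cell, hence the +0.5.
    return ORIGIN + (np.asarray(ijk, dtype=float) + 0.5) * RESOLUTION

def fill_ratio(num_voxels):
    # Fraction of all possible cells that actually carry a label.
    return num_voxels / (GRID_SHAPE[0] * GRID_SHAPE[1] * GRID_SHAPE[2])

print(cell_to_metric([0, 9, 10]).tolist())   # [-24.75, 4.75, 2.25]
print(round(fill_ratio(24919) * 100, 1))     # 15.6</code></pre><p>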
<strong>That&apos;s what&apos;s called a Sparse Occupancy Map.</strong></p><p>Here is what it looks like in a video, where we also show the other labels (4 and 5):</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/download-ezgif.com-optimize.gif" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="640" height="360" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/download-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/06/download-ezgif.com-optimize.gif 640w"></figure><h3 id="the-types-of-occupancy-maps">The Types of Occupancy Maps</h3><p>I really like this map, but it&apos;s just ONE example of a map. In fact, we saw that:</p><ul><li>It&apos;s a 3D Map (has XYZ cells), but shown in 2D. This means we have <strong>2D vs 3D Occupancy Maps.</strong></li><li>It&apos;s a Semantic Map (has many labels), shown as a Binary Map (drivable or not). This means we have <strong>Binary vs Non-Binary Maps</strong>.</li><li>We have a Sparse Map, because the majority of cells are unknown. It means we have <strong>Sparse vs Dense Occupancy Maps</strong>.</li></ul><p>Can you start seeing the types of maps? 
Of course, you can imagine there are more categories, as there are in <a href="https://www.thinkautonomous.ai/blog/robot-mapping/"><strong>robotic mapping</strong></a> too. Let&apos;s try a very simple summary:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/Screenshot-2026-06-04-at-11.26.47--1-.jpg" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="2000" height="906" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/Screenshot-2026-06-04-at-11.26.47--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/06/Screenshot-2026-06-04-at-11.26.47--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/06/Screenshot-2026-06-04-at-11.26.47--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/06/Screenshot-2026-06-04-at-11.26.47--1-.jpg 2000w" sizes="(min-width: 720px) 720px"></figure><p>Wow! It&apos;s the hottest illustration of my entire blog you got here! Now, let&apos;s try and understand how to create that type of map:</p><h2 id="how-to-build-an-occupancy-map">How to Build an Occupancy Map</h2><p><strong>In this section, I am going to keep it simple: we are going to see how to build a <u>2D</u>, <u>Non-Binary Map</u></strong>. Filling a 3D Map is much harder; it&apos;s no longer a canary trap but a giant duck, and it&apos;s often done using Occupancy Networks. I show how in my <a href="https://www.thinkautonomous.ai/blog/occupancy-networks/"><strong>Tesla Occupancy Networks article</strong></a>, and it&apos;s the very advanced way.</p><p>Let&apos;s focus on filling a 2D map with, not 1 or 0, but probabilities. 
Here is an example of such a map I found online; it does what we want, but also adds the idea of dynamic objects:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/dynamic-occupancy-mapping-ezgif.com-optimize.gif" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="502" height="226"></figure><p><strong>How do you arrive at this? </strong><u>The simplest way is by using LiDAR scans.</u> You are projecting a <a href="https://www.thinkautonomous.ai/blog/point-clouds/"><strong>point cloud</strong></a> over your 2D grid representation, and if you see enough points falling in one cell, it means that cell is occupied. The logical thinking would be: &quot;Do I have more than 3 points in that cell? Yes? Then it&apos;s occupied.&quot;</p><figure class="kg-card kg-image-card kg-card-hascaption"><a href="https://autowarefoundation.github.io/autoware_universe/main/perception/autoware_probabilistic_occupancy_grid_map/laserscan-based-occupancy-grid-map/"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/update_with_pointcloud.svg" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="641" height="481"></a><figcaption><span style="white-space: pre-wrap;">(</span><a href="https://autowarefoundation.github.io/autoware_universe/main/perception/autoware_probabilistic_occupancy_grid_map/laserscan-based-occupancy-grid-map/" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>This is the intuition I want us to work on. Of course, this doesn&apos;t work</strong>. What happens to the floor points? Is the floor occupied, just because beams hit it? What happens to leaves? Or what if we have ghosts? False positives? 
The simple technique is a good intuition which sometimes is too slow, for shure, and it needs to be improved, for shure &#x1F60E;.</p><p>So let&apos;s see how:</p><h3 id="understanding-bayesian-occupancy-grid-mapping-algorithms">Understanding Bayesian Occupancy Grid Mapping algorithms</h3><p><strong>If you walk past a bus station and see one person standing there, would you say it&apos;s occupied?</strong> It depends: is the person standing? Or sitting? Does it look like they&apos;re going to cross the street? Hard to tell. Yet, if the person behaves like they&apos;re waiting, you can definitely mark the station as occupied. And if there are 30 people? You don&apos;t even need to look at them, the station is busy!</p><p><strong>An occupancy grid thinks exactly the same way</strong>. One beam hitting a cell starts building evidence. But it&apos;s not enough to be certain. The more beams that confirm it across consecutive scans, the higher the probability climbs. One hit gives you suspicion. Ten hits give you confidence.</p><p>You want a system that has memory, and that updates itself over time...</p><p>Here is how it works...</p><p>There are really 3 ideas you should know:</p><ol><li><strong>A cell&apos;s belief is stored as log-odds, not 0/1, and not occupancy probability values</strong>. The formula is shown below. The difference is that log-odds are not confined to the [0...1] range, so confidence can keep building as evidence is added. 
As intuition:<ul><li>p=0.3 &#x2192; l=&#x2212;0.847 (leaning free)</li><li>p=0.5 &#x2192; l=0 (unknown)</li><li>p=0.7 &#x2192; l=0.847 (leaning occupied)</li><li>p=0.9 &#x2192; l=2.197 (strongly occupied)</li></ul></li></ol><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/47fef92e-90c5-47fe-9027-660f6bf8b537.png" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="526" height="150"></figure><ol start="2"><li><strong>At stage 0, all cells have l = 0. </strong>Every cell starts at p=0.5 (complete uncertainty). Converting to log-odds gives us 0...</li></ol><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/1b776c13-fe66-4a12-904e-eb82c23904c3.png" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="360" height="132"></figure><ol start="3"><li><strong>When a new LiDAR measurement arrives, we apply this update formula:</strong></li></ol><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/700fc1f5-1e4b-44c7-8709-e4b0731618e9.png" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="768" height="234" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/700fc1f5-1e4b-44c7-8709-e4b0731618e9.png 600w, https://www.thinkautonomous.ai/blog/content/images/2026/06/700fc1f5-1e4b-44c7-8709-e4b0731618e9.png 768w" sizes="(min-width: 720px) 720px"></figure><p>It looks scary, but it&apos;s really an unfolded formula.</p><p>In fact, let&apos;s see it via an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/f4477fef-7691-46d4-9191-e921e1dddfc6.png" class="kg-image" alt="Occupancy Grid Mapping: How 
Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="1320" height="560" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/f4477fef-7691-46d4-9191-e921e1dddfc6.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/06/f4477fef-7691-46d4-9191-e921e1dddfc6.png 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/06/f4477fef-7691-46d4-9191-e921e1dddfc6.png 1320w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">(built from </span><a href="https://www.researchgate.net/figure/An-occupancy-grid-map-with-a-realistic-update-heuristic-as-described-in-33-Green_fig2_372114186" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><ul><li>Say we have this cell on the left image, we have no idea if it&apos;s free or occupied. So we set p = 0.5, and l becomes 0 (unknown belief).</li><li>The next frame, we see one point falling in the cell. This increases our probability a little, let&apos;s say 0.7. 
Our belief (in log-odds) moves from 0 to 0.847.</li><li>We could consider 0.847 high enough to be occupied, so we mark it as occupied [here, the colors are terrible, occupied should be red].</li></ul><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/bc989de8-e87b-47d0-8ed0-80e44371d0b5.png" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="1152" height="278" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/bc989de8-e87b-47d0-8ed0-80e44371d0b5.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/06/bc989de8-e87b-47d0-8ed0-80e44371d0b5.png 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/06/bc989de8-e87b-47d0-8ed0-80e44371d0b5.png 1152w" sizes="(min-width: 720px) 720px"></figure><p>This is the perfect way to be &quot;smooth&quot; with our data, to set confidence, steps, and so on... You can also notice a symmetry between hit and miss: p = 0.7 gives +0.847, and p = 0.3 gives &#x2212;0.847.</p><h3 id="hit-or-miss-did-i-invent-the-probability-values">Hit or Miss: Did I invent the probability values?</h3><p><strong>You may be wondering... Did I just make up the p = 0.7 number for when a point hits a cell? </strong>Not really. The 0.7 is the hit probability of the inverse sensor model, and the short version of where it comes from is this: you set it, you do not measure it. It encodes a single decision, how much you trust one hit, based on the sensor noise.</p><p><strong>A lot of 3D occupancy mapping runs on OctoMap, so its defaults are the de facto convention</strong>. They pair a hit probability of roughly 0.65 to 0.7 with a miss probability of about 0.4, and they clamp the cell&apos;s probability between 0.12 and 0.97 so it can never reach a hard 0 or 1. 
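</p><p>As a rough sketch of how these numbers play out, the snippet below applies the log-odds update to a single cell, using a hit probability of 0.7, a miss probability of 0.4 and clamps at 0.12 and 0.97 (the values quoted above). It assumes a prior of p = 0.5, so the prior term in log-odds is 0 and drops out. The function names are made up for the illustration, and this is not OctoMap&apos;s actual code.</p><pre><code class="language-python">import math

def logit(p):
    # Probability to log-odds.
    return math.log(p / (1.0 - p))

def prob(l):
    # Log-odds back to probability.
    return 1.0 - 1.0 / (1.0 + math.exp(l))

L_HIT, L_MISS = logit(0.7), logit(0.4)     # per-measurement increments
L_MIN, L_MAX = logit(0.12), logit(0.97)    # clamps, so a cell never reaches a hard 0 or 1

def update(l, hit):
    # Add the increment for a hit or a miss, then clamp the result.
    l = l + (L_HIT if hit else L_MISS)
    return min(max(l, L_MIN), L_MAX)

l = 0.0                                    # unknown cell, p = 0.5
for hit in [True, True, False, True]:
    l = update(l, hit)
    print(round(l, 3), round(prob(l), 3))</code></pre><p>Because the update is a plain addition, a miss after two hits only partly undoes them, and the clamps keep a long run of hits from making the cell impossible to free later. 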
Each beam endpoint pushes its cell up by the hit amount, each cell the beam passes through on the way gets pushed down by the miss amount, and both accumulate in log-odds until they reach the clamps.</p><p>Now, you could try the exercise with cells whose prior is occupied, or whose prior is free.</p><h3 id="example-using-occupancy-grid-code-in-action-probabilistic-robotics">Example: Using Occupancy Grid Code in Action (Probabilistic Robotics)</h3><p>Alright, I&apos;d like to show you, in this more advanced example, how to implement these formulas. Here is me running a rosbag containing an occupancy grid map:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/occupancydemo-ezgif.com-optimize.gif" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="491" height="285"></figure><p><strong>How is it built? </strong>I found some <strong>MATLAB code online</strong> that does exactly what we just discussed. It can feel a bit complex, so the reading order should be (1) look at the images, (2) look at the yellow arrows, and (3) read the code. 
Here is the algorithm:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/Screenshot-2026-02-24-at-19.27.56--1-.jpg" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="1488" height="1372" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/02/Screenshot-2026-02-24-at-19.27.56--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/02/Screenshot-2026-02-24-at-19.27.56--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/02/Screenshot-2026-02-24-at-19.27.56--1-.jpg 1488w" sizes="(min-width: 720px) 720px"></figure><p><strong>Can you see the line where the occupancy formula is applied?</strong> Can you see how we set the free and occupied space? The whole function also considers whether a beam hits a cell, or just &quot;traverses&quot; it. The formula is towards the bottom, an arrow points to it, and hopefully, the illustration helps a little.</p><p>Alright, so this was advanced, but it&apos;s a good introduction! Now let&apos;s see the final idea of this article...</p><h2 id="how-to-use-occupancy-maps-for-robotics-tasks-path-planning-traversability">How to use Occupancy Maps for Robotics Tasks (Path Planning, Traversability, ...)</h2><p>There are 2 core ideas I&apos;d like to explain here, especially since we&apos;re on the topic of &quot;Off Road&quot;. Occupancy is fantastic in 2 use cases:</p><ul><li>Self-driving car companies, especially those working with End-To-End Learning</li><li>All ground-based mobile robot perception algorithms that use Occupancy to know where to drive</li></ul><p>In off-road, we often have issues, such as &quot;Can I drive on this bush?&quot; or &quot;What do I do if I have no GPS?&quot;. 
These can be solved by one core idea:</p><h3 id="traversability-estimation">Traversability Estimation</h3><p>But first off, a word for context:<br><br><strong>Last December, I got the opportunity to give a DL seminar to a company named IAI</strong> (Israel Aerospace Industries, one of the big 4 there), and they showed me their &quot;off road&quot; autonomous bulldozers.<br><br>This is where I learned about the concept for the first time, when an engineer shared it with me. I then researched more, and recently, <a href="https://www.thinkautonomous.ai/blog/earthsense/"><strong>EarthSense</strong></a>, the autonomous agriculture robots company I told you about, shared they were also using it. In fact, nearly all robots that drive &quot;off-road&quot; are using it. It&apos;s a PILLAR.<br><br>Off-road means, most of the time, no&#xA0;GPS, no traffic signs or lights, sometimes no GPU, no datasets, or anything you&apos;d normally use in a self-driving car. Even the objects are rocks, cliffs, mountains, bushes or at the very best... barbed wire? You can&apos;t use YOLO there, it makes no sense.<br><br><strong>These robots almost all use <u><em>traversability</em></u> estimation. </strong>Imagine you own a robot driving in an agricultural field... Your robot will drive on grass, mud, terrain, but will sometimes face leaves, corn, or as EarthSense taught me the word... &quot;fronds&quot; (palm leaves, basically). Can you guess what happens if you use a LiDAR, or an occupancy map? Of course, all the fronds will be occupied space. 
Your robot will stop at every leaf.<br><br><u>What we want is not &quot;leaf&quot; or &quot;grass&quot;, we want traversable or not.</u><br><br><strong>And to know whether you can traverse or not, robotics companies use <u>traversability</u> estimation algorithms.</strong> An example of one running:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/ezgif.com-optimize--12--1.gif" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="800" height="375" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/ezgif.com-optimize--12--1.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/06/ezgif.com-optimize--12--1.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>Notice how, as the grass gets higher, we have red values?</p><p><strong>How is it built?</strong> It&apos;s a formula, based on a few factors, such as:</p><ul><li>Elevation of the points (altitude, ...)</li><li>Slope of the surface (is it steep? flat?)</li><li>Roughness of the terrain (how many points, how close they are)</li><li>Semantics (grass is fine, trees aren&apos;t)</li><li>Occupancy Value (optional, but a good input)</li></ul><p>So you can see, we have a lot of factors helping us determine whether a surface, occupied or not, can be traversed.</p><h3 id="a-and-path-planning">A* and Path Planning</h3><p><strong>The second thing you can do with Occupancy Maps is Planning</strong>. Once you have done any kind of robotic mapping, whether SLAM-based or, as here, occupancy-based, you have a map. And in a map, you do things like <a href="https://www.thinkautonomous.ai/blog/motion-planning/">Path Planning</a>. So how do we do that? 
Basically, we set a goal, and apply algorithms like A*, which find the shortest path through the free space while avoiding obstacles.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/a93b8ea4-6f15-4c6e-b8b1-1dfc6ca8b507.jpg" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="1198" height="429" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/a93b8ea4-6f15-4c6e-b8b1-1dfc6ca8b507.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/06/a93b8ea4-6f15-4c6e-b8b1-1dfc6ca8b507.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/06/a93b8ea4-6f15-4c6e-b8b1-1dfc6ca8b507.jpg 1198w" sizes="(min-width: 720px) 720px"></figure><p>And this is what it looks like when you combine both:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/06/download9-ezgif.com-optimize.gif" class="kg-image" alt="Occupancy Grid Mapping: How Off-Road Ground Robots Decide Where They Can Drive" loading="lazy" width="640" height="240" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/06/download9-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/06/download9-ezgif.com-optimize.gif 640w"></figure><p>Ok we&apos;ve seen a lot, let&apos;s do a summary!</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><p>Here is a bullet point summary of the article:</p><ul><li><strong>Occupancy Grid Mapping is a simple yet powerful technique for robotic navigation</strong> that uses a grid (2D or 3D) to represent the environment.</li><li><strong>The ORAD-3D dataset provides a real-world example of a 3D occupancy map</strong> with semantic labels indicating drivable and non-drivable areas.</li><li><strong>There are multiple types of occupancy maps; </strong>2D vs 3D, binary vs non-binary, and 
sparse vs dense.</li><li><strong>Building an occupancy map involves filling grid cells</strong> with probabilities rather than binary values, accounting for sensor uncertainty.</li><li><strong>Bayesian Occupancy Grid Mapping uses log-odds to update per cell occupancy probability values</strong> <strong>over time</strong>, improving confidence with repeated sensor readings. Occupancy grids can be dynamically updated as the robot explores the environment.</li><li><strong>The inverse sensor model assigns hit and miss probabilities to sensor measurements</strong>, influencing occupancy updates.</li><li><strong>Occupancy maps support robotics tasks like traversability estimation and path planning.</strong> Traversability estimation considers factors like elevation, slope, roughness, semantics, and occupancy to determine if terrain is navigable. Algorithms like A* and RRT use occupancy grids to calculate the shortest, collision-free path from a robot&apos;s position to a target.</li><li><strong>Occupancy grids provide a foundation for autonomous navigation</strong> in complex and off-road robotics. Engineers learning End-To-End / AV 2.0 should seriously consider it, and those working on UGV, Robotics, Defense should absolutely learn it.</li></ul><h3 id="next-steps">Next Steps</h3><p>Here are a few next steps for you...</p><ul><li><strong>First, I would recommend reading my </strong><a href="https://www.thinkautonomous.ai/blog/robot-mapping" rel="noreferrer"><strong>Robot Mapping article</strong></a>. Occupancy Maps are ONE of the many types of maps we have in robotics. Seeing them in a global context would be helpful. </li><li><strong>Second, you could also read the </strong><a href="https://www.thinkautonomous.ai/blog/occupancy-networks/" rel="noreferrer"><strong>Tesla Occupancy Networks article</strong></a><strong>. 
</strong>It provides the Deep Learning version of this one, but applied to Tesla and how THEY do it in 3D.</li><li><strong>Finally, would you like to build this exact ground robot/off-road AV project?</strong> That&apos;d be a great next step. We&apos;re implementing Occupancy, Traversability, A*, and Off-Road Algorithms in <a href="https://www.thinkautonomous.ai/the-edgeneers-land " rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a>; this is my community membership where each month a company teaches you how THEY build self-driving cars. <br><br>In March 2026, EarthSense, an agriculture robotics company, taught us about Off-Road, and the episode came with a workshop on Off-Road. <a href="https://www.thinkautonomous.ai/the-edgeneers-land " rel="noreferrer">You can access it in the annual edition of the membership</a>.</li></ul>]]></content:encoded></item><item><title><![CDATA[Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles]]></title><description><![CDATA[<p><strong>There is an effect I love in Avengers,</strong> it&apos;s to see the heroes struggling, movie after movie, to defeat Loki, Ultron, then Ragnarok... only to find out, when they can finally rest, that Thanos hasn&apos;t even entered the arena yet. 
Suddenly, the previous fights appear</p>]]></description><link>https://www.thinkautonomous.ai/blog/neolix/</link><guid isPermaLink="false">69dc9e17f5d8a3a7dc78a57f</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Mon, 13 Apr 2026 11:59:56 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/04/neolix.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/04/neolix.jpg" alt="Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles"><p><strong>There is an effect I love in Avengers,</strong> it&apos;s to see the heroes struggling movie after movie, to defeat Loki, Ultron, then the Ragnarok... only to find out, when they can finally rest, that Thanos hasn&apos;t even entered the arena yet. Suddenly, the previous fights appear meaningless compared to this whole new boss.</p><p><strong>I observe this exact effect in autonomous delivery and shuttles</strong>. We see companies in the US and Europe going through years of buildup, billions raised, hard-fought regulatory battles, milestone after milestone, celebrating 100,000 miles driven, then 200k, then a growing fleet of 50... 150... 1,000 vehicles... but this appears RIDICULOUS compared to what I&apos;m going to share with you today.</p><p><strong>In China, a company has a fleet of over 17,000 delivery vehicles, </strong>operating 24/7 in 300+ cities across 15 countries. Its name? <strong>NEOLIX</strong>! And it is the uncontested champ of autonomous delivery.</p><p>As you are reading this, they are operating their shuttles for airport luggage transfer, automotive parts delivery, environmental services, inspection, cold chain delivery, express delivery, food delivery, grocery retail, and countless more.</p><p><strong>How are they doing it? 
How are they so far ahead?</strong> In this episode, I&apos;d like you to meet 2 members of their team: Casillas and Perry.</p><blockquote><strong>Casillas</strong> <strong>is responsible for AI model deployment and engineering</strong>, focusing on post-processing and parallel computing optimization. He participates in this interview to give us his insights on building an End-To-End model for autonomous delivery. <br><br><strong>Perry</strong> <strong>Pan</strong> is the Head of Communication at Neolix &#x2014;&#xA0;she participated in our call (full version available inside my membership) to give us insights on factory, assembly, and building Neolix.</blockquote><p>I have 2 big insights to share in this episode. Again, the full version is available via the Edgeneer&apos;s Land membership; which itself is private to Think Autonomous clients and owners of my AV 2 Map (free).</p><p>The first insight regards the algorithms used... the second regards the assembly of autonomous delivery vehicles.</p><p>Go On:</p><h2 id="insight-1-the-2-stage-end-to-end-architecture">Insight #1: The &quot;2-Stage&quot; End-To-End Architecture</h2><p>Just a few years ago, Neolix was operating fewer vehicles, all with a &quot;modular&quot; architecture. As Perry Pan, the head of communication, mentioned to me:</p><blockquote>&quot;<strong>Previously, our vehicles relied on high-definition maps</strong>, and even with in-house mapping capabilities and autonomous vehicles that could collect data themselves, the full process of data collection, map production, and validation typically took around <strong>two weeks before a vehicle could go live.</strong> With our latest mapless approach, autonomous driving can be achieved using standard navigation data, which significantly shortens deployment time and also helps avoid some of the data sensitivity issues.&quot;</blockquote><p>This time reduction has been made possible via the move to End-To-End Learning. How did it work? 
Here is how Casillas describes the transition:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
  <a class="yt-thumb" data-src="_NCbab8OuMQ" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=_NCbab8OuMQ">
  <img src="https://i.ytimg.com/vi/_NCbab8OuMQ/hqdefault.jpg" alt="Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles" loading="lazy">
  <span class="yt-play" aria-hidden="true"></span>
  </a>
</div>
<!--kg-card-end: html-->
<p>There are several concepts to unpack from this:</p><ul><li><strong>&quot;Early Fusion BEV&quot;</strong>: Earlier in the interview, Casillas describes how Bird Eye View is the CORE PILLAR that allowed their system to transition from Modular to End-To-End. Without it, E2E would have been <u>impossible</u>. The &quot;early fusion&quot; here describes a &quot;raw data level&quot; fusion process of all cameras and the roof LiDAR. (<a href="https://www.thinkautonomous.ai/blog/early-fusion/" rel="noreferrer">more on Early Fusion here</a>)</li><li><strong>&quot;OD, Occupancy, Lane Detection&quot;</strong>: Casillas describes the 3 core Perception tasks that the Neolix driver is solving: <u>object detection,</u> <u>occupancy prediction</u>, and <u>lane detection</u> &#x2014;&#xA0;all happening in the Bird Eye View space. </li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/04/image--2---2---1-.jpg" class="kg-image" alt="Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles" loading="lazy" width="2000" height="1095" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/04/image--2---2---1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/04/image--2---2---1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/04/image--2---2---1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/size/w2400/2026/04/image--2---2---1-.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Neolix viewer (12 cameras on the left, 9 displayed &#x2014; Bird Eye View output on the right with objects and lanes)</span></figcaption></figure>
<div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Do you think it&apos;s cool?</strong></b> If you want to build an End-To-End Architecture, your Perception system must have these 3 tasks. I&apos;m showing how to fit this into a larger scene in my AV2 map, where I explain exactly how these are used and assembled together. <a href="https://www.thinkautonomous.ai/av2mindmap " rel="noreferrer">Download it here for free!</a></div></div><ul><li><strong>&quot;Two Stage End-To-End&quot;</strong>: Perhaps the most interesting part is Casillas describing the <u>2-stage end-to-end architecture</u>. Rather than a single network solving autonomous driving, as advertised everywhere, Neolix uses a two-stage approach where:<ul><li>Stage 1 = Perception</li><li>Stage 2 = Planning<br><br>This modular approach is also what is being done by Autoware and others in the industry who are transitioning to End-To-End.</li></ul></li></ul><p>A very similar transition was made by Tesla and is explained in <a href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/" rel="noreferrer">this article</a>:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Breakdown: How Tesla will transition from Modular to End-To-End Deep Learning</div><div class="kg-bookmark-description">It&#x2019;s no secret, Tesla is going to use End-To-End Deep Learning. But how? What will it look like? Will the Occupancy Network and HydraNet stay? 
Here&#x2019;s a full breakdown&#x2026;</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles"><span class="kg-bookmark-author">ADVANCED ARTICLES FOR CUTTING-EDGE ROBOTICS &amp; AV ENGINEERS</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/tesla-end-to-end.png" alt="Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles"></div></a></figure><h2 id="insight-2-chinas-speed">Insight #2: China&apos;s Speed</h2><p>How much time do you think it takes to assemble a fully-functional vehicle? A month? A week? A day? The answer completely shocked me. Here it is explained by Perry:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
  <a class="yt-thumb" data-src="wsuYOK3wRfM" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=wsuYOK3wRfM">
  <img src="https://i.ytimg.com/vi/wsuYOK3wRfM/hqdefault.jpg" alt="Neolix AI Deployment Head Explains how they use End-To-End and MASS PRODUCE Autonomous Delivery Shuttles" loading="lazy">
  <span class="yt-play" aria-hidden="true"></span>
  </a>
</div>
<!--kg-card-end: html-->
<p><strong>When I order a camera or LiDAR sensor from France, from any company, whether in Europe or outside of it, I know I can expect delivery to take a couple of weeks.</strong> If the process is really fast, it&apos;ll take at least 4 to 5 days. This is just one sensor. If I then want to assemble my autonomous car, I need to get all the parts, and assemble them.</p><p><strong>For Neolix, the problem doesn&apos;t exist</strong>. China IS the place where you can find ALL components in one city... the same way you&apos;d go do your shopping. Because of this, companies like Neolix can design and build an autonomous vehicle in a day.</p><blockquote class="kg-blockquote-alt"><strong>For Neolix, the time to produce a self-driving car is <u>10</u> minutes.</strong></blockquote><p><strong>This is absolutely insane</strong>. Being in Europe, I know for a fact that this production speed is simply impossible here. I remember spending weeks and over a hundred thousand just for a SINGLE car. Neolix assembles a vehicle in 10 minutes at 1/10 of the cost. On top of this, they use cutting-edge End-To-End algorithms.</p><p>With these stats in mind... is there even a remote fighting chance for companies in Europe, which currently BANS self-driving cars outside of prototypes and experiments?</p><h2 id="%E2%98%84%EF%B8%8F-go-further-download-the-av2-map">&#x2604;&#xFE0F; Go Further: Download the AV2 Map</h2><p><strong>Interested in Neolix&apos;s AV 2.0 algorithms?</strong> Our AV2 Algorithms Map shows you the 3 algorithms that companies like Neolix, but also Tesla, XPeng, and others implement in their End-To-End pipeline. We&apos;ll explore them, and also expand to what Nvidia is currently doing with Alpamayo and reasoning. 
</p><p>The AV 2 map is available for free on this page:</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Do you think it&apos;s cool?</strong></b> If you want to build an End-To-End Architecture, your Perception system must have these 3 tasks. I&apos;m showing how to fit this into a larger scene in my AV2 map, where I explain exactly how these are used and assembled together. <a href="https://www.thinkautonomous.ai/av2mindmap " rel="noreferrer">Download it here for free!</a></div></div>]]></content:encoded></item><item><title><![CDATA[EarthSense: How to build a vision based Agriculture Robot with Michael McGuire]]></title><description><![CDATA[In this interview, EarthSense Lead Computer Vision Engineer Michael McGuire teaches us the core algorithms behind their autonomous agriculture robots]]></description><link>https://www.thinkautonomous.ai/blog/earthsense/</link><guid isPermaLink="false">69aea32ce4c508552d3e0150</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Thu, 12 Mar 2026 14:57:08 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/03/earthsense.001.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/03/earthsense.001.jpeg" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire"><p><em>A</em> few weeks ago, I was browsing LinkedIn when I saw an incredible post from an Engineer who was working on autonomous agriculture robots. The post had over 1,000 likes, and was showing the &quot;internal&quot; view of an autonomous agriculture robot, spraying an oil palm field. 
It was fascinating.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/03/unnamed--12-.gif" class="kg-image" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire" loading="lazy" width="448" height="308"></figure><p>How did it work? I had to find out, because I suspected this engineer knew a lot more about agriculture robotics than I did, and probably more than most of my readers. So Michael and I got in touch, and together we recorded a special episode of this show. Will you learn something from Michael today? 100% guaranteed!</p><p>First, let me give you a brief intro...</p><h2 id="what-is-earthsense-and-how-it-works">What is EarthSense and how it works</h2><p>Meet:</p><blockquote><strong>Michael McGuire</strong><br>Michael started as an intern at <a href="https://www.earthsense.co" rel="noreferrer">EarthSense</a> after graduating from the University of Illinois in the US. He got hired via a DeepSORT project, and he then grew into a Computer Vision Engineer role. 4 years later, he agreed to take charge of the operations in Malaysia, and became the Computer Vision Lead. As of recording this episode, he was just out of a demo on TerraMax, an oil palm robot.</blockquote><p>And now, here is how he defines EarthSense, and how it works using something they named the &quot;vanishing point algorithm&quot;.</p><p>If I asked 100 engineers to drive a robot autonomously in an oil palm field using vision only, many would tell me to use Stereo Vision. Some would say Visual SLAM. A few might mention Bird Eye View. But the question of &quot;how do you know where to go&quot; would still remain. Here is how Michael solved it:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
  <a class="yt-thumb" data-src="PuWph3cN46g" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=PuWph3cN46g">
  <img src="https://i.ytimg.com/vi/PuWph3cN46g/hqdefault.jpg" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire" loading="lazy">
  <span class="yt-play" aria-hidden="true"></span>
  </a>
</div>
<!--kg-card-end: html-->
<p>Let&apos;s unpack this short clip. There are 2 big ideas here:</p><ol><li>Navigating in agricultural fields</li><li>The Vanishing Point Algorithm</li></ol><h3 id="navigation-in-structured-agriculture-fields">Navigation in structured agriculture fields</h3><p>The first part I&apos;m interested in is here, when Michael describes the environment they drive in:</p><blockquote class="kg-blockquote-alt">&quot;The core of how our autonomy functions is that we can heavily utilize the fact that the fields are highly <strong>structured</strong>. So an oil palm, [...] they tend to have relatively straightforward rows, predictable row widths, and then predictable lane turns at the end. And so the idea is what you want is a system that is capable of <strong>starting at one corner of the field</strong>, <strong>navigating down a row</strong> in the middle of the row without crashing into anything, and then <strong>stopping at the end</strong>, <strong>turning</strong> the lane, and then <strong>coming back down the next row</strong>. And if you can just do that on repeat, those are effectively the two operations that you need to deploy to any large number of acres, basically.&quot;</blockquote><p>Fascinating, don&apos;t you think? It looks very simple, but the &quot;without crashing into anything&quot; part actually complicates it.</p><p><strong>How do you make sure you don&apos;t crash? </strong>Do you use an object detector? Or segmentation? How do you do this since all objects are unknown? Or, do you use an occupancy map? 
Or freespace detection?</p><p>Here is an illustration provided by EarthSense to explain it in more detail:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/03/ScreenRecording2026-03-10at10.27.18-ezgif.com-optimize.gif" class="kg-image" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire" loading="lazy" width="480" height="300"></figure><p>Now, I am NOT going to describe these, but instead, I&apos;d like to move to the second part of the clip, which is (I think) the most interesting of them all. It discusses navigation.</p><h3 id="the-vanishing-point-algorithm">The &apos;Vanishing Point&apos; algorithm</h3><p>It starts from this quote:</p><blockquote class="kg-blockquote-alt">&quot;When we&apos;re going down the row the chief algorithm that we rely on is the <strong>lane</strong> <strong>detection</strong> as you&apos;re describing,</blockquote><blockquote class="kg-blockquote-alt">I think Renaissance painters centuries ago figured out that a key way to make paintings look realistic was that vanishing line. So if you&apos;re looking down a tunnel, for example, the lines, the pillars, <strong>they all converge to one vanishing point.</strong> And so, we leveraged that geometry to tell us two pieces of information, one of which is <strong>how far are we from the center</strong> and another of which is <strong>how far are we tilted from the center</strong>.</blockquote><blockquote class="kg-blockquote-alt">And so once you have that information, you can then tell your robot exactly where in the row it needs to travel to relative to where it currently is now.&quot;</blockquote><p>Can you see the idea? 
It&apos;s all happening here:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/03/output_progressive_b730ebe7-a4a6-4215-b3b1-93939ccffaba-ezgif.com-optimize.gif" class="kg-image" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire" loading="lazy" width="410" height="231"></figure><p>The prediction happens not via simple geometry, but using a Deep Neural Network, which can be useful when the vanishing point is not at the center (for example, when turning) &#x2014; this tells you exactly how to turn.</p><p><strong>Of course, it&apos;s more &quot;complex&quot; than it looks </strong>(it always is, isn&apos;t it?). Michael already mentioned the idea of &quot;fronds&quot; (giant palm leaves) covering the camera, and disturbing the vanishing point detection... But there are other ideas, such as <strong><em>traversability</em></strong>, <strong><em>identification</em></strong> of the end of a row, <strong><em>mapping</em></strong>, and more...</p><p>Still, the idea of the algorithm is surprisingly simple (and I LOVE simple ideas).</p><p>So these are 2 things we&apos;re learning from Michael in this clip &#x2014; of course, there is a full in-depth interview available to the members of The Edgeneer&apos;s Land, my community membership experience.</p><p>But right now, I would like to leave you with 2 things: A bonus video from Michael, sharing his Top 3 Computer Vision skills&#xA0;&#x2014; and an invite to an event on March 19, where I&apos;ll organize a live session to tell you all about Off-Road Robotics, and the 3 core skills to build in there.</p><h2 id="bonus-video-the-top-3-skills-of-computer-vision-engineers">Bonus Video: The Top 3 Skills of Computer Vision Engineers</h2>
<!--kg-card-begin: html-->
<div class="yt-lite">
  <a class="yt-thumb" data-src="qeqoxk8mSVM" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=qeqoxk8mSVM">
  <img src="https://i.ytimg.com/vi/qeqoxk8mSVM/hqdefault.jpg" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire" loading="lazy">
  <span class="yt-play" aria-hidden="true"></span>
  </a>
</div>
<!--kg-card-end: html-->
<h2 id="special-invite-for-readers-of-this-post-the-off-road-robotics-event">Special Invite for readers of this post: The Off-Road Robotics Event</h2><p>If you enjoyed this article, you&apos;re probably interested in learning more about every kind of robotics that goes &quot;off road&quot;. Good News: I will be hosting a LIVE Experimentation of all the algorithms discussed with Michael, PLUS way more, on Thursday, March 19! All tickets are FREE - and the experience is unique, never to be repeated again!</p><p>Click &quot;<a href="https://www.thinkautonomous.ai/off-road-demo" rel="noreferrer">Book Your Ticket</a>&quot; below to access it!</p><figure class="kg-card kg-image-card"><a href="https://www.thinkautonomous.ai/off-road-demo"><img src="https://images.clickfunnels.com/cdn-cgi/image/width=1000px,fit=scale-down,f=auto,q=80/https://statics.myclickfunnels.com/workspace/jkOnBQ/image/21219332/file/3f7ee7eedc2239142aa144c66a45b90b.jpg" class="kg-image" alt="EarthSense: How to build a vision based Agriculture Robot with Michael McGuire" loading="lazy" width="1000" height="664"></a></figure>]]></content:encoded></item><item><title><![CDATA[Perciv AI: The Power of RADAR Deep Learning with Andras Palffy]]></title><description><![CDATA[Perciv AI is building Deep Learning for RADAR algorithms. We could call this 4D/3D Deep Learning. 
I recently visited their HQ, and in this post, I'm revealing what I learned...]]></description><link>https://www.thinkautonomous.ai/blog/perciv-ai/</link><guid isPermaLink="false">699439e379f2601e412fb625</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 17 Feb 2026 11:43:43 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/02/perciv-ai-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/perciv-ai-1.jpg" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy"><p><strong>Ever done a &quot;house swap&quot;?</strong> Recently, one of my mentors in Canada told me he was swapping homes with someone in the Netherlands. Sounds unreal... Yet it isn&#x2019;t. Platforms like Home Exchange apparently have 100,000+ members doing exactly this.</p><p><strong>House swapping is one of those things that could never have worked a decade ago</strong>. Not because the idea was bad (I think it is, but that&apos;s different), but because trust, norms, and infrastructure weren&#x2019;t there.</p><p>And RADAR Deep Learning follows the same pattern.</p><p><strong>RADAR has existed for over 100 years.</strong> Most RADAR algorithms are still traditional signal processing. As a result, RADAR engineers have long been a small, almost outcast group of &quot;freaks&quot; (sorry) working on systems few people truly understood.</p><p><strong>Why? Because for decades, RADARs were treated as a secondary sensor</strong>. Too noisy. Too low-resolution. 
Useful only as an auxiliary input in sensor fusion, under the assumption that <em>even noisy measurements are better than nothing</em>.</p><p>That assumption is now breaking.</p><p><strong>RADARs are moving into a primary sensor role</strong>:</p><ul><li>High-resolution RADARs exist</li><li>Imaging 4D RADARs are spreading (<a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noreferrer">see my article here</a>)</li><li>And more importantly, DEEP LEARNING is now so capable that it can process even noisy point clouds!</li></ul><p><strong>This is why in this episode, I am boarding a train to Rotterdam, </strong>where I am meeting with Andras Palffy from <a href="https://www.perciv.ai" rel="noreferrer">Perciv</a>, a startup focused on RADAR Deep Learning.</p><blockquote><strong>Andras Who?</strong><br>The name is Palffy. Andras Palffy. This machine perception and AI specialist co-founded <strong>Perciv</strong>, a Rotterdam-based startup focused on AI for RADARs. He wrote multiple 3D Deep Learning papers, and earned his Ph.D. at TU Delft (Netherlands).</blockquote><p>Today, he runs Perciv, and I&apos;m going to show you an amazing video of his work...</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
  <a class="yt-thumb" data-src="SKMIrKBd7sY" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=SKMIrKBd7sY">
  <img src="https://i.ytimg.com/vi/SKMIrKBd7sY/hqdefault.jpg" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy" loading="lazy">
  <span class="yt-play" aria-hidden="true"></span>
  </a>
</div>
<!--kg-card-end: html-->
<p>WOW!!! So cool, isn&apos;t it? Now, in this post, I will cover 2 ideas:</p><ol><li>The <strong>process</strong> of Deep Learning for RADARs (how it works)</li><li>The <strong>applications</strong> you can build when leveraging 4D Deep Learning</li></ol><p>Let&apos;s begin with the process:</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F3AB;</div><div class="kg-callout-text">Grab your Ticket for the Perciv AI Discovery Tour: <a href="https://www.thinkautonomous.ai/perciv-ai">https://www.thinkautonomous.ai/perciv-ai</a></div></div><h2 id="how-to-make-deep-learning-for-radar-work">How to make Deep Learning for RADAR work</h2><p>Let&apos;s begin with this post showing you a demo of Perciv AI&apos;s algorithm:</p>
<!--kg-card-begin: html-->
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:ugcPost:7374749794465968129?collapsed=1" height="770" width="504" frameborder="0" allowfullscreen title="Embedded post"></iframe>
<!--kg-card-end: html-->
<p><strong>Can you feel the power? </strong>This video shows object detection, but what&apos;s very interesting is how <em>noisy</em> the input is! The points are &quot;dancing&quot;, unlike most <a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noreferrer">LiDAR point clouds</a>, which are much more robust and accurate.</p><p>Yet, RADARs provide direct velocity estimation, via the <a href="https://www.thinkautonomous.ai/blog/how-radars-work/" rel="noreferrer">Doppler Effect</a>, making them very interesting sensors to use.</p><p>So how does it work? It&apos;s really 3 steps:</p><ol><li>A RADAR outputs a <u>raw signal</u>.</li><li>This signal is often converted to a 2D or 3D <u>point cloud</u> to be processed.</li><li>3D Deep Learning algorithms work on the point clouds with <a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noreferrer">point-based or voxel-based approaches</a>, just like for LiDARs.</li></ol><p>Now the interesting element:</p><p><strong>Most traditional RADAR algorithms skip step 2</strong>, because they process the RADAR signal directly (you can see how <a href="https://www.thinkautonomous.ai/blog/how-radars-work/" rel="noreferrer">in this article</a>). In the case of Deep Learning, we have the option to either convert to a point cloud OR process the raw signal directly. 
This means that step 2 (signal &#x2192; point cloud conversion) can be skipped, which avoids losing data during conversion.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg" class="kg-image" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy" loading="lazy" width="2000" height="466" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Process from RADAR signal to output</span></figcaption></figure><p><strong>We now get the general idea:</strong> Thanks to Deep Learning, we can make noisy RADAR data useful. The next question is, what exactly can we do?</p><h2 id="applications-of-deep-learning-for-radars-by-perciv">Applications of Deep Learning for RADARs (By Perciv)</h2><p>Here is a 30-second clip I recorded at Perciv, going in depth on the <strong>sensors</strong>, <strong>algorithms</strong>, and <strong>end</strong>-<strong>user</strong> interface.</p><figure class="kg-card kg-video-card kg-width-regular kg-card-hascaption" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2026/02/11-Panels-Music_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2026/02/11-Panels-Music.mp4" poster="https://img.spacergif.org/v1/1920x1080/0a/spacer.png" width="1920" height="1080" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2026/02/11-Panels-Music_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">0:26</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            <figcaption><p><span style="white-space: pre-wrap;">What&apos;s possible using Deep RADARs</span></p></figcaption>
        </figure><p><strong>Let&apos;s begin with the sensors</strong>. Did you count how many there were? I see 1 camera, 2 LiDARs, and one RADAR that has 2 views: <u>a point cloud view</u>, and a <u>range-doppler view</u>. If you zoom in, you&apos;ll see that the RADAR point clouds are absolutely chaotic. There is no way you&apos;d make sense of it. </p><p><strong>And yet, when you see the blue part, in the middle of the video, you see what the Deep RADAR algorithms are capable of</strong>. The algorithmic panel is ALL based on the RADAR input only. And notice how awesome they are, we have:</p><ul><li>LiDAR + RADAR Accumulator</li><li>RADAR Heatmap</li><li>Freespace Detection</li><li>3D/4D Object Detection and Perception</li></ul><p>Seriously...</p><blockquote class="kg-blockquote-alt">A freespace detector... on a RADAR!</blockquote><p>This is really impressive, isn&apos;t it? And it&apos;s not ALL, because later on, Perciv AI showed me a side-by-side comparison of SLAM with RADAR and LiDARs. Can you guess which one was superior? 
</p><p>Here&apos;s the answer:</p><p>While the RADAR Odometry uses the velocity information and can accurately spot moving points, LiDAR doesn&apos;t, and as a result, overshoots!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg" class="kg-image" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy" loading="lazy" width="1800" height="919" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 1800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">RADAR vs LiDAR Odometry &#x2014;&#xA0;RADAR direct speed provides a superior accuracy</span></figcaption></figure><p>This is a very good example of how Deep Learning for RADAR can be used for advanced applications.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F3AB;</div><div class="kg-callout-text">Interested in how it works? Grab your Ticket for the Perciv AI Discovery Tour: <a href="https://www.thinkautonomous.ai/perciv-ai">https://www.thinkautonomous.ai/perciv-ai</a></div></div><h2 id="summary">Summary</h2><ul><li><strong>Perciv AI builds Deep Learning for RADAR algorithms and they are awesome</strong>. 
I&apos;ve been following Perciv since 2023, and even interviewed them when they were only 3, and their dedication to this field is unmatched.</li><li><strong>In RADAR processing, you can either process the raw signal, or convert it to a point cloud</strong> the same way you&apos;d do with LiDARs. A heavier pre-processing step is usually done to reduce noise.</li><li><strong>The RADAR processing pipeline therefore becomes:</strong> signal &#x2192; point cloud &#x2192; 3D Deep Learning algorithms &#x2192; output</li><li><strong>There are many algorithms you can run on RADARs</strong>, from object detection to SLAM. In some cases, RADAR&apos;s velocity information can even provide BETTER results than LiDARs.</li></ul><h2 id="infiltrate-perciv-ai-with-me">Infiltrate Perciv AI with me?</h2><p>The last time I visited Perciv AI, I got a complete tour of their facility, team, 4D Deep RADAR algorithms, and even their self-driving car. I got to live like an intern on their first day at a self-driving car startup. </p><p><strong>I&apos;m thinking... Wanna see what it&apos;s like? </strong>I mean, what I&apos;ll record there will obviously be top secret, guarded and accessible ONLY to the Edgeneer&apos;s Land citizens (my community membership)... BUT the show?</p><p><strong>This is a show they just did at IAAA Munich for everybody</strong>. And I see no reason why everybody shouldn&apos;t discover it. This is why I&apos;m creating a special 2-day Virtual Tour,&#xA0;in which you&apos;ll be able to come with me to Rotterdam, be a fly on the wall, and get to live your first day as a self-driving car intern... You will see things like:</p><ul><li>&#x2705; Their self-driving car&#xA0;&#x2014; if you never saw a self-driving car before, this will be the closest you&apos;ll ever get: we&apos;ll see the sensors, wires, everything </li><li>&#x2705; Their 4D Deep RADAR demo&#xA0;&#x2014; where they will demo their algorithms on me! 
</li><li>&#x2705; Their RADAR tour &#x2014;&#xA0;where they&apos;ll show you what a RADAR is, and give you a tour of the different types on the market</li><li>&#x2705; The RADAR vs LiDAR SLAM video &#x2014; explaining the differences in Odometry estimation and how to do a clean one using RADARs</li></ul><p>As I said, this is the public stuff you normally CAN&apos;T see unless you physically move to where they are. For 99% of people reading this, this is a unique chance to see it. Interested?</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F3AB;</div><div class="kg-callout-text">Grab your Ticket for the Perciv AI Discovery Tour: <a href="https://www.thinkautonomous.ai/perciv-ai">https://www.thinkautonomous.ai/perciv-ai</a></div></div>]]></content:encoded></item><item><title><![CDATA[How the Solid-State LiDAR works (and why everyone bets on it)]]></title><description><![CDATA[The LiDAR industry is changing. The 100k$ mechanical LiDAR is gone, and we currently see incredible solid-state LiDARs mass-produced for 1,000$ or less. How do these new-gen LiDARs work?]]></description><link>https://www.thinkautonomous.ai/blog/solid-state-lidar/</link><guid isPermaLink="false">697a327cd1ce7c5171ff3592</guid><category><![CDATA[lidar]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 28 Jan 2026 16:59:40 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/01/solid-state-lidar.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/solid-state-lidar.jpg" alt="How the Solid-State LiDAR works (and why everyone bets on it)"><p><strong>In 1607, the Jamestown colony was in a critical situation</strong>. English settlers founded it and declared it their first permanent colony in North America. They arrived with total confidence: they knew how to build a town. 
So they built wooden houses, palisades, shallow foundations, just the English way. But there was a problem: Jamestown was built on a swamp.</p><p><strong>Within weeks, houses collapsed, mosquitos propagated malaria</strong>, <strong>and the water they were drinking caused fever and poisoning</strong>. Within months, half of the settlers died. Yet, the remaining settlers didn&apos;t figure out a better plan, and too much was already decided. It was only after enduring famine, diseases, and war with locals that they found the right approach, the one that turned Jamestown into the first American colony.</p><p><strong>Solid-state LiDARs are that final method</strong>. In the LiDAR industry, many have experimented with all sorts of sensors, before converging on an &quot;ideal&quot; solution: the solid-state LiDAR. Not only could it reduce cost, it could also significantly improve performance.</p><p><strong>In this article, I am going to explain to you what a solid-state LiDAR is</strong>, how it works, and more importantly, why it&apos;s a better choice than most of the other sensors. To truly understand solid-state, we&apos;ll need to also understand mechanical LiDARs, and all their moving parts.</p><p>This will be our first point...</p><h2 id="the-components-of-a-lidar-sensor">The Components of a LiDAR sensor</h2><p>If you want to understand mechanical and solid-state LiDARs, you&apos;ll first need to see the internal components of a LiDAR. 
Then, we&apos;ll figure out how to classify a solid-state LiDAR based on how these parts move.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1589" height="1258" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg 1589w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The different components that can exist in a LiDAR</span></figcaption></figure><p>I am NOT going to describe these one by one, because I would like to instead show you how they all work together. </p><blockquote>This article shows a classification by scanning system. I have <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noreferrer">a complete article breaking down all the different types of LiDARs here</a>.</blockquote><p>Keep these in mind, and let&apos;s take a look at...</p><h2 id="from-mechanical-to-solid-state-lidar">From Mechanical to Solid-State LiDAR</h2><h3 id="the-mechanical-360%C2%B0-lidar">The Mechanical 360&#xB0; LiDAR</h3><p><strong>Back in 2017, I took my first LiDAR class.</strong> It was featuring a Velodyne 64, which is a mechanical LiDAR (Light Detection And Ranging) that became the most famous LiDAR in the autonomous vehicle industry. 
At the time, it cost over 100,000$, and promised to transform several use cases (indoor and outdoor robotics, SLAM, ...).</p><p>The principle of this LiDAR is simple: multiple lasers are stacked vertically on a mechanical rotating assembly that spins really fast.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/640-4-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="550" height="309"><figcaption><span style="white-space: pre-wrap;">Fantastic animation from Hesai LiDARs (</span><a href="https://www.thinkautonomous.ai/blog/loxo/" rel="noreferrer"><span style="white-space: pre-wrap;">source, recommended</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>From here, you start identifying the advantages</strong> (accuracy, 360&#xB0;), but also the drawbacks: it&apos;s terribly <u>costly</u> (100k or so in 2017), and better 3D requires more channels - <u>hence more lasers</u> (bigger sensors).</p><p>This is how the second type was introduced...</p><h3 id="the-mechanical-mirror-lidars">The Mechanical Mirror LiDARs</h3><p><strong>In this evolution, we no longer rotate the entire sensor, and sometimes no longer need multiple lasers; instead, we use mirrors and polygons. 
</strong>Here is an animation explaining how the next 2 work, that I found in <a href="https://www.youtube.com/watch?v=3EehCU3csJQ" rel="noopener noreferrer">this fantastic video again from Hesai</a>:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.08.46-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/ScreenRecording2026-01-28at15.08.46-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.08.46-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Left: A single laser is sent to a mirror which sends it to a polygon. Right: Several lasers are sent to a 1D mirror.</span></figcaption></figure><ul><li><strong>1D Rotating Mirror</strong>: <strong>The first alternative could be a single mirror that deflects the laser.</strong> Think about it, this is genius! We can use a mirror that spins horizontally to recreate that 3D shape. Of course, we&apos;d need multiple lasers stacked, but we fix the problem of having a rotating platform, which can break.</li><li><strong>Polygon-Mirror: Another alternative is to use ONE laser, and deflect it via the use of mirrors and polygons</strong>. In this case, the mirror swings vertically, and the polygon spins horizontally. This creates a 3D representation, which is narrower, can&apos;t spin 360&#xB0;, but produces a functional point cloud.</li></ul><p>These two are great, but still require you to use polygons and mirrors. In a way, it&apos;s still mechanical. 
So let&apos;s now talk about the true definition of solid-state...</p><h3 id="solid-state-lidars-no-moving-parts">Solid-State LiDARs = &quot;No Moving Parts&quot;</h3><p>The first time I learned about it was around 2021 when a company asked me to help them choose between multiple LiDARs. At the time, solid-state technology was emerging, and many were saying it was the future of self-driving cars. The definition was repeated by everyone everywhere;</p><blockquote class="kg-blockquote-alt"><strong>&quot;No Moving Parts&quot;</strong></blockquote><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1484" height="754" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg 1484w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The purest definition of a solid-state LiDAR is that it has no moving part</span></figcaption></figure><p>Huh. What&apos;s so problematic with moving parts? Is that so terrible? Well, yes, because when used all day for weeks and weeks, these parts will simply... break!</p><p>If we compare solid-state to mechanical LiDARs, we can also see that in 100% of the cases, solid-state is a directional sensor. 
This means you cannot use it on the roof of your car; <u>you have to orient it very strategically, and you must use several of these sensors if you want a 360&#xB0; view</u>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1588" height="692" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg 1588w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">By definition, solid-state LiDARs are directional and can&apos;t rotate to achieve 360&#xB0;</span></figcaption></figure><p>Now, let&apos;s try to understand the differences, and how we can get a 3D point cloud without moving lasers.</p><p>For this, I&apos;ll use the matrix below, which shows the different types of LiDARs based on the components moving. 
(realize you already covered the first 3 dark rows).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="2000" height="1101" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 2229w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The second part of the matrix: Solid-State is defined by what moves, and how.</span></figcaption></figure><p>Let&apos;s see these, one by one:</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F449;</div><div class="kg-callout-text">The $1,000 LiDAR is here. Do you know what that means for your career as a sensor engineer? I put together a complete skill map for AV engineers covering sensors, perception and the full stack you need to get hired. 
<a href="https://www.thinkautonomous.ai/sdc-stack"><strong>Get the SDC Engineer Stack here - it&apos;s free.</strong></a></div></div><h4 id="mems-micro-electromechanical-system"><strong>MEMS (Micro-electromechanical system)</strong></h4><p><strong>In a MEMS LiDAR, you&apos;re projecting one laser onto a MEMS mirror that oscillates both horizontally and vertically.</strong> It mimics the LiDAR + mirror rotation, but it&apos;s now an oscillation at the micro level.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.28.44-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/ScreenRecording2026-01-28at15.28.44-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.28.44-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">You can learn more about this on </span><a href="https://www.youtube.com/watch?v=g7gHm-38t_s" target="_blank" rel="noopener noreferrer"><span style="white-space: pre-wrap;">the Fraunhofer IPMS video where this animation is from</span></a><span style="white-space: pre-wrap;">.</span></figcaption></figure><p><strong>MEMS mirrors still move, so MEMS LiDARs are not &quot;true&quot; solid-state</strong>. Yet, they are excellent alternatives to larger mechanical mirrors, and more resistant to vibrations and shocks. Looking in more detail, LiDAR makers use either a 2D MEMS mirror or two 1D MEMS mirrors, oscillating horizontally and vertically.</p><h4 id="opa-optical-phased-array"><strong>OPA (Optical Phased Array)</strong></h4><p><strong>What is a LiDAR?</strong> It&apos;s a device that sends a <u>light wave</u>. Correct? Well, a light wave is a... wave. Yes? 
And a wave is something we understand. It has an amplitude, a phase, a frequency, and a wavelength! In an OPA LiDAR, we use a <u>phase shifter</u> to electronically steer the light wave. This sounds crazy, but it works. This is really modern, new generation, and a &quot;true&quot; solid-state system, since no part is moving.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at14.30.52-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="640" height="283" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/ScreenRecording2026-01-28at14.30.52-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at14.30.52-ezgif.com-optimize.gif 640w"><figcaption><span style="white-space: pre-wrap;">OPA LiDAR (</span><a href="https://www.youtube.com/watch?v=xEqV879qDNE" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h4 id="flash-lidars"><strong>Flash</strong> <strong>LiDARs</strong></h4><p><strong>In a Flash LiDAR, a diffuser projects a wide, diffused laser illumination which comes back to an array detector,</strong> creating a full 3D image in a single exposure. <u>This is a non-scanning technology; everything is illuminated at once</u>.</p><p>Was that clear? 
Well, imagine being in the dark, and trying to illuminate the room.</p><ul><li>You can either agitate a red laser all over the place (scanning devices - MEMS, OPA, ...)</li><li>Or you can use a torch, which instantly illuminates the room.</li></ul><p><strong>This is what a Flash LiDAR does</strong>,<strong> it&apos;s a laser torch.</strong></p><h4 id="solid-state-summary">Solid-State Summary</h4><p>Cool, a quick summary of the last 3?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1790" height="654" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 1790w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The different types of solid-state LiDARs</span></figcaption></figure><p>We now have a good understanding of Solid-State. The question I want to continue with is...</p><h2 id="how-is-solid-state-better-than-mechanical-lidar-technology">How is Solid-State better than Mechanical LiDAR technology?</h2><p>There are several aspects that you can already guess, but I&apos;d like to take these one by one anyway.</p><h3 id="better-durability-no-moving-parts"><strong>Better durability (no moving parts)</strong></h3><p><strong>Mechanical LiDARs <u>have moving parts</u></strong>, which wear out over time and increase the risk of failure in automotive environments (vibration, heat, dust). 
This risk remains for MEMS (which, as we saw, is partly mechanical), but is eliminated for OPAs and Flash LiDARs. <u>The #1 advantage of using a solid-state LiDAR is this.</u></p><h3 id="compact-lightweight-design">Compact &amp; Lightweight Design</h3><p><strong>A mechanical LiDAR HAS to be on the roof of a vehicle. </strong>This is not only ugly, but also impractical. On the other hand, a solid-state LiDAR can be nicely integrated in the front of a vehicle. This makes mechanical LiDAR a less attractive option. When you look at the ADAS (Advanced Driver Assistance System) industry, most companies like BMW, Mercedes-Benz, etc. integrate MEMS LiDARs in the front. The small size of a solid-state LiDAR makes it ideal for integration into space-constrained platforms like drones and autonomous vehicles.</p><p>Let&apos;s continue:</p><h3 id="mass-production-capability">Mass Production Capability</h3><p><strong>Manufactured using semiconductor processes</strong>, solid-state LiDARs can be mass-produced at lower cost. MEMS are currently the cheapest, but OPAs promise to reach incredibly low costs (100$ or less). 
The math makes sense: we get a smaller size and a lower cost, which is always the direction we want to go in hardware.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/IDTechEx_Lidar_chart.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="500" height="317"><figcaption><span style="white-space: pre-wrap;">The cost of LiDARs based on their type </span><a href="https://www.idtechex.com/en/research-report/lidar-2024-2034/995" rel="noreferrer"><span style="white-space: pre-wrap;">(source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h3 id="point-cloud-resolution-high-performance">Point Cloud Resolution &amp; High Performance</h3><p><strong>A mechanical LiDAR solution based on spinning mechanics often provides sparser point clouds</strong>, especially vertically, with gaps in coverage compared to dense sensors like cameras. This can lead to blind spots for low or small obstacles. 
On the other hand, a solid-state LiDAR can capture hundreds of thousands of points per second, and has a higher angular resolution, which is very good for tasks like 3D mapping or obstacle detection.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1024" height="342" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg 1024w" sizes="(min-width: 720px) 720px"></figure><ul><li>On top of this, a solid-state LiDAR has lower power consumption (useful on drones, for example), and can resist environmental conditions better, scan faster, and offer a flexible field of view.</li><li>Beyond the field of view, the modulation itself is also flexible: most FMCW (frequency-modulated continuous wave) LiDARs, for example, are solid-state, NOT mechanical.</li></ul><p><strong>In industries like self-driving cars,</strong> smart cities, industrial automation, robotics, using something with high resolution, high accuracy, good enough distance/range, and potentially a wide field of view makes total sense.</p><h2 id="range-resolution-performance">Range, Resolution, Performance?</h2><p>The following should be taken with a pinch of salt, because the numbers change often and some companies make crazy claims. 
Yet, I also looked at studies like <a href="https://www.idtechex.com/en/research-report/lidar-2024-2034/995" rel="noopener noreferrer">this one from IDTechEx</a>, <a href="https://www.mdpi.com/2072-666X/11/5/456" rel="noopener noreferrer">this one on MEMS mirrors</a>, and <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/lpor.202100511" rel="noopener noreferrer">that one on OPAs</a>. Here is an overview:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1884" height="964" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 1884w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Comparing the different types of sensors based on range, field of view, cost, and resolution. It&apos;s highly incomplete, but gives you an idea.</span></figcaption></figure><p>Can you see why MEMS, even though it is not really solid-state, is the BEST compromise? It&apos;s the only one that can currently be mass-produced at a low price, while keeping good range and high resolution.</p><p><strong>You can therefore see how MEMS and Mechanical LiDARs are still the ones being used the most in the industry. </strong>True solid-state is a crazy dream, with incredible claims (an OPA LiDAR could reach a cost of 100$). 
For now, we aren&apos;t there yet.</p><h2 id="example-1-innoviz-technologies">Example 1: Innoviz Technologies</h2><p>At CES 2026, I explored solid-state LiDARs with Seyond &amp; Innoviz. On the one hand, Seyond, which you already saw, is doing Flash LiDARs, which is &quot;true&quot; solid-state. On the other, Innoviz is very likely doing MEMS, which is... hybrid (still following?).</p><p>I would like to start with Innoviz Technologies&apos; latest demo:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
    <a class="yt-thumb" data-src="JF8rhmANxJM" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=JF8rhmANxJM">
    <img src="https://i.ytimg.com/vi/JF8rhmANxJM/hqdefault.jpg" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy">
    <span class="yt-play" aria-hidden="true"></span>
    </a>
</div>
<!--kg-card-end: html-->
<p>Did you see how awesome that looks? Now, you can notice how the benefits here are related to cost reduction, to size shrinking, and heat/power reduction. On the other hand, let&apos;s now see a demo of a Flash LiDAR:</p><h2 id="example-2-seyond-flash-lidars">Example 2: Seyond Flash LiDARs</h2><p>Here is now the second example, where <a href="https://www.seyond.com/" rel="noreferrer">Seyond</a> gives you an amazing overview of a Flash LiDAR (Hummingbird). This video is originally from my membership The Edgeneer&apos;s Land - make sure to <strong>be in my daily emails to learn more</strong>.</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
    <a class="yt-thumb" data-src="-71Cb5V3nfI" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=-71Cb5V3nfI">
    <img src="https://i.ytimg.com/vi/-71Cb5V3nfI/hqdefault.jpg" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy">
    <span class="yt-play" aria-hidden="true"></span>
    </a>
</div>
<!--kg-card-end: html-->
<p>Alright, let&apos;s do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><p>Here is a bullet-point summary of the article:</p><ul><li><strong>The robotics &amp; LiDAR industry tends to use 2 types of LiDARs</strong>: Mechanical and Solid-state. While the former has moving parts, the latter doesn&apos;t.</li><li><strong>Solid-State LiDARs come in 3 categories: </strong>MEMS (with moving mirrors), OPA (true solid-state with no moving parts), and Flash LiDAR (projects laser arrays for instantaneous scene capture). They are all directional, lower power, higher resolution, but generally shorter range and less mature than those with mechanical movement.</li><li><strong>LiDAR technology is about sending a laser</strong> into the world and measuring the time a wave takes to hit a surface and come back. Yet, this can be done via several processes.</li><li><strong>The semiconductor manufacturing process allows solid-state LiDAR to be mass-produced at lower cost,</strong> making it more accessible for automotive and industrial applications.</li><li><strong>Solid-state LiDAR technology is advancing rapidly and is becoming the default choice</strong> for applications requiring high performance, compactness, and reliability, including self-driving cars and smart cities.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F449;</div><div class="kg-callout-text">The $1,000 LiDAR is here. Do you know what that means for your career as a sensor engineer? I put together a complete skill map for AV engineers covering sensors, perception and the full stack you need to get hired. 
<a href="https://www.thinkautonomous.ai/sdc-stack"><strong>Get the SDC Engineer Stack here - it&apos;s free.</strong></a></div></div>]]></content:encoded></item><item><title><![CDATA[LOXO: How to certify End-To-End algorithms in production with Jonathan Péclat]]></title><description><![CDATA[How do you make end-to-end deep learning algorithms certified in production? When you have no way to grade each block individually? Jonathan Péclat from Loxo explains that to us.]]></description><link>https://www.thinkautonomous.ai/blog/loxo/</link><guid isPermaLink="false">62e120112ee42fb76dbfe4e2</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 20 Jan 2026 10:43:40 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo.jpg" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"><p><strong>On June 4, 1996, the Ariane 5 rocket was ready to be launched after years of work</strong>, public funding, and political pressure. Stress was at its maximum, but after just 40 seconds, the rocket exploded, causing a loss of over $370M.</p><p><strong>This event is one of the best known in software engineering</strong>, in particular because of the cause of the failure:&#xA0;<u>a float to int conversion</u>. Indeed, the engineers reused the code from Ariane 4 to launch Ariane 5, but forgot that a&#xA0;<em>float64</em>&#xA0;storing the horizontal velocity would be converted to a signed&#xA0;<em>int16</em>. 
40 seconds into launch, the conversion failed and <strong><em>crashed</em></strong> the rocket.</p><p><strong>I think this story can be a perfect introduction to the domain of autonomous vehicle safety,</strong> which we&apos;ll cover today with our guest Jonathan P&#xE9;clat from Loxo.</p><p>A quick intro:</p><blockquote><a href="https://www.linkedin.com/in/jonathan-p%C3%A9clat-40bb678a/" rel="noopener noreferrer"><strong>Jonathan P&#xE9;clat</strong></a> is the Head of Software Architecture at <a href="https://www.loxo.ch/en/" rel="noopener noreferrer">LOXO</a>. He provided me with fantastic insights on their redundancy approach to make vehicles compliant while using cutting-edge algorithms like End-To-End Deep Planners.</blockquote><p><a href="https://www.loxo.ch/en/" rel="noreferrer">Loxo</a> is a Swiss-based company started in 2022, when they built a first prototype for an autonomous shuttle. Since then, it evolved into this vehicle that now operates in Germany &amp; Switzerland.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/54ceae5c-58bd-495c-b66b-bd0675ee59a9.gif" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="479" height="307"><figcaption><span style="white-space: pre-wrap;">Loxo&apos;s autonomous driver in the streets</span></figcaption></figure><p>These robots navigate real streets, interact with real traffic, and do so using an architecture powered by End-to-End Deep Learning.</p><p>I find this incredible, because End-To-End Learning is purely AI based. It&apos;s data based: you don&apos;t explicitly program the vehicle to stop at a red light, but show it via examples from the dataset. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="1438" height="488" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg 1438w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Modular vs End-To-End</span></figcaption></figure><p>While a modular approach is pretty straightforward, and certification is about evaluating each individual block (is the lane detection safe? is the obstacle detection safe?)...</p><p>... <strong>End-To-End approaches are much more complex to evaluate</strong>, because they only output the final driving decision. I have <a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer">an entire article covering the differences here</a>.</p><p>So I asked Jonathan:</p><h3 id="how-do-you-make-end-to-end-learning-safe"><strong>&quot;How do you make End-To-End Learning safe?&quot;</strong></h3><p>Here is what he explained:</p><figure class="kg-card kg-video-card kg-width-regular kg-card-hascaption" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet01b_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet01b.mp4" poster="https://img.spacergif.org/v1/1280x720/0a/spacer.png" width="1280" height="720" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet01b_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">1:46</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            <figcaption><p><span style="white-space: pre-wrap;"> LOXO uses End-To-End Learning in Production</span></p></figcaption>
        </figure><p>As Jonathan pointed out: </p><blockquote class="kg-blockquote-alt">&#x201C;You cannot really prove that AI is safe, not today. So we run our AI system in parallel with another component that verifies the trajectory. If the AI violates any predefined rule, we switch to a deterministic safe path.&#x201D;</blockquote><p>This point is crucial, because several self-driving car companies use exactly the same approach. LOXO does not rely on a single neural network but on <strong>four independent channels</strong> (two AI channels, and two deterministic channels) running in parallel, each serving a different role in verifying, supervising, or backing up the End-to-End planner.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/embeddable_b2a2de66-b368-4269-9018-38f1058df12d.png" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="770" height="303" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/embeddable_b2a2de66-b368-4269-9018-38f1058df12d.png 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/embeddable_b2a2de66-b368-4269-9018-38f1058df12d.png 770w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The LOXO Architecture isn&apos;t just ONE neural network, but 4 separate channels</span></figcaption></figure><p><strong>LOXO&#x2019;s architecture is a clear illustration of the principle of redundancy:</strong> multiple algorithms, multiple points of view, and, rather than a single point of failure, a structure that catches, compensates for, and, if needed, overrides failures. <u>This is how an End-to-End system becomes certifiable and safer.</u></p><p>The key point to understand is that companies relying on End-To-End do not use just that one approach; they run multiple algorithms in parallel that verify and contradict 
each other. <a href="https://www.thinkautonomous.ai/sdc-app" rel="noreferrer"><strong>I have a complete breakdown of how Mobileye does it with their own End-To-End approach here, if you&apos;re interested</strong></a>.</p><p>Still, a question remains: </p><h4 id="what-exactly-do-you-make-redundant"><strong>What exactly do you make redundant?</strong> </h4><p>The sensors? The algorithms? What even is redundancy? This was my next question for Jonathan, who then explains the safety fundamentals of ASIL scoring and decomposition, which among other things grades functions from A (lowest risk) to D (highest risk):</p><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet04-Asil_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet04-Asil.mp4" poster="https://img.spacergif.org/v1/1280x720/0a/spacer.png" width="1280" height="720" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet04-Asil_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">2:32</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p>The entire principle relies on the concept of Functional Safety with ASIL Decomposition. This is a job on its own, that often includes ISO norms, but if you&apos;d like to explore this, I have a complete article covering how it works here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/functional-safety/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Functional Safety Engineer: The Job that &#x2018;certifies&#x2019; self-driving cars</div><div class="kg-bookmark-description">What is functional safety in self-driving cars? What does a functional safety engineer do? In this post, we&#x2019;ll try to understand how to certify a self-driving car code, and make it safe to drive in the streets</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"></div></a></figure><p><strong>Realize that this doesn&apos;t stop here. 
</strong>In my interview with Jonathan, he explains the step-by-step framework Loxo implements, along with the internal documents they use to grade a function, evaluate its risk, and decide whether to make it redundant.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo-process.001.jpeg" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/loxo-process.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/loxo-process.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/loxo-process.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo-process.001.jpeg 1920w" sizes="(min-width: 720px) 720px"></figure><p>It&apos;s a complete masterclass we have inside <a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer">The Edgeneer&apos;s Land</a>, our community membership experience.</p><p>But for now, let&apos;s do a brief summary:</p><h2 id="summary">Summary</h2><ul><li><strong>When a self-driving car company uses End-To-End Learning</strong>, a single machine-learning model directly maps raw sensor data to driving actions or trajectories, without anyone manually writing rules.</li><li><strong>While this can simplify system design and improve performance</strong>, it also makes the system harder to interpret, verify, and certify, especially in safety-critical and regulated environments.</li><li><strong>Companies like LOXO often use redundant channels</strong> that are the opposite of End-To-End channels, using point cloud processing, clustering, extraction, and very deterministic approaches to try and validate what the AI says.</li><li><strong>Functional Safety Systems like ASIL Decomposition</strong> still apply to End-To-End, and there 
are many processes used to certify self-driving car algorithms.</li></ul><p><strong>Next Steps?</strong> <br>If you want to go deeper into how safety is formally addressed in the autonomous driving industry (how risks are identified, graded, reduced, and documented), I detail the full process in this <a href="https://www.thinkautonomous.ai/blog/functional-safety/" rel="noopener noreferrer">blog post</a> about functional safety.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/functional-safety/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Functional Safety Engineer: The Job that &#x2018;certifies&#x2019; self-driving cars</div><div class="kg-bookmark-description">What is functional safety in self-driving cars? What does a functional safety engineer do? In this post, we&#x2019;ll try to understand how to certify a self-driving car code, and make it safe to drive in the streets</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"></div></a></figure>]]></content:encoded></item><item><title><![CDATA[LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry]]></title><description><![CDATA[Since the beginning of the self-driving car era, many people wanted to compare LiDAR vs RADAR. It didn't make sense: these sensors were complementary back then. 
Today, at the age of 4D, the LiDAR vs RADAR comparison makes real sense, let's see...]]></description><link>https://www.thinkautonomous.ai/blog/fmcw-lidars-vs-imaging-radars/</link><guid isPermaLink="false">62a25f550f1a5e26a580b87a</guid><category><![CDATA[lidar]]></category><category><![CDATA[robotics]]></category><category><![CDATA[sensor fusion]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 29 Oct 2025 11:02:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2023/09/lidar-vs-radar--1-.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/lidar-vs-radar--1-.webp" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry"><p><strong>Back in 2020, a company contacted me because they needed my opinion on a robotic sensor stack they were working on.</strong> They had 2 days to finalize the decision of a sensor suite that would equip their autonomous delivery pods. Like many, they were considering using a combination of all sensors: cameras, LiDARs, RADARs, and even ultrasonic sensors. But they also had concerns, and were wondering whether something better was available.</p><p><strong>But in 2020, the combination of a LiDAR, a camera, and a RADAR was what made the most sense. </strong>&quot;These sensors are complementary&quot; I would reply. &quot;The LiDAR is the most accurate sensor to detect a distance, the camera is best for scene understanding, and the RADAR can see through objects and directly estimate velocities&quot;.</p><p><strong>Is this still true?</strong> Don&apos;t we have camera-only systems today? Don&apos;t we have LiDAR-only systems that bypass RADARs? And don&apos;t we have RADARs that are getting as good as, if not better than, LiDARs? I think the idea of &quot;complementarity&quot; is changing. Today, sensors get more capable. 
FMCW LiDARs can detect speed, and Imaging RADARs can create accurate point cloud representations.</p><p>So, what is true and what isn&apos;t?</p><p><strong>Let&apos;s take a look via this article in 3 points:</strong></p><ul><li>The Traditional LiDAR vs RADAR comparison</li><li>The new LiDAR and RADAR sensors in self-driving cars</li><li>LiDARs vs RADARs: The Modern Comparison</li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text">Warning Graphic Content: <b><strong style="white-space: pre-wrap;">Ever gutted out a LiDAR?!</strong></b> What does it look like inside? I recorded a video explaining how an emitter and a receiver work internally. <br>Watch it <a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-slamtechs-rp-lidar-ungutting" rel="noopener noreferrer">here in my private app.</a></div></div><h2 id="traditional-lidar-and-radar-technology-comparison">Traditional<strong> LiDAR and RADAR technology comparison</strong></h2><p>I believe the following no longer makes sense, but I am going to show it to you anyway, since this is what you&apos;ll see in 99% of other posts about the topic. Here is the idea in 3 points:</p><h3 id="1lidars-are-great-for-distance-estimation">1 - LiDARs are great for distance estimation</h3><p><strong>LiDAR</strong> <strong>(Light Detection and Ranging) is a technology that leverages laser light to measure distances and create detailed 3D maps of objects and environments.</strong> When you look at distance estimators today, the LiDAR is often used as the &quot;<u>ground truth</u>&quot;. LiDAR systems operate by emitting laser pulses (waves) and calculating the time it takes for the light to come back. 
This idea is called &quot;Time of Flight&quot; - and although there are multiple <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer">types of LiDARs</a>, this is the overall idea.</p><p>Here is an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/tof-lidar.webp" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="800" height="358" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/02/tof-lidar.webp 600w, https://www.thinkautonomous.ai/blog/content/images/2023/02/tof-lidar.webp 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">How a Time Of Flight LiDAR works</span></figcaption></figure><p>Now, what does it produce? The answer is a <a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noopener noreferrer">point cloud</a> of the environment. But not all point clouds look the same.</p><h4 id="2d-vs-3d-lidars">2D vs 3D LiDARs</h4><p>Because I&apos;m going to talk about 4D LiDARs, I have to explain the idea of a 2D and a 3D LiDAR first. 
The idea is well explained in my post &quot;<a href="https://www.thinkautonomous.ai/blog/2d-lidar/" rel="noopener noreferrer"><strong>2D LiDARs: Too Weak for Self-Driving Cars?</strong></a>&quot;, in which I explain that LiDARs use vertical &quot;channels&quot; or layers, and that the more channels you have, the finer your 3D resolution.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="1120" height="792" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg 1120w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">LiDAR Resolution depends on the number of channels - 1 layer means your LiDAR only sees in 2D. (</span><a href="https://www.thinkautonomous.ai/blog/2d-lidar/" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h4 id="what-more-channels-bring">What more channels bring</h4><p><strong>LiDAR sends out laser pulses</strong> to measure <u>distances</u> and create detailed 3D maps. But the drawback is that if you want to measure a velocity, you need to compute the difference between 2 consecutive frames. How has the point cloud moved between them? At low speed, this is good enough, but at high speed, waiting for the difference between 2 frames can mean several meters traveled before braking.</p><p>This is why we also like to combine it with a RADAR. 
Let&apos;s see it:</p><h3 id="2radars-are-great-velocity-estimators">2 - RADARs are great velocity estimators</h3><p><strong>RADAR stands for Radio Detection And Ranging</strong>. It works by emitting electromagnetic waves that reflect when they meet an obstacle. Unlike cameras or LiDARs, RADAR relies on radio waves that can work under any weather condition, and even see underneath obstacles. RADARs use the &quot;Doppler Effect&quot; to measure the velocity of obstacles.</p><p><strong>RADAR technology is very mature </strong>(&gt;100 years old), and is used in various industries: aviation (where it is crucial for air traffic control), cars, missile detection, and even weather forecasting. <u>However, most RADARs work in 2D.</u> Haaaaa - yes, this is what we got: <strong>X and Y, but no Z</strong>, exactly like a one-channel LiDAR.</p><p>Should I show you a sample point cloud from a RADAR?</p><h4 id="output-from-a-radar-system">Output from a RADAR system</h4><p>Here is the real output from a RADAR sensor:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/ezgif.com-gif-to-webp.webp" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="345" height="265"><figcaption><span style="white-space: pre-wrap;">(</span><a href="https://www.youtube.com/watch?v=N_8ONE9WqXw" rel="noopener noreferrer"><u><span class="underline" style="white-space: pre-wrap;">source</span></u></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>I mean, can you tell where there is a vehicle? </strong>Whether we should stop or not? It&apos;s complete garbag&#x2014; wait, if people use it, it&apos;s gotta be useful, right? And yes, it is, because while we only have a noisy 2D point cloud, each of these points also provides 1D velocity information. 
RADARs tell us whether the points are going away from us, or towards us, and how fast.</p><p>Using Point Clouds Processing, Deep Learning (often trained on LiDAR data), or even <a href="https://thinkautonomous.ai/blog/introduction-to-radar-camera-fusion" rel="noopener noreferrer"><u>RADAR/Camera Fusion</u></a>, we can even get a result like this:</p>
<!--kg-card-begin: html-->
<figure class="kg-card kg-image-card kg-card-hascaption">
<video class="lazy" style="max-width:100%" controls poster="https://www.thinkautonomous.ai/blog/content/images/2023/04/radarcamera.webp" preload="none" muted loop playsinline>
<source src="https://www.thinkautonomous.ai/blog/content/media/2023/04/radarcamera.mp4" type="video/mp4">
</video>
<figcaption>A RADAR fused with a camera (<a href="https://www.youtube.com/watch?v=Xk5xbxHTt00" rel="noopener noreferrer"><u>source</u></a>)</figcaption>
</figure>
<!--kg-card-end: html-->
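<p>The color coding in the clip above comes from the Doppler shift that each RADAR point carries. As a minimal sketch (not the pipeline used in the video), the radial velocity can be recovered from the measured frequency shift with v = f_d &#xD7; &#x3BB; / 2. The function name and the 77 GHz default carrier are assumptions for illustration; 77 GHz is a common automotive RADAR band, but your sensor may differ.</p>

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre


def radial_velocity_m_s(doppler_shift_hz, carrier_hz=77e9):
    """Radial speed of a target from the two-way Doppler shift.

    carrier_hz defaults to 77 GHz (an assumption, not taken from the video).
    Sign convention used here: a positive shift means the target is
    approaching, a negative shift means it is moving away.
    Only the component of motion along the line of sight is measured,
    which is why the article calls it 1D velocity.
    """
    wavelength_m = SPEED_OF_LIGHT_M_S / carrier_hz
    # The wave is shifted once on the way out and once on the way back,
    # hence the division by two.
    return doppler_shift_hz * wavelength_m / 2.0


if __name__ == "__main__":
    # Hypothetical reading: a shift of about 5.1 kHz at 77 GHz.
    print(round(radial_velocity_m_s(5137.0), 2), "m/s")
```

<p>With this convention, a target closing in at 10 m/s produces a shift of roughly 5.1 kHz at 77 GHz, while a static object (relative to the sensor) produces none. Keep that in mind and look at the clip again.</p>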
<p>Notice how the yellow dot changes to a green color as soon as the car moves, and how each static object is orange, while moving objects have a color. This is because the RADAR is really good at measuring velocities.</p><h3 id="3lidars-and-radars-are-complementary-and-still-need-eachother">3 - LiDARs and RADARs are complementary and still need each other</h3><p><strong>As a little summary, I&apos;d say that LiDARs are good</strong>, but most of the time need cameras for context, and at high speed, need RADARs. RADARs are great, but could NOT work as a standalone system. So let&apos;s do a quick overview:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/camera-lidar-radar--1-.png" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="2000" height="1169" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/02/camera-lidar-radar--1-.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2023/02/camera-lidar-radar--1-.png 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2023/02/camera-lidar-radar--1-.png 1600w, https://www.thinkautonomous.ai/blog/content/images/size/w2400/2023/02/camera-lidar-radar--1-.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Camera vs LiDAR vs RADAR comparison</span></figcaption></figure><p>If you want to be green everywhere, you need to combine all 3. Use the camera for scene understanding, use the RADAR for weather conditions and velocity measurement, and the LiDAR for distance estimation.</p><p><strong>This brings a problem. </strong>A random startup must invest in 3 sensors, co-calibrate 3 sensors, train their team on all these sensor types, and the more sensors we use, the more confusion we risk bringing. You may wonder... can&apos;t we use just one? 
Or two?</p><p>Let&#x2019;s see how:</p><h2 id="fmcw-lidars-imaging-radars-the-future-of-perception">FMCW LiDARs &amp; Imaging RADARs: The Future of Perception</h2><p><strong>Back in January 2023, I was at CES in Las Vegas for the first time.</strong> It was a big show, really incredible, and while walking there, I met a startup named &apos;Aeva&apos;. Aeva is a LiDAR startup specialized in 4D technology. &quot;What&apos;s 4D?&quot; I asked. It turns out, 4D meant that their LiDARs could do direct velocity estimation.</p><p><strong>The next day, I walked to a different area and stumbled across a Korean startup called bitsensing</strong>. &quot;Bitsensing is creating a 4D Imaging RADAR&quot; said the presenter. I was in shock. It was a normal RADAR, but providing an incredible resolution, with Z-elevation, accurate 3D view, no noise, and still the Doppler velocity measurement.</p><p>It sounded like these startups were working on fixing the weaknesses of classical technologies.</p><p>Let me introduce them to you.</p><h3 id="1fmcw-lidar-frequency-modulated-continuous-wave-lidar-4d-lidar"><strong>1 - FMCW LiDAR (Frequency Modulated Continuous Wave LiDAR): 4D LiDAR</strong></h3><blockquote><em>An FMCW LiDAR (or 4D LiDAR, or Doppler LiDAR) is a LiDAR that can return the depth information, but also <u>directly measure the speed of an object</u>. What happens behind the scenes is that they steal the RADAR Doppler Technology and adapt it to a light sensor.</em></blockquote><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F449;</div><div class="kg-callout-text">Want to see a real Deep RADAR software stack in action? Perciv AI is a startup building perception software for imaging RADARs. I got exclusive access to their facility and recorded a virtual tour: their live demo, their RADAR vs LiDAR SLAM pipeline, and a walkthrough of the sensors they process. 
<a href="https://www.thinkautonomous.ai/perciv-ai"><strong>Get your free Discovery ticket here.</strong></a></div></div><p>Here&apos;s what the startup <strong>Aurora</strong> is doing on LiDARs... notice how moving objects are colored while others aren&apos;t:</p>
<!--kg-card-begin: html-->
<figure class="kg-card kg-image-card kg-card-hascaption">
<video class="lazy" style="max-width:100%" controls poster="https://www.thinkautonomous.ai/blog/content/images/2023/04/FMCWlidar.webp" preload="none" muted loop playsinline>
<source src="https://www.thinkautonomous.ai/blog/content/media/2023/04/FMCWlidar.mp4" type="video/mp4">
</video>
<figcaption><a href="https://www.aeva.com">Aeva&apos;s</a> FMCW LiDAR that can estimate velocities and predict trajectories (blue: approaching | red: receding)</figcaption>
</figure>
<!--kg-card-end: html-->
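<p>The approaching/receding coloring in the clip above comes down to the sign of the Doppler shift. As a rough sketch (the function names, the 1550 nm laser wavelength and the 77 GHz carrier are illustrative assumptions, not the specs of any sensor mentioned here), this is how a measured frequency shift turns into a radial velocity, for light and for radio waves:</p>

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def radial_velocity(doppler_shift_hz: float, wavelength_m: float) -> float:
    """Line-of-sight velocity from a two-way Doppler shift.

    Positive means the target is approaching (the return came back
    at a higher frequency), negative means it is receding.
    """
    # The wave travels to the target and back, hence the division by 2.
    return doppler_shift_hz * wavelength_m / 2.0


def doppler_shift(velocity_mps: float, wavelength_m: float) -> float:
    """Inverse relation: expected two-way shift for a radial velocity."""
    return 2.0 * velocity_mps / wavelength_m


def motion_label(velocity_mps: float, deadband_mps: float = 0.1) -> str:
    """Coarse label, similar in spirit to the blue/red coloring."""
    if velocity_mps > deadband_mps:
        return "approaching"
    if -velocity_mps > deadband_mps:
        return "receding"
    return "no radial motion"


# Assumed wavelengths, for illustration only.
LIDAR_WAVELENGTH_M = 1550e-9                # 1550 nm laser
RADAR_WAVELENGTH_M = SPEED_OF_LIGHT / 77e9  # 77 GHz carrier, about 3.9 mm

v = radial_velocity(12.9e6, LIDAR_WAVELENGTH_M)  # a 12.9 MHz shift on the LiDAR
print(f"LiDAR: {v:.2f} m/s, {motion_label(v)}")
print(f"RADAR shift for the same target: {doppler_shift(v, RADAR_WAVELENGTH_M):.0f} Hz")
```

<p>The same target shifts the light wave by megahertz, but the 77 GHz radio wave by only a few kilohertz. In both cases the result is a radial velocity, relative to the sensor: motion across the beam produces no shift.</p>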
<p><strong>FMCW LiDAR uses the Doppler Effect, similarly to the RADAR technology, to get this 4D view</strong>. The main idea can be seen in this image, where we play with the frequency of the returned wave to measure the velocity.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://io.dropinblog.com/uploaded/blogs/34241363/files/radar_11.png" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="1200" height="626"><figcaption><span style="white-space: pre-wrap;">If a wave is reflected at a higher frequency, then the object is approaching. If lower, it&apos;s going away from us. (</span><a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/"><span style="white-space: pre-wrap;">see it on the FMCW LiDAR post</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>The Doppler Effect is exactly about measuring this frequency shift.</strong> And this has now been adopted in FMCW LiDAR technology, but with light waves instead of radio waves. I highly recommend checking out my complete post called &quot;<a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/" rel="noopener noreferrer">Understanding the magnificent FMCW LiDAR</a>&quot;.</p><h3 id="2imaging-radar-4d-radar">2 - Imaging RADAR: 4D RADAR</h3><p><strong>In 2024, Mobileye, who had been working on their own FMCW LiDAR for years, announced it would be shutting down its entire FMCW LiDAR division to focus on proprietary</strong> <strong>4D Imaging RADAR</strong>. What happened? Why the shift? Well, let&apos;s first try to understand what Imaging RADARs are. 
I like to call these...</p><blockquote class="kg-blockquote-alt"><strong>RADAR on steroids!</strong></blockquote><p>To understand better how it works, I&apos;d like to show you the bitsensing demo they showed me at CES.</p><h4 id="bitsensing-imaging-radar-demo"><strong>bitsensing Imaging RADAR Demo</strong></h4><p>The Imaging RADAR has an incredible resolution. It provides a very accurate point cloud that can see through adverse weather conditions, do obstacle detection AND measure velocity directly! Under the hood, it uses a set of MIMO antennas to get a much better resolution, range, and precision. We could in fact detect occupants inside a vehicle, and distinguish children from parents.</p><p>See the demo:</p>
<!--kg-card-begin: html-->
<iframe src="https://player.vimeo.com/video/807852889?h=dfaf463bd4&amp;badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen title="3363091915"></iframe>
<!--kg-card-end: html-->
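<p>To get a feel for why more antennas mean a sharper image, here is a back-of-the-envelope sketch. It assumes an idealized case that is NOT taken from bitsensing&apos;s design: every transmit/receive pair acts as one element of a single uniform line of virtual antennas spaced half a wavelength apart, for which the angular resolution straight ahead is roughly 2/N radians. The antenna counts are made up for illustration.</p>

```python
import math


def virtual_channels(n_tx: int, n_rx: int) -> int:
    """In a MIMO RADAR, each transmit/receive pair acts as one virtual antenna."""
    return n_tx * n_rx


def boresight_resolution_deg(n_elements: int) -> float:
    """Approximate angular resolution, straight ahead, of a uniform linear
    array with half-wavelength spacing: about 2 / N radians."""
    return math.degrees(2.0 / n_elements)


# Hypothetical antenna counts, chosen only to show the trend.
configs = [("classic RADAR, 3 Tx x 4 Rx", 3, 4), ("imaging RADAR, 12 Tx x 16 Rx", 12, 16)]

for name, n_tx, n_rx in configs:
    n = virtual_channels(n_tx, n_rx)
    print(f"{name}: {n} virtual channels, about {boresight_resolution_deg(n):.1f} deg")
```

<p>Under these assumptions, going from 12 to 192 virtual channels sharpens the beam from roughly 9.5 degrees to roughly 0.6 degrees. Real sensors split their channels between azimuth and elevation, so actual figures differ, but this is the mechanism behind the dense point cloud in the demo above.</p>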
<p><strong>Can you notice how similar it looks to the FMCW LiDAR? We have in both cases:</strong></p><ul><li>A 3D Point Cloud</li><li>That can directly measure velocity</li></ul><h4 id="other-examples-from-self-driving-cars">Other Examples from Self-Driving Cars</h4><p>Frankly, many actors from the autonomous driving industry are switching to Imaging RADARs. Mobileye has a great demo, so does Waymo. Let&apos;s see these 2 examples.</p><p>Here&apos;s the Waymo Imaging RADAR Demo:</p>
<!--kg-card-begin: html-->
<figure class="kg-card kg-image-card kg-card-hascaption">
<video class="lazy" style="max-width:100%" controls poster="https://www.thinkautonomous.ai/blog/content/images/2023/04/ImagingRadar.webp" preload="none" muted loop playsinline>
<source src="https://www.thinkautonomous.ai/blog/content/media/2023/04/ImagingRadar.mp4" type="video/mp4">
</video>
<figcaption>View of Waymo&apos;s Imaging RADAR (<a href="https://blog.waymo.com/2021/11/a-fog-blog.html?__s=xxxxxxx" rel="noopener noreferrer"><u>source</u></a>)</figcaption>
</figure>
<!--kg-card-end: html-->
<p>And now Mobileye:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
    <a class="yt-thumb" data-src="b3WSAYguMaY" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=b3WSAYguMaY">
    <img src="https://i.ytimg.com/vi/b3WSAYguMaY/hqdefault.jpg" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy">
    <span class="yt-play" aria-hidden="true"></span>
    </a>
</div>
<!--kg-card-end: html-->
<p>See? We are in the middle of a <u>transition</u>... but why are people using Imaging RADARs over FMCW LiDARs? And are they really moving away from LiDARs? Let&apos;s find out in the final point...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text">Learning from theory is one thing, but opening a LiDAR teaches you more than any diagram ever could: from how the emitter &amp; receiver system works, to how raw points become 3D data.<b><strong style="white-space: pre-wrap;"> Watch how I literally opened a LiDAR </strong></b><a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-slamtechs-rp-lidar-ungutting" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">here</strong></b></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div><h2 id="lidars-vs-radars-the-modern-comparison">LiDARs vs RADARs: The Modern Comparison</h2><p>There are 2 ideas I&apos;d like to talk about here:</p><ol><li>The Future of RADARs IS Imaging based</li><li>The Future of LiDARs may NOT be FMCW based</li></ol><h3 id="1the-future-of-radars-is-imaging-based">1 - <strong>The Future of RADARs IS Imaging based</strong></h3><p><strong>We have clearly seen how a good RADAR system can bring incredible benefits</strong>. We can now do tasks like object detection using only an Imaging RADAR. Recently, we&apos;ve seen Deep Learning models, like the ones from <a href="https://www.perciv.ai" rel="noopener noreferrer"><strong>Perciv AI</strong></a>, work on RADAR data (radar signals, radar point clouds, radar waves, ...) directly.</p><p><strong>Back in the day, any comparison between a LiDAR and a RADAR didn&#x2019;t really make sense</strong> because the sensors were highly complementary. <u>But today, these sensors can be in competition</u>, and if there is one, Imaging RADARs are winning it! 
Let&apos;s look at the new comparison table:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="2000" height="1162" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 1600w, https://www.thinkautonomous.ai/blog/content/images/size/w2400/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Camera vs FMCW LiDAR vs Imaging RADAR &#x2014; blue: Improved, red: Worse</span></figcaption></figure><p><strong>We are BLUE almost everywhere, and the cost of Imaging RADARs stays lower than that of FMCW LiDARs.</strong> In addition to this, the Imaging RADAR can nicely fit under a bumper, since RADAR employs radio waves that pass through materials like plastic.</p><p><strong>When looking at remote sensing technology, RADAR has always been a great choice</strong>; whether it&apos;s synthetic aperture radar systems in the military field, or environmental monitoring of their radio frequency spectrum, or the recent adoption in autonomous vehicles, RADARs ARE by default a great choice.</p><p><strong>In the self-driving space, RADARs were never good enough to be a standalone sensor</strong>. Has someone ever told you that you weren&apos;t good enough? 
Well, this is a lesson, because you can see a massive adoption and trend of Imaging RADAR - and I believe the future of RADARs is imaging.</p><h3 id="2the-future-of-lidars-is-not-fmcw-based">2 - <strong>The Future of LiDARs IS NOT FMCW based</strong></h3><p><strong>Now this is the incredible discovery here:</strong> <u>Nobody is abandoning LiDARs for FMCW LiDARs</u>. Self-driving car companies have NOT adopted FMCW LiDAR technology en masse (for now), and I predict they&apos;ll just stick to solid-state.</p><p><strong>Back in 2023, I went to Innoviz Technologies headquarters in Israel</strong>. Innoviz is a LiDAR manufacturing company providing LiDAR devices to companies like BMW. I asked them: &quot;Why are you NOT building FMCW LiDARs?&quot;. Their answer was that their LiDARs were good enough, and that there was no real need for FMCW. It really surprised me, but I guess they know what they&apos;re talking about. They could solve the drawbacks of LiDARs by building better LiDARs, for example here:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/lidars-evolution.jpg" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="1590" height="550" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/lidars-evolution.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/lidars-evolution.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/lidars-evolution.jpg 1590w" sizes="(min-width: 720px) 720px"><figcaption><a href="https://innoviz.tech" rel="noreferrer"><span style="white-space: pre-wrap;">Innoviz Technologies</span></a><span style="white-space: pre-wrap;"> provides incredible resolution in their LiDAR sensors</span></figcaption></figure><p><strong>In many fields, LiDAR sensors are at the core.</strong> We have airborne lidar systems building 
elevation maps, we have HD Maps built entirely from LiDARs, and even drones equipped with LiDARs today... this technology is here to stay. Plus, today, EVERYONE uses LiDARs! No, that was wrong: Tesla doesn&apos;t, and a few others have bet on vision-only... but the majority of startups do, except that they use <u>BETTER LiDARs</u>. Not necessarily 4D, but LiDARs that provide better resolution, focusing more on solid-state technology.</p><p>This is the key message I have for you, and now that we&apos;ve seen it, let&apos;s go through a summary, and see some next steps.</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>LiDAR uses laser light to measure distances and create detailed 3D maps of objects and environments</strong>. Its key strength is distance estimation. Its key weaknesses are velocity estimation and bad weather conditions.</li><li><strong>RADAR emits radio waves and measures their reflections </strong>to detect objects and calculate their speed, even in bad weather. Its key strength is velocity estimation; its key weaknesses are noise, context, and 3D estimation (most are only 2D).</li><li><strong>Traditional setups combine LiDAR, RADAR, and cameras</strong> because each sensor complements the others&apos; strengths and weaknesses. It&apos;s near unthinkable to use one as a standalone.</li><li><strong>Recently, technologies like 4D FMCW LiDAR and Imaging RADAR have emerged</strong>, offering both high resolution and velocity measurement. 
FMCW LiDARs use the Doppler effect, and Imaging RADARs use more antennas.</li><li><strong>While the future of RADAR is (I believe) RADAR+Imaging capabilities</strong>, I believe the future of LiDARs may be solid-state based, and not necessarily FMCW/4D based.</li></ul><h3 id="next-steps">Next Steps</h3><ul><li>Learn about the FMCW LiDAR <a href="https://www.thinkautonomous.ai/blog/fmcw-lidar" rel="noopener noreferrer">here</a>.</li><li>Learn about the Imaging RADAR <a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noreferrer">here</a>.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text">If you want to learn more about LiDARs and cutting-edge technology, I&apos;m sending emails every day about these technologies, and they&apos;re read by over 10,000 Engineers. You should join the daily emails <a href="https://www.thinkautonomous.ai/private-emails" rel="noopener noreferrer">here</a>.</div></div>]]></content:encoded></item><item><title><![CDATA[3 Insights from Autoware's Transition to End-To-End Learning with Samet Kütük]]></title><description><![CDATA[Autoware is transitioning to End-To-End Learning. When? And How exactly will this happen? 
This is what we'll find out this month, in this exclusive interview with Samet Kütük.]]></description><link>https://www.thinkautonomous.ai/blog/autoware-end-to-end/</link><guid isPermaLink="false">68f67f87bad329532556f144</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Thu, 23 Oct 2025 08:47:17 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-end-to-end.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-end-to-end.jpeg" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"><p><strong>Did you ever wonder... why are self-driving cars taking so long to come?</strong> I had that question too when starting, and my first answer came from Sebastian Thrun, godfather of self-driving cars, who talked about reaching 90% of use cases easily, but then finding huge difficulty in going from <strong>90%</strong> to <strong>100%</strong>. Recently, Andrej Karpathy, former Lead of Tesla Autopilot, described something similar as &quot;the march of 9s&quot;:</p><blockquote>&quot;When you get a demo and something works 90% of the time, that&apos;s just the first 9 and then you need the second 9 and third 9, fourth 9, fifth 9...&quot;</blockquote><p><strong>This is what&apos;s taking long, but instead of focusing on this, tons of companies lose time focusing on the first 0-90%. </strong>Back in 2017 or so, we were all trying to get to 90%, and for this, we were redeveloping all the software, algorithms, and so on... At some point, probably 30 startups were all spending millions developing the exact same algorithms.</p><p><strong>This is where Autoware comes into play</strong>. 
Started by <a href="https://tier4.jp/en/" rel="noreferrer">Tier IV</a>, Autoware is an open-source self-driving car software that allows you to achieve the first <strong>9</strong> in just a few weeks. Rather than redeveloping yet another version of the same code, you <em>jumpstart</em> from the existing state of the art, and fine-tune it for your needs.</p><p><strong>This month, our membership </strong><a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a><strong> is welcoming Samet K&#xFC;t&#xFC;k from </strong><a href="https://www.autoware.org" rel="noreferrer"><strong>The Autoware Foundation</strong></a><strong>. </strong></p><blockquote>Samet is currently the Community Advocate and Head of Marketing at the Autoware Foundation. Before that, Samet co-founded a company in Istanbul called <a href="https://www.leodrive.ai/" rel="noreferrer">Leo Drive</a>, where he worked for a decade on implementing Autoware in various vehicle platforms, including retrofitting a Volkswagen Golf for autonomous operation. <br><br><strong>Now based in Zurich, he is fully dedicated to the Autoware Foundation</strong>, focusing on marketing, member recruitment, and participating in technical workgroups, particularly in software-defined vehicles and cloud-native development.</blockquote><p>And let me start with a small snippet about how he defines Autoware:</p><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet1d_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet1d.mp4" poster="https://img.spacergif.org/v1/1920x1080/0a/spacer.png" width="1920" height="1080" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet1d_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">1:20</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p>Together, we recorded a new Fragment of <a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a>, my community membership experience, in which he takes us through the building and management of Autoware. How does it work? How do you build a self-driving car with a fully remote team? This is everything Samet teaches in our new fragment...</p><p>In this post, I&apos;d like to give you a small sample of that interview, highlighting a very interesting moment where Samet talked about End-To-End learning...</p><hr><h2 id="3-insights-from-autowares-end-to-end-learning-transition">3 insights from Autoware&apos;s End-To-End Learning Transition</h2><p><strong>Since its creation, Autoware has been implementing a &quot;robotic&quot; architecture,</strong> meaning implementing the traditional &quot;4 pillars&quot;: Perception &#x2192; Localization &#x2192; Planning &#x2192; Control.</p><p><strong>But recently, Autoware announced a new plan to evolve to an End-To-End architecture,</strong> a single neural network that takes in the input sensor data, and automatically outputs the steering angle and acceleration value. I have a complete article explaining the differences with detailed examples <a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer">here</a>.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">4 Pillars vs End To End: How to pick an autonomous vehicle architecture</div><div class="kg-bookmark-description">How to design an autonomous vehicle architecture? Should you implement an End-To-End solution, or a more traditional one? 
Let&#x2019;s see&#x2026;</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/autonomous-vehicle-architecture--1-.webp" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"></div></a></figure><p>So here is the sample I&apos;d like to share:</p><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet2v3_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet2v3.mp4" poster="https://img.spacergif.org/v1/1920x1080/0a/spacer.png" width="1920" height="1080" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet2v3_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">1:16</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p><strong>As you can see, there is a lot to uncover from just one minute. Let me share 3 highlights from that:</strong></p><ol><li>Level 5 is NOT easy to reach, and may not even be possible, which is why Autoware focuses on Level 4+, in which humans are never asked to take over mid-drive, but may have to drive themselves in regions or conditions the system isn&apos;t designed for. </li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-level-4-.jpg" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1182" height="738" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/autoware-level-4-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/autoware-level-4-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-level-4-.jpg 1182w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Autoware doesn&apos;t claim to reach Level 5, but a very good Level 4+</span></figcaption></figure><ol start="2"><li>The Transition will NOT be achieved immediately, but will rather be the result of several steps:<ol><li><strong>Current &#x2014;</strong>&#xA0;Starting from a traditional Robotic stack</li><li><strong>Step 1 &#x2014;</strong>&#xA0;Learned Planning</li><li><strong>Step 2 &#x2014;&#xA0;</strong>Deep Perception &amp; Learned Planning</li><li><strong>Step 3 &#x2014;&#xA0;</strong>Monolithic End-to-End (single network)</li><li><strong>Step 4 </strong>&#x2014; Hybrid End-To-End using a &quot;guardian&quot; for redundancy</li></ol></li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e.gif" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" 
width="1080" height="608" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/autoware-e2e.gif 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/autoware-e2e.gif 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e.gif 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Autoware&apos;s 4 Step Transition to End-To-End Learning</span></figcaption></figure><p>This can feel similar to how Tesla did their own transition to End-To-End (which I cover in this article):</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Breakdown: How Tesla will transition from Modular to End-To-End Deep Learning</div><div class="kg-bookmark-description">It&#x2019;s no secret, Tesla is going to use End-To-End Deep Learning. But how? What will it look like? Will the Occupancy Network and HydraNet stay? 
Here&#x2019;s a full breakdown&#x2026;</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/tesla-end-to-end.png" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"></div></a></figure><p>The difference between Modular and Monolithic End-To-End has been explained in their <a href="http://github.com/tier4/new_planning_framework/wiki" rel="noreferrer">GitHub repository</a> talking about the new planning algorithm:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://media1-production-mightynetworks.imgix.net/asset/f284a2ad-0c52-4342-8e80-1b60665c524d/70581e7ae5e4e7d8.png?ixlib=rails-4.3.1&amp;fm=jpg&amp;q=75&amp;auto=format&amp;w=4096&amp;h=4096&amp;fit=max&amp;impolicy=ResizeCrop&amp;aspect=fit" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1286" height="496"><figcaption><span style="white-space: pre-wrap;">Modular versus Monolithic End-To-End. 
Originally, everybody tried monolithic, then reverted to modular, and is now trying monolithic again with safety guardians.</span></figcaption></figure><p>Alright, let&apos;s continue with a third and final idea:</p><ol start="3"><li><strong>The algorithms for End-To-End have already been built.</strong></li></ol><p>We are not talking about a distant future: according to Autoware, it&apos;s possible to achieve End-To-End with today&apos;s algorithms, including (but not limited to) <a href="https://autowarefoundation.github.io/autoware_universe/main/perception/autoware_lidar_centerpoint/" rel="noreferrer"><strong>CenterPoint</strong></a> as the 3D Deep Learning algorithm for LiDAR Detection, <a href="https://github.com/autowarefoundation/autoware.privately-owned-vehicles/tree/main/AutoSeg" rel="noreferrer"><strong>AutoSeg</strong></a> as the Foundation Model in Perception, and <strong>AutoSteer</strong> and <a href="https://github.com/ZhengYinan-AIR/Diffusion-Planner" rel="noreferrer"><strong>Diffusion Planner</strong></a> for the Learned Planning approaches.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e-.jpg" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="2000" height="947" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/autoware-e2e-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/autoware-e2e-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/10/autoware-e2e-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e-.jpg 2176w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Autoware Modular End-To-End Architecture will feature these 4 core algorithms</span></figcaption></figure><p>See? 
They are already there, and even though they may evolve and get replaced 2, 3, or 5 years from now, the <strong><em>logic</em></strong> of Modular End-To-End (Step 2) has already been implemented.</p><p>For example, the <strong>AutoSeg</strong> algorithm is a &quot;<a href="https://www.thinkautonomous.ai/blog/how-tesla-autopilot-works/" rel="noreferrer">HydraNet</a>&quot; with a single backbone that splits into several heads for lane lines, ego path, free space, segmentation, objects, and 3D. The outputs of these heads are then passed to the deep planner.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/AutoSeg--1-.jpg" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/AutoSeg--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/AutoSeg--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/10/AutoSeg--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/10/AutoSeg--1-.jpg 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A look at AutoSeg, the HydraNet used by Autoware</span></figcaption></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F500;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interested in End-To-End?</strong></b> Autoware has published a detailed PDF about their transition to End-To-End; you can download it on <a href="https://autoware.org/" rel="noreferrer">this page</a>.</div></div><h3 id="summary-next-steps">Summary &amp; Next Steps</h3><ul><li><strong>Autoware is an open source self-driving car organization</strong> that builds self-driving car software used around the world by thousands of 
engineers and teams</li><li><strong>Autoware is the solution I recommend</strong> to get started in self-driving cars; rather than building software from scratch, get Autoware working quickly, and then fine-tune and customize it for your applications.</li><li><strong>Autoware is transitioning</strong> from a robotic architecture to an End-To-End Learning architecture, and there are 3 highlights from it:<ul><li>It won&apos;t reach Level 5, but a <strong>Level 4+</strong> that can drive almost anywhere</li><li>The transition will happen in <strong>4 steps</strong>, adding planning, perception, then turning into a monolithic architecture, and finally hybrid.</li><li>The algorithms and modular logic have already been implemented and are working, such as <strong>CenterPoint</strong>, <strong>AutoSeg</strong>, or <strong>Diffusion</strong> <strong>Planner</strong>.</li></ul></li></ul><h3 id="next-steps">Next steps</h3><p><strong>Interested in getting access to our Autoware Fragment? </strong>It&apos;s going to be very cool, and feature several things, such as:</p><ul><li>The Full-Length interview with Samet on Autoware</li><li>An even deeper dive on Autoware&apos;s End-To-End Transition (this was just a 1-minute video - we do it for the full section on End-To-End).</li><li>A complete breakdown of many algorithms used by Autoware, and a near plug &amp; play solution to start running Autoware&apos;s software on your computer by tonight</li></ul>]]></content:encoded></item><item><title><![CDATA[Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know]]></title><description><![CDATA[Let's reveal it all: What are point clouds? What are 3 Ways to create them? How to process them? 
How do we detect 3D objects inside a point cloud?]]></description><link>https://www.thinkautonomous.ai/blog/point-clouds/</link><guid isPermaLink="false">640f9074fa7e0be47b3d9d33</guid><category><![CDATA[lidar]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Mon, 29 Sep 2025 15:40:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/09/point-clouds.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/point-clouds.jpg" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know"><p><strong>In September 1519, an expedition of five ships and 270 men,</strong> led by Ferdinand Magellan, left Spain to reach the Spice Islands by sailing west. At the time, maps were crude sketches, full of blank spaces, and sometimes decorated with warnings: <em>&#x201C;Here be dragons.&#x201D;</em> Yet Magellan pressed on, steering his fleet into the unknown, through storms and across oceans no European had crossed before.</p><p><strong>The challenge was harsher than anyone had imagined</strong>. Supplies ran out, men starved, and mutiny spread. One ship deserted, another wrecked. After nearly two years, Magellan reached the Philippines, where he was killed in the Battle of Mactan. His fleet, once five strong, was reduced to four&#x2026; then three&#x2026; then two.</p><p><strong>3 years later, only one ship returned to Spain.</strong> The Victoria carried just 18 survivors, but also one of the greatest accomplishments of the time. For the first time, humanity had proof that the Earth could be circumnavigated by sea, a discovery that forever reshaped navigation, trade, and commerce.</p><p><strong>For centuries, people believed the old maps. </strong>They trusted the flat drawings, the empty warnings, the <em>&#x201C;here be dragons&#x201D;</em>. All it took was one expedition to open a new world nobody could see. 
And today, I believe Computer Vision Engineers live in a similar situation.</p><p><strong>The world provides Computer Vision algorithms</strong>, image processing techniques, 2D object detectors, and segmentation approaches... yet, the world is a sphere, in 3D. And this is why I think something of much greater importance should be mastered by Computer Vision and ALL robotics/autonomous tech engineers: <strong>Point Clouds</strong>.</p><p><strong>The goal of a point cloud is to create a 3D model</strong>. 3D points are a data representation used today in autonomous vehicles, robotics, AR/VR, and even in everyday uses like unlocking your phone with Face ID.</p><p>So what are point clouds? How do you get them? And how do you process them using AI? These are the 3 things I think most perception engineers should know, and that we&apos;ll cover in this article.</p><p>Let&apos;s begin:</p><h2 id="9-examples-of-point-cloud-data">9 Examples of Point Cloud Data</h2><p>A &quot;Point Cloud&quot; is a set of points in 3D space &#x2014; a cloud of points. Inside, each point holds the 3D location of a surface in the real world. It can be a person, a wall, a tree, anything. You probably know what a point cloud looks like already, but you may not know the multiple types of point clouds... So let me introduce you to 9 of them!</p><h3 id="xyz-point-clouds">XYZ Point Clouds</h3><p><strong>In an XYZ point cloud, each point has a specific X, Y, and Z value</strong>. You could think of it as the equivalent of a pixel, but in 3D. 
Rather than just X and Y, we have X, Y, and Z (in most cases, because some point clouds are 2D, see this article).</p><p>Here&apos;s an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/xyz-point-cloud.png" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1528" height="842" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/xyz-point-cloud.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/xyz-point-cloud.png 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/xyz-point-cloud.png 1528w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">In this very basic point cloud, each point contains the X, Y, Z information</span></figcaption></figure><p>See? Each point has an XYZ value. But why are the colors different? Simply because our visualizer applies a color gradient based on the height of the point (the Z dimension). The higher the Z value, the redder it&apos;ll be. On the above Waymo video, you could see a different visualization, based on the distance to the vehicle. 
So this is one type:</p><ul><li>Point clouds can contain the XYZ information</li></ul><p>Next:</p><h3 id="xyz-i-point-clouds">XYZ-I Point Clouds</h3><p>Now, this is just an example, but let me show you something else...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1510" height="862" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg 1510w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Intensity is computed by almost every modern LiDAR and can help process the cloud</span></figcaption></figure><p>This is another point cloud, but what do you notice about the colors? Yes, two things:</p><ul><li>It&apos;s all &quot;RED&quot;</li><li>But not all points have exactly the same &quot;red&quot; value. Some are brighter than others</li></ul><p>And this is because here, we are no longer visualizing the height or the distance, but the &quot;intensity&quot; of the points. Point Clouds are often produced by LiDARs that send a ray and measure the time it takes to bounce back. This calculation measures the distance, but not all rays come back equal. 
Some are partly absorbed or scattered by trees, leaves, or other surfaces, while others come back almost at full strength.</p><p>So, we now know another attribute of a point cloud:</p><ul><li>Point clouds can contain the XYZ information</li><li>Point clouds can also hold the intensity information!</li></ul><p>Any other?</p><h3 id="xyz-v-point-clouds">XYZ-V Point Clouds</h3><p>Now, let&apos;s take it one step further, and look at this video:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ezgif.com-resize--1-.webp" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="560" height="315"><figcaption><span style="white-space: pre-wrap;">XYZ-Velocity point clouds are usually produced by FMCW LiDARs</span></figcaption></figure><p><strong>Okay, can you explain what is happening here? </strong>Everything is grey except the vehicles. So, is that... Class? Labels? Or, wait a minute, why are the forward vehicles in red, the parked cars in grey, and the vehicles approaching on the left in blue? This is because this visualization shows not the class but the velocity information!</p><p>This video was made by <a href="https://www.aeva.com" rel="noreferrer">Aeva</a>, an <a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/" rel="noopener noreferrer">FMCW LiDAR</a> producer &#x2014;&#xA0;and inside, you can see the points receding are in red, and those approaching are in blue. We now know a third possibility!</p><ul><li>Point Clouds can contain XYZ</li><li>Or XYZ-Intensity</li><li>Or XYZ-Velocity</li></ul><h3 id="lets-see-9-types-of-point-clouds">Let&apos;s see 9 types of Point Clouds</h3><p>Is there anything more than intensity or velocity? Yes, in fact - each point can contain a lot of information. 
Let&apos;s see:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/point-cloud-visualization.001.jpeg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/point-cloud-visualization.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/point-cloud-visualization.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/point-cloud-visualization.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/point-cloud-visualization.001.jpeg 1920w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">How many of these did you know?</span></figcaption></figure><ul><li><strong>Intensity</strong> - how strong the return signal of each point is</li><li><strong>Range</strong> - the distance of the point, computed from X, Y, and Z</li><li><strong>Color</strong> - the RGB color of the points (often for RGB-D cameras or 3D reconstruction)</li><li><strong>Class/Label </strong>- if it&apos;s after an object detector or segmentation tool processed it</li><li><strong>Infrared</strong> - the wavelength of the point cloud signal</li><li><strong>Ring/Channel</strong> - which channel of 3D sensors was used to collect it</li><li><strong>Velocity</strong> - the speed of each point (calculated by RADARs or FMCW LiDARs)</li><li><strong>Reflectivity</strong> - how reflective the surface of the point is</li><li><strong>Temperature</strong> - how hot a point is</li></ul><p>Okay, but concretely, how does it work? Is there a TXT file where we store the points? Kinda, let&apos;s take a look...</p><h3 id="point-cloud-formats-files">Point Cloud Formats &amp; Files</h3><p>There are usually two types of files: ASCII and Binary. 
One is easier to read, the other is more suited to real-time/embedded. Take a look at the beginning of both files:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1726" height="730" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 1600w, https://www.thinkautonomous.ai/blog/content/images/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 1726w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Two types of files: ASCII and Binary</span></figcaption></figure><p><strong>See? On the left, the PLY file contains X, Y, Z as floats, followed by a list of point coordinates.</strong> This is the point cloud! On the right, you can see a header describing the point cloud, in format XYZ-Intensity, and then, the points are not readable.</p><h2 id="2-main-ways-to-create-a-point-cloud">2 Main Ways to create a point cloud</h2><p><strong>There are basically 2 types of approaches, <u>active</u> and <u>passive</u></strong>. Active techniques actively emit signals like light or sound to measure distances and create point clouds, such as LiDAR and structured light systems. 
In contrast, passive techniques rely on capturing existing environmental data, like photogrammetry, which reconstructs 3D points from multiple camera images without emitting any signals.</p><h3 id="active-techniques-lidars-rgb-d-radars">Active Techniques: LiDARs, RGB-D &amp; RADARs</h3><p>In the first case, point clouds come from sensors built to create them. When a camera takes a picture, it aims to get pixels. Well, when a LiDAR makes a measurement, its aim is to create a point cloud. Let&apos;s see 3 ways to do it:</p><h4 id="1-how-to-get-point-clouds-using-structured-light-rgb-d-systems">1) How to get point clouds using Structured Light RGB-D systems</h4><p><strong>Ever played with the Microsoft Kinect? I can&apos;t say that I have. </strong>I was a Wii player all the way when they were competing. Yet, I&apos;ve always been impressed by how the Kinect produced point clouds using its RGB-D camera, working with the <strong><u>Structured Light Principle.</u></strong></p><p><strong>The Kinect shines a special pattern of light around,</strong> then uses an infrared camera to take a picture of how that light bounces back. By seeing how the pattern changes, the camera can figure out how far away things are. It combines this distance information with the colors it sees to create one final 3D image: the point cloud.</p><p>In robotics, you probably know the Intel RealSense D435i, or other equivalents. 
Their goal is to build a Depth Map, which is then turned into a point cloud.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1134" height="744" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg 1134w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">What an RGB-D camera produces</span></figcaption></figure><h4 id="2-how-lidar-point-clouds-are-produced">2) How LiDAR point clouds are produced</h4><p><strong>The most common and popular technique is to use a LiDAR (Light Detection And Ranging). </strong>There are many types of LiDARs around, but let&apos;s focus on the simple <u>Time-Of-Flight principle</u>. In this setup, a laser scanner sends a light beam and measures the time it takes to reflect and come back to the receiver. 
Similar to this image:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1536" height="1024" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg 1536w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">LiDAR scanners send a wave and measure the time it takes to come back</span></figcaption></figure><p><strong>LiDAR scanners produce raw data of the world up to 300-400 meters in the automotive industry.</strong> Each scan can generate millions of points in the three dimensional space. I highly recommend checking my article on the <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer">types of LiDARs</a> to learn more.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">WAIT</strong></b>! This blog post doesn&apos;t have to be the only thing you read from me. 
I post daily through <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">my daily emails</a>, and I talk about LiDARs, Computer Vision, and more cutting-edge AI Applications.<a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer"> You can join my emails here.</a></div></div><h4 id="3-radar-point-clouds">3) RADAR Point Clouds</h4><p>The third technique is to use not a LiDAR but a <a href="https://www.thinkautonomous.ai/blog/how-radars-work/" rel="noopener noreferrer">RADAR</a> to create the point cloud data. This is not very straightforward to do. RADARs usually return signal information based on Doppler (velocity), Range (distance), and Azimuth (direction/angle). Using these, we can do some calculations to retrieve the point cloud data.</p><p>Here is an example from a very low-quality RADAR:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed.gif" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="586" height="247"><figcaption><span style="white-space: pre-wrap;">How RADAR heatmaps get converted to point clouds</span></figcaption></figure><p>Today, we can use Imaging RADARs to get 3D point clouds. I invite you to <a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noopener noreferrer">check out my Imaging RADAR article to learn more about it.</a></p><p>Now that we&apos;ve seen the Active ways, using sensors - I&apos;d like to take a minute to talk about the passive ways.</p><h3 id="passive-point-clouds-generation-photogrammetry-3d-reconstruction">Passive Point Clouds Generation: Photogrammetry &amp; 3D Reconstruction</h3><p><strong>The idea of passive techniques is that your sensors don&apos;t emit any signal to measure the point cloud directly; you reconstruct it from data they already capture, such as images. </strong>The main way to do this is by leveraging 3D Reconstruction. 
Techniques like Structure From Motion, Multi-View Stereo, NeRFs, Gaussian Splatting, or others are used.</p><p>The idea? To convert 2 or more images to a 3D point cloud using triangulation, geometry, and depth maps.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1600" height="756" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/unnamed.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/unnamed.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed.jpg 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Stereo Vision is a powerful technique to retrieve 3D models</span></figcaption></figure><p>If you&apos;re interested in this, I highly recommend reading my 3-article series on the <a href="https://pyimagesearch.com/2024/10/14/photogrammetry-explained-from-multi-view-stereo-to-structure-from-motion/" rel="noopener noreferrer">PyImageSearch blog</a>, or my article on Pseudo-LiDARs.</p><p>Alright, so you now know all about the point cloud types, and the ways to get them. One thing remains...</p><h2 id="how-to-process-point-cloud-data">How to Process Point Cloud Data?</h2><p>Do you remember in the point cloud types when I showed the &quot;label/class&quot; of each point? This is not something sensors can measure; it&apos;s built by algorithms. There are 3 things that really matter here:</p><ol><li>Understanding the main libraries/tools to work with</li><li>Understanding the core algorithms to use on raw point cloud data</li><li>Being able to use them in the applications</li></ol><h3 id="libraries-open3d-and-point-cloud-library-pcl">Libraries: Open3D and Point Cloud Library (PCL)</h3><p>There are many libraries used to process point clouds. 
These implement the algorithms. For example, the Point Cloud Library is one of the most popular to work with. Open3D is also very common; it contains fewer algorithms, but is easier to use thanks to its Python interface. I would recommend getting started with this one.</p><p>On a similar topic, you may want to know at least one point cloud dataset. I would recommend you <a href="https://www.thinkautonomous.ai/blog/lidar-datasets/" rel="noopener noreferrer">check out this article</a>.</p><h3 id="which-algorithms-can-be-used-to-process-point-clouds">Which algorithms can be used to process point clouds?</h3><p><strong>In point cloud processing, you can either go with traditional algorithms or 3D Deep Learning.</strong> The split is, I would say, dependent on the applications. When companies want to detect objects in 3D to get bounding boxes, they usually use <a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noopener noreferrer">3D Deep Learning algorithms</a> like PointPillars or VoxelNet. Let&apos;s see an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ezgif.com-optimize--1-.gif" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="420" height="336"><figcaption><span style="white-space: pre-wrap;">LiDAR Object Detection - </span><a href="https://courses.thinkautonomous.ai/deep-point-clouds" rel="noreferrer"><span style="white-space: pre-wrap;">taken from my Deep Point Clouds course</span></a></figcaption></figure><p><strong>Outside of 3D Object Detection and 3D Segmentation, the entire world runs on traditional processing approaches</strong>. Since you have points, you can create tons of automated pipelines to process them. 
For plane segmentation, clustering, outlier removal, normal estimation, point data cropping, surface reconstruction, filtering of unwanted data points, and so on, you&apos;ll use traditional approaches.</p><p>For example, you could calculate the surface normals and filter out the objects that belong or don&apos;t belong to the street.</p><p><strong>Another technique can involve </strong><a href="https://www.thinkautonomous.ai/blog/point-cloud-registration/" rel="noopener noreferrer"><strong>point cloud registration and alignment</strong></a><strong>.</strong> When you have multiple point clouds, for example coming from 2 LiDARs, you can align them together into a single object. Here is an example from one of my LiDAR courses: notice how we start with 2 point clouds, a blue and a red, and we end up aligning them perfectly. This makes something better than the raw data.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed--1-.gif" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="729" height="355" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/unnamed--1-.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed--1-.gif 729w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">LiDAR Stitching - </span><a href="https://courses.thinkautonomous.ai/point-clouds" rel="noreferrer"><span style="white-space: pre-wrap;">taken from the DLC of my Point Clouds Conqueror course</span></a></figcaption></figure><p>In the algorithm category, there are countless applications. 
Everything related to SLAM or Odometry is also widely used today.</p><h3 id="applications-which-jobs-can-you-target-with-point-cloud-skills">Applications: Which jobs can you target with Point Cloud skills?</h3><p>Regarding the applications, we could write an entire article. Yet, let me give you a few core jobs you can target with point cloud processing skills:</p><ul><li><strong>Perception Engineer, Autonomous Vehicles: </strong>Process LiDARs and RADARs to find objects in the 3D space. Use Sensor Fusion to mix the output with Computer Vision. Build autonomous vehicles, shuttles, delivery robots, and create the future.</li><li><strong>Nuclear SLAM Engineer, Robotics</strong>: Use point cloud processing techniques inside robots that explore caves or regions humans can&apos;t go to, such as nuclear sites, and build maps of the world.</li><li><strong>BIM Engineer, Architecture</strong>: Create digital models of buildings and structures by processing raw point cloud data captured from laser scanners or photogrammetry. These models help architects and engineers visualize object properties, plan renovations, and ensure precise construction. The role often involves using processing software to convert points into computer-aided design (CAD) models, performing manual correction to refine the data, and integrating the results into architectural workflows for improved design and quality inspection.</li><li><strong>Medical Imaging Engineer</strong>: Apply point cloud techniques to CT scans, MRIs, and other 3D data types to detect diseases and save lives. 
There are both commercial and research uses.</li><li><strong>Drone Engineer, Agriculture</strong>: Process camera and RADAR/LiDAR data to navigate, and help drones speed up agriculture and meet population needs.</li><li>and many, many more...</li></ul><p>Alright, now let&apos;s see a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>A point cloud is a series of 2D or 3D points.</strong> A point cloud is to the LiDAR what a pixel is to a camera.<br>(Re-read that one.)</li><li><strong>Each point of a cloud usually contains at least the XYZ information,</strong> but many sensors or techniques also provide Intensity, Reflectivity, Velocity, Ring/Channel, Color, Temperature, Infrared, and more...</li><li><strong>A point cloud file comes in one of 2 formats: ASCII or Binary.</strong> An ASCII file is more readable for humans, while a Binary file is more readable for robots. Each file is a list of points and their information.</li><li><strong>There are 2 ways to build a point cloud: Active and Passive</strong>. Active techniques involve sensors like LiDARs, RADARs, or RGB-D cameras, while passive techniques use photogrammetry and 3D reconstruction to retrieve 3D models.</li><li><strong>Point Cloud Processing typically involves 3 stages</strong>: the tools/libraries, the algorithms, and the applications. 
Tools are libraries like Open3D or PCL, algorithms are either traditional or deep learning, and applications go from self-driving cars to robotics, drones, augmented reality, the architecture industry, agriculture, and beyond.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text">If you want to learn more about point clouds, I highly recommend you read my other posts, and <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">join my daily emails</a>, where I often talk about LiDARs, Computer Vision, and more cutting-edge AI Applications. <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">You can read them here.</a></div></div>]]></content:encoded></item><item><title><![CDATA[Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?]]></title><description><![CDATA[Tesla vs Waymo: Is it worth making another comparison? Well, I think they are not really comparable. Yet, one of them has a better map to Level 5, and if you'd like my expert opinion on which one, I invite you to read on!]]></description><link>https://www.thinkautonomous.ai/blog/tesla-vs-waymo-two-opposite-visions/</link><guid isPermaLink="false">62a25f550f1a5e26a580b870</guid><category><![CDATA[startups]]></category><category><![CDATA[self-driving cars]]></category><category><![CDATA[tesla]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 10 Sep 2025 15:57:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-1.jpg" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?"><p><strong>Where do you think humans come from?</strong> Growing up, I wrestled with two conflicting ideas about it. 
One teacher taught me Darwin&#x2019;s theory of evolution: a gradual process of adaptation, rooted in <u>science</u> but riddled with gaps and errors. The other taught me the Bible&#x2019;s Old Testament: the story of divine creation, which was never proven false, yet isn&apos;t built on any &quot;proof&quot; either.</p><p><strong>Since then, I have realized they were trying to answer the same question</strong>, but operated in entirely <u>different realms</u>, each with its own logic and purpose. One belonged to science, the other to faith. Making a head-to-head comparison was almost meaningless.</p><p><strong>And I think this contradiction also exists in self-driving cars,</strong> especially when opposing 2 giants: Tesla and Waymo. Both seem to chase the same prize of &quot;Level 5&quot; autonomy, but when you look closer, their paths are so distinct they&#x2019;re barely comparable.</p><p><strong>In this article, we&apos;re going to try and understand who has what I call the best &quot;<em>Map to Level 5</em>&quot;</strong>. We&apos;ll make a side-by-side comparison, and I&apos;ll give my opinion on 3 aspects:</p><ul><li>The sensor suite</li><li>The algorithms</li><li>The &quot;map&quot;, meaning strategy, vision, and more...</li></ul><p>Let&apos;s begin with the sensors...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">People often compare Tesla and Waymo using irrelevant criteria such as LiDAR vs camera. I have put together a comprehensive comparison video based on research papers and, more generally, algorithms. Interested? Click <a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-tesla-vs-waymo-algorithmic-view" rel="noreferrer"><b><strong style="white-space: pre-wrap;">here</strong></b></a>! 
(In case you do not have an account yet, you can sign up for one at <a href="https://www.thinkautonomous.ai/sdc-app">https://www.thinkautonomous.ai/sdc-app</a>)</div></div><h2 id="tesla-vs-waymo-who-has-the-best-sensor-suite">Tesla vs Waymo: Who has the best Sensor Suite?</h2><p><strong>Back when I was studying driverless cars,</strong> around 2017, I heard an interview with Sebastian Thrun, godfather of self-driving cars, talking about who was ahead in the race. I vividly remember his words: &quot;<em>Nissan is doing pretty good, but I think the company who is ahead of everyone is actually Tesla</em>&quot;.</p><p><strong>It was 2017, and I remember feeling surprised by this comment</strong>, because at the time, Tesla only had a light ADAS feature working with Mobileye, while companies like Waymo, Mercedes, Nissan, and others seemed to be everywhere in the media, with &quot;real&quot; self-driving car abilities and more potential.</p><p><strong>What about today? </strong>Who is ahead? Closer to removing human drivers? Who has the better vision? The better algorithms? The better sensors? Is it Waymo, or Tesla... or someone else?</p><p><strong>In this first part, I want to answer it from a sensor angle</strong>. And to do so, I&apos;m going to start with a screenshot of a very popular X post (tweet?) from Elon Musk, from August 2025, about RADARs, LiDARs, and cameras.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1406" height="808" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg 1406w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Elon Musk&apos;s comment on X (</span><a href="https://x.com/elonmusk/status/1959831831668228450" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>This comment raised an army of furious engineers and fusion experts</strong>, mentioning Kalman Filters, and Redundancy, and <a href="https://www.thinkautonomous.ai/blog/9-types-of-sensor-fusion-algorithms/" rel="noopener noreferrer">Sensor Fusion</a>. So before we dive into the accuracy of this comment, I would like to describe what each company is doing...</p><h3 id="waymo-29-cameras-6-radars-5-lidars">1. Waymo: 29 Cameras, 6 RADARs, 5 LiDARs</h3><p><strong>If you look at a Waymo car, you&apos;re going to see exactly the opposite of Tesla: tons of sensors all over the place</strong>. There are RADAR sensors on the front, side and rear, there are 29 cameras, and 5 LiDARs. The question we can ask is... &quot;Is Waymo trying to kill a fly with a bazooka?&quot;.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-sensors.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1008" height="567" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/tesla-vs-waymo-sensors.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/tesla-vs-waymo-sensors.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-sensors.jpg 1008w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s sensor stack</span></figcaption></figure><p><strong>Just in terms of calibration, it must be an absolute <u>nightmare</u> for engineers.</strong> Calibrating a camera with a LiDAR is already a long task, but 29 cameras with 5 LiDARs? Then there is the fusion of all of these sensors together, and this is ONLY the &quot;robotaxi&quot; version, because if you look at their Zeekr shuttles, they also have their own types of sensors, with different generation codes. Their stack is therefore always evolving, and depends on the vehicle they drive on.</p><h4 id="what-type-of-lidar-camera-and-radar-is-waymo-using">What type of LiDAR, Camera, and RADAR is Waymo using?</h4><p>Let&apos;s take a brief look at what each sensor sees:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ScreenRecording2025-09-10at14.29.22-ezgif.com-optimize.gif" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="640" height="226" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/ScreenRecording2025-09-10at14.29.22-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/ScreenRecording2025-09-10at14.29.22-ezgif.com-optimize.gif 640w"><figcaption><span style="white-space: pre-wrap;">What Waymo&apos;s sensors see (</span><a href="https://waymo.com/" rel="noreferrer"><span style="white-space: pre-wrap;">Waymo</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>Waymo uses different types of cameras. </strong>They work with high-res long-range cameras (color, telephoto lenses) for object detection far down the road, wide-angle cameras for close-range coverage (pedestrians, cyclists, intersections), and near-infrared cameras for night vision / low-light perception.</p><p><strong>Regarding LiDARs, they&apos;re using their own sensors called &quot;Laser Bear Honeycomb&quot;</strong>. There is one forward LiDAR, 2 side LiDARs, and one at the rear. But these are short-range, solid-state LiDARs. They are excellent for blind spots and nearby vehicles in front, but poorly suited to highway driving because they don&apos;t see far. This is why there is the roof LiDAR, which is mechanical, and sees several hundred meters away.</p><p>In the animation below, you can see LiDARs both in point cloud format and in range-view &#x2014;&#xA0;and <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noreferrer">you can learn about types of LiDARs here</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/waymoslidar-ezgif.com-optimize--1--1.gif" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="640" height="360" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/waymoslidar-ezgif.com-optimize--1--1.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/waymoslidar-ezgif.com-optimize--1--1.gif 640w"><figcaption><span style="white-space: pre-wrap;">How Waymo&apos;s sensors complement each other (source: </span><a href="https://portal.thinkautonomous.ai/self-driving-cars" rel="noreferrer"><span style="white-space: pre-wrap;">THE SELF-DRIVING CAR ENGINEER SYSTEM</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>Regarding RADARs, Waymo uses their own line of </strong><a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noreferrer"><strong>Imaging RADARs</strong></a>. Imaging RADARs are what I call <a href="https://www.thinkautonomous.ai/blog/fmcw-lidars-vs-imaging-radars/" rel="noreferrer">4D RADARs</a>. Unlike normal RADARs, which see in 2D and measure velocity, these see in 3D and measure velocity.</p><p>Do you want to see what all of this looks like together? Okay, here it is:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ScreenRecording2025-09-10at14.39.44-ezgif.com-optimize.gif" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="480" height="272"><figcaption><span style="white-space: pre-wrap;">The orange represents the point clouds and detections &#x2014; the blue represents the Imaging RADAR signatures &#x2014;&#xA0;the cameras are at the bottom row</span></figcaption></figure><p>Waymo uses a powerful array of sensors allowing them to see every possible object. We&apos;ll come back to the utility of LiDARs and RADARs, but for now, let&apos;s look at cameras...</p><h3 id="2-teslas-sensor-design-8-cameras-thats-it">2. 
Tesla&apos;s Sensor Design: 8 Cameras, that&apos;s it</h3><p>Unlike Waymo vehicles, Tesla&apos;s approach relies only on cameras. Tesla&apos;s Autopilot aims to solve autonomous driving using vision only.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1614" height="750" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 1614w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla&apos;s sensor stack uses 8 cameras (red) and 12 ultrasonics (orange)</span></figcaption></figure><p><strong>On this illustration, you can see a very simple design that almost never changed. </strong>In red, you can count 8 outside cameras, and in orange, you can see 12 ultrasonic sensors used to detect static objects when parking. Among the 8 cameras, there are 2 on the windshield, used for stereo vision, one on the front bumper, one on the rear bumper, 2 on the doors, and 2 on the wheels. See the difference? I can&apos;t even begin to count Waymo&apos;s cameras, but I can easily show you Tesla&apos;s sensor stack.</p><p><strong>So let&apos;s visualize what the cameras see:</strong></p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1560" height="1110" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg 1560w" sizes="(min-width: 720px) 720px"></figure><p>Interesting, or not. Now, let&apos;s try to understand: who has the best sensor suite?</p><h3 id="3-who-has-the-best-sensor-suite-tesla-or-waymo">3. Who has the best sensor suite... Tesla or Waymo?</h3><p><strong>I am going to show you an image</strong>, and I would like you to ONLY look at the left part. Ignore the right for now. Can you tell me what you see? You see a car, don&apos;t you?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="2000" height="639" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Seeing in fog or other difficult conditions is the limiting factor of a camera-only architecture, isn&apos;t it?</span></figcaption></figure><p><strong>But did you notice the pedestrian?</strong> This is one of the limits of the vision-only approach. 
When you look at reports of Tesla&apos;s miles driven without disengagement, Tesla FSD clearly shows its limits in bad weather. It doesn&apos;t drive well in cloudy, foggy, rainy, or snowy scenes, and it doesn&apos;t drive at all during storms and sleet (when ice falls from the sky).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1048" height="620" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg 1048w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla can&apos;t drive autonomously in conditions like snow, storms, sleet, fog, and even heavy rain (</span><a href="https://teslafsdtracker.com/Main" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>Reports show that FSD13 significantly improved driving at night,</strong> but there is still a <u>physical</u> limitation to driving with cameras only that is not solvable with better algorithms. In a robotaxi situation in which you&apos;d sit in the passenger seat, you would be stuck. The way humans drive involves more than cameras: we hear street sounds, we sense people, and we don&apos;t use wide-angle cameras to detect other cars.</p><p><strong>If we were to come back to Elon Musk&apos;s comment now, do you remember the &quot;If LiDARs/RADARs disagree with cameras, which one wins???&quot;</strong> Waymo shows that redundancy is key to safety. If a camera misses something, your RADAR may not. 
And with Kalman Filters, you can certainly develop a powerful fusion module to account for disagreements.</p><p><strong>When we consider the physical ability of a LiDAR to generate point clouds, you can understand how powerful having them is</strong>. Beyond seeing at night and in other difficult conditions, they physically build <a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noreferrer">point clouds</a>. Recently, a YouTube video showed a Tesla and a LiDAR-equipped car each driving toward a wall painted to resemble a street. The Tesla crashed into the wall, mistaking it for the road; the LiDAR-equipped car stopped.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1872" height="802" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 1872w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla vs LiDAR (</span><a href="https://www.youtube.com/watch?v=IQJL3htsDyQ" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>You may wonder... Okay, but we&apos;ll never see fake walls in real life, so what&apos;s the point? The point is that vision-only has its limitations. We&apos;ve seen Tesla confuse the moon with objects, miss a red light, do phantom braking, and even miss truck trailers. 
<strong>For all of these reasons, I would say that the Waymo approach including Camera+LiDAR+Imaging RADARs is a better choice. They win point #1.</strong></p><p>One caveat is that the algorithms, processing power, and energy required to operate this vehicle are insane. LiDAR sensors consume a lot of energy and record a lot of data. Tesla is much cleaner from that perspective.</p><p>Speaking of algorithms, let&apos;s now move to this second point.</p><h2 id="tesla-vs-waymo-who-has-the-best-algorithms">Tesla vs Waymo: Who has the best algorithms?</h2><p>If we now look at the algorithms, who is closer to building self-driving cars? Tesla and Waymo started off with very different architectures, but now seem to converge towards End-To-End Learning. So let&apos;s take a look...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Hey, I have to make a confession: I couldn&#x2019;t dive deeper into the algorithm side here, but I recorded a full comparison of Tesla and Waymo&#x2019;s architectures. You can get access <a href="https://www.thinkautonomous.ai/blog/de7911199f91436b9edad2fc7bdce9b9?pvs=25" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">here</strong></b></a>.</div></div><h3 id="1-tesla-fsd-algorithms-hydranets-occupancy-networks-and-end-to-end-learning">1. Tesla FSD Algorithms: HydraNets, Occupancy Networks, and End-To-End Learning</h3><p>In my <a href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/" rel="noreferrer">article breakdown on Tesla</a>, I do a full deep dive into Tesla&apos;s algorithms and how they work; so I won&apos;t do that here, but I will still show you the overview of how they built their <a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer">autonomous vehicle architecture</a>. 
Note that it&apos;s based on the data Tesla shared publicly, before it went private in 2023.</p><p>You can see 3 main blocks:</p><ul><li><strong>Lane &amp; Object HydraNet: </strong>The lane and object <a href="https://www.thinkautonomous.ai/blog/how-tesla-autopilot-works/" rel="noreferrer">HydraNet</a> is a multi-task learning network that takes in the 8 cameras, learns features from each using a CNN, fuses them spatially and temporally via a Vision Transformer, and then outputs several heads. Heads are trained to detect objects, lanes, positions, and so on... You can read more details in the article linked above.</li><li><strong>Occupancy Network</strong>: The <a href="https://www.thinkautonomous.ai/blog/occupancy-networks/" rel="noreferrer">Occupancy Network</a> also processes all 8 cameras spatially and temporally, except that it&apos;s trained to leverage spatial data. This is a 3D network that aims to build voxels and assign a free/occupied state to each. You can read more details in the article linked above.</li><li><strong>Planning &amp; Control</strong>: The Planning &amp; Control node used to be (in the drawing) done via a Monte-Carlo Tree Search. This is traditional artificial intelligence. In 2024, they replaced this with a Neural Network planner. While we don&apos;t have details on how it works, the &quot;End-To-End&quot; comes from this node moving to Deep Learning, making the entire network differentiable.</li></ul><p>To push the explanation even further, let me show you the typical visualizers on a Tesla, and how each one maps to an algorithm:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1528" height="806" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg 1528w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla&apos;s algorithm visualizations</span></figcaption></figure><p>Now, let&apos;s see Waymo...</p><h3 id="2-waymos-algorithms-3d-deep-learning-diffusion-planners-and-more">2. Waymo&apos;s Algorithms: 3D Deep Learning, Diffusion Planners, and more...</h3><p>It&apos;s a bit harder to fully track the state of Waymo&apos;s algorithms, because they continuously release <a href="https://waymo.com/research/" rel="noopener noreferrer">multiple research papers</a>, and we don&apos;t know which ones are in production, and which are just pure research. Still, according to my research, there are 3 core pillars Waymo relies on to drive...</p><ul><li>LiDARs</li><li>Prediction/Tracking</li><li>Imitation/End-To-End</li></ul><h4 id="lidars">LiDARs</h4><p><strong>Early on, Waymo pioneered work on LiDARs with 3D Object Detection algorithms like SWFormer</strong>. These algorithms process LiDAR point clouds and output bounding boxes in 3D. This architecture has been updated a few times, but now serves as the &quot;core&quot; detection algorithm of Waymo. From there, it&apos;s encapsulated into other pipelines, like the Late-To-Early Fusion:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1724" height="784" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 1724w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s </span><a href="https://waymo.com/research/swformer-sparse-window-transformer-for-3d-object-detection-in-point-clouds/" rel="noreferrer"><span style="white-space: pre-wrap;">SWFormer</span></a><span style="white-space: pre-wrap;"> &amp; </span><a href="https://waymo.com/research/lef-late-to-early-temporal-fusion-for-lidar-3d-object-detection/" rel="noreferrer"><span style="white-space: pre-wrap;">Late-To-Early Fusion</span></a><span style="white-space: pre-wrap;"> algorithms</span></figcaption></figure><p><strong>See what&apos;s happening?</strong> We have a temporal fusion algorithm that processes LiDARs and boxes from t-1, t-2, and so on... and each of these goes to SWFormer to output a final head. This turns SWFormer into a temporal detector, and not just a frame-by-frame detector.</p><h4 id="prediction-tracking">Prediction &amp; Tracking</h4><p><strong>Going even further, we have Prediction &amp; Tracking. </strong>Waymo bets big on tracking, and one of the core algorithms I noticed there is a recently released architecture called Stateful Track Transformer, which does exactly that job: tracking objects from SWFormer&apos;s output. 
Over the years, Waymo has released TONS of prediction and tracking architectures.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1594" height="644" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg 1594w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s Stateful Track Transformer (</span><a href="https://waymo.com/research/stt-stateful-tracking-with-transformers-for-autonomous-driving/" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h4 id="end-to-endimitation">End-To-End/Imitation</h4><p>Back in the early 2020s, I vividly remember Waymo mentioning an algorithm called ChauffeurNet, which behaved exactly like Tesla&apos;s HydraNet, but output trajectories. Since then, the approach has evolved, and the more recent papers and public talks mention the End-To-End architecture named EMMA, as well as Vision Language Models.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="2000" height="731" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s Encoder/Decoder Architecture (</span><a href="https://io.google/2025/explore/pa-keynote-22" rel="noreferrer"><span style="white-space: pre-wrap;">Google I/O 2025</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>As you can see, this turned into an Encoder/Decoder architecture,</strong> where the encoder learns features from each sensor, then fuses them spatially and temporally, to learn a compressed representation of the scene. Then, the decoder is a generative part, built on VLMs, that predicts a trajectory.</p><p><strong>Many claim that Waymo has a &quot;modular&quot; approach while Tesla has the advanced End-To-End approach; this is simply no longer the case.</strong> Although we don&apos;t know whether Waymo uses EMMA in production, or still relies on their traditional pipeline, we definitely know they&apos;re heading towards it.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Are you enjoying this part? 
I cover all of these algorithms in depth </strong></b>&#x2014;&#xA0;via a 1h Tesla Masterclass &#x2014; and a detailed algorithmic comparison of Tesla &amp; Waymo on my platform.<br><br>It&apos;s reserved for my daily email readers, and if you&apos;d like to join us: <a href="https://www.thinkautonomous.ai/sdc-app" target="_blank" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">You can sign up here for free and get the deep dives</strong></b></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div><h3 id="3-who-has-the-better-algorithms-waymo-or-tesla">3. Who has the better algorithms? Waymo or Tesla?</h3><p>Let me write 4 or 5 bullet points explaining what I think:</p><ul><li><strong>Both Tesla and Waymo seem to be headed towards End-To-End Learning, because the Modular approach makes the car behave in a &quot;robotic&quot; fashion. </strong>The driving is not smooth, and feels rule-based. So both are moving towards End-To-End...</li><li><strong>But to make End-To-End work well, you need LOTS of data</strong>. Tesla has that from their fleet of millions of cars all across the world, but Waymo has a small fleet driving only in very specific regions. Scaling via End-To-End will be extremely painful for them.</li><li><strong>On top of that, Tesla only needs to process camera data</strong>, which likely makes the algorithms faster and less power-hungry.</li><li><strong>End-To-End&apos;s biggest problem is edge cases</strong>. Tesla has experience driving several million miles in construction zones, parking lots, new cities, and so on... 
Waymo, on the other hand, is stuck with HD Maps and can have issues moving towards End-To-End...</li><li><strong>Tesla also has advanced techniques for Edge Case Detection and retraining,</strong> such as Trigger Classifiers (<a href="https://www.thinkautonomous.ai/blog/automotive-data-processing/" rel="noreferrer">see my detailed overview here</a>), as well as Dojo and Self-Supervised Learning. From my perspective, they seem far more prepared for End-To-End Learning than Waymo. It&apos;s as if Tesla had paved the way for this for a decade, while Waymo pivoted last minute.</li></ul><p>Given these, and under the assumption that the prize here is End-To-End, I would give my points to Tesla.</p><h2 id="waymo-vs-tesla-who-has-the-better-map-to-level-5">Waymo vs Tesla: Who has the better Map to Level 5?</h2><p>In this last point, I would like to step away from the sensors and techniques, and look at these companies primarily as businesses. Their goal is to sell self-driving cars, or autonomous rides, and thus... who&apos;s leading in that sense?</p><h3 id="1-strategy">1. Strategy</h3><p>First of all, something very important to understand:</p><ul><li>Tesla sells self-driving cars</li><li>Waymo rents autonomous transportation services</li></ul><p><strong>This is essential, because this means their software, sensor stack</strong>, philosophies, business models, and even strategies to reach Level 5 are totally opposed. This is very clear when you see the graph below, which shows Waymo starting with a very capable vehicle, but only in ONE geo-fenced area, with ONE car, while Tesla starts with millions of cars, none of them autonomous. They don&apos;t scale the same thing:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1392" height="870" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg 1392w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The strategies of Tesla and Waymo are not the same</span></figcaption></figure><p><strong>Tesla&apos;s goal is to make millions of cars at a 25,000$ price point</strong>. Imagine Tesla in 2016, being told to integrate LiDAR technology, which cost over 50k USD/unit. Wouldn&apos;t they be better off starting with a light camera + RADAR Level 2, and gradually improving it? Of course they would, if they believe it&apos;s possible.</p><p><strong>On the other hand, Waymo had a fleet of maybe 20 vehicles at the time.</strong> With 29 cameras, 6 RADARs, and 5 LiDARs, a Waymo car costs significantly more than a Tesla car. In fact, each car was estimated to cost around 250,000$ a few years ago. Since then, the LiDAR price dropped, Waymo grew its fleet to over 2,000 cars, and each is now estimated around 150,000$. Can you see how the cost is less and less of a problem to them over time?</p><p><strong>Waymo&apos;s LiDARs certainly cost a lot, but this cost gets absorbed as they do more paid rides</strong>, <strong>until it &apos;supposedly&apos; becomes profitable.</strong> Supposedly, because there is maintenance, replacement, and growth of the fleet, which can make the cost a never-ending fight. 
Even then, Waymo has more leverage to afford LiDARs, by raising the ride price for millions of people from 7$ to maybe 13$ (made up numbers, terribly off &#x2014;&#xA0;just making a point).</p><p><strong>There are great statistics on </strong><a href="https://www.01core.com/p/driverless-car-costs-have-gotten" rel="noopener noreferrer"><strong>this blog post</strong></a><strong> from Ben Buchanan</strong> that show how Waymo&apos;s cars get more and more affordable over time. A LiDAR costs 500-1,000$ today, which completely challenges the vision-only philosophy. Unlike Tesla, time is on Waymo&apos;s side.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1124" height="676" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg 1124w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo has time on their side. The more rides they make, the more they get their investment back</span></figcaption></figure><p><strong>You can therefore understand how each one wins</strong>:</p><ul><li><strong>Tesla starts with TONS of disengagements</strong> and tries to decrease that number to 0. <u>The more capable the algorithm, the more cars they will sell.</u></li><li><strong>Waymo starts with very few disengagements </strong>and tries to scale the number of rides and regions without increasing this number. <u>The more regions they cover, the more rides they will sell.</u></li></ul><h3 id="2-hd-maps"><strong>2. 
HD Maps</strong></h3><p><strong>Waymo is betting on a serious &quot;HD Map&quot; strategy that Tesla refused to adopt</strong>. According to Tesla, the car should be able to drive anywhere in the US, so it should only need standard maps like Google Maps or OpenStreetMap. It does not mean they don&apos;t use HD Maps; they do (see screenshot below), but they don&apos;t <u>require</u> them to drive. If the car ends up in a parking lot with no map, it should still be able to drive.</p><p><strong>On the other hand, Waymo maps every square inch of every place they drive in. </strong>This means every traffic sign, every bumper, every crossroad, every traffic light, every roadwork, lane lines, speed limit... <u>everything</u> is continuously mapped and updated. In the image below, you can see Waymo&apos;s HD Maps and a screenshot of Tesla&apos;s HD Maps on a car in debug mode.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1434" height="620" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg 1434w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla vs Waymo&apos;s HD Map Game</span></figcaption></figure><p>From what I read, Tesla performs poorly in regions where they don&apos;t have maps. While this doesn&apos;t block them, this certainly causes disengagements. They are therefore equal in this sense.</p><h3 id="3-miles-driven-disengagements">3. 
Miles Driven &amp; Disengagements</h3><p><strong>We can&apos;t conclude this article without first looking at who slams the brakes the most</strong>. Yet, I feel you already know the answer, from the first two points. It&apos;s obviously Tesla, because they have many more cars, miles driven (5B for Tesla vs 100M for Waymo), and different situations. So how can we really decide who has the better map to Level 5?</p><p><strong>Let&apos;s first look at disengagements. </strong>What is a disengagement? Is overtaking a stuck vehicle a disengagement? Is accelerating? What is the definition, and do Tesla and Waymo use the same one? Well, Waymo uses safety drivers, who obey specific instructions. Tesla drivers disengage for virtually any reason, even if they simply feel like it. This is why <a href="https://teslafsdtracker.com/Main" rel="noopener noreferrer">Tesla FSD Tracker</a> shows, as of September 25, <u>24% of FSD drives have a disengagement, and 3% have critical disengagements (US).</u></p><p><strong>Continuing with more stats:</strong> Tesla drives on average 213 miles before a disengagement on a highway, and most disengagements are caused by Lane Issues. Notice the jump in miles driven without a disengagement with FSD <strong>12.6</strong>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1548" height="676" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg 1548w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Disengagement Reports of Tesla (</span><a href="https://teslafsdtracker.com/Main" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>For Waymo, it&apos;s a different story</strong>. Waymo has a driving permit in California, which recently had them release the <a href="https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/disengagement-reports/" rel="noopener noreferrer">2024 disengagement report</a> to the California DMV. It showed 9,793 miles driven before a disengagement. You may ask... How is Tesla at 200 but Waymo at 9,700 miles driven without disengagement? It&apos;s because, as we said, the definitions, reasons for disengagement, and cities are extremely different.</p><p>If a company wants to drive in a straight line for 500 million miles and show 0 disengagements, they technically can. This is why I think we can&apos;t really trust any of these numbers &#x2014;&#xA0;only their relative evolution under the same conditions as before &#x1F937;&#x1F3FB;&#x200D;&#x2642;&#xFE0F;</p><p><strong>So who has the better Map to Level 5?</strong> I will tell you, but only after we have made a quick summary of what we&apos;ve seen...</p><h2 id="summary-who-is-ahead-waymo-or-tesla">Summary: Who is ahead, Waymo or Tesla?</h2><ul><li><strong>Tesla and Waymo have different sensor stacks</strong>. 
While Waymo relies on a stack of 29 cameras, 6 radars, and 5 LiDARs, Tesla takes a different route and relies on 8 cameras only.</li><li><strong>A sensor setup with LiDARs and RADAR redundancy is safer and gives more reliability than a vision-only setup. </strong>LiDARs can detect objects and brake the car even if no object is identified by the camera or an algorithm. Waymo also drives in more weather conditions than Tesla, which is physically limited by the lack of other sensors.</li><li><strong>Waymo uses an architecture based on LiDARs,</strong> with algorithms like SW-Former, Prediction/Tracking, and EMMA as an End-To-End system.</li><li><strong>Tesla uses an End-To-End approach</strong> involving HydraNets, Occupancy Networks, and Deep Planning. This approach is reinforced by powerful Trigger Classifiers, Self-Supervised Learning, Dojo, and a powerful data fleet &#x2014;&#xA0;making them win on the algorithm aspect.</li><li><strong>Waymo depends heavily on detailed HD Maps and can only work in these mapped, geo-fenced areas</strong>. Tesla&#x2019;s system is designed to drive anywhere without necessarily relying on HD maps. In practice, they drive much better when they have maps.</li><li><strong>Waymo boasts a much better disengagement rate, </strong>with safety drivers rarely needing to take control compared to Tesla&#x2019;s more frequent interventions; but the conditions in which they drive and disengage are 100% under <u>their</u> control.</li><li><strong>The two companies have very different business models:</strong> Waymo sells autonomous ride services, Tesla sells cars with self-driving features. As a result, their map to level 5 is not the same.</li></ul><p>And now, you&apos;re all caught up. So, the Map to Level 5?</p><h3 id="the-map-to-level-5">The Map to Level 5</h3><p>Here is what I think:</p><p><strong>To reach Level 5, Tesla will need to <u>reduce</u> the number of disengagements. 
</strong>To me, they will have to include LiDARs or RADARs at some point. At today&apos;s cost, that would probably be feasible. In fact, I&apos;m pretty sure Tesla is so competent that FSD would be solved by now if they hadn&apos;t chosen to play the game on hard mode. But now, after all this time, can they really afford to do this? There&apos;s ego, brand image, and the &quot;FSD capable&quot; computers they already sold. How can they? Tesla is in the camera game for good.<br></p><p><strong>To reach Level 5, Waymo has to <u>increase</u> the number of areas they drive <u>without increasing</u> the disengagement rate</strong>. This means adapting the &quot;HD Map&quot; strategy they still rely on, and making algorithms capable of adapting to a new region faster. To make autonomous driving a reality for the entire world, Waymo will need to go faster. Their autonomous vehicles surely are capable, but their current scaling strategy takes too long.</p><h2 id="next-steps">Next Steps</h2><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">I hope you enjoyed this article and it taught you a lot! </strong></b>If you want to go to the next steps, I have a platform for my daily email readers which contains 60 min+ of videos explaining Tesla&apos;s algorithms, and comparing them to Waymo&apos;s architecture. This is a more technical deep dive, and I&apos;m sure you&apos;ll love it. Interested? 
<a href="https://www.thinkautonomous.ai/sdc-app" target="_blank" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">You can sign up here for free and get the deep dives</strong></b></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div>]]></content:encoded></item><item><title><![CDATA[How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)]]></title><description><![CDATA[Self-driving cars collect Tb of videos every day... but is that really needed? (spoiler: No) 

In this article, you'll discover how to collect data in the AV 2.0 age; from Tesla's Trigger Classifiers, to Heex Event Management solutions, learn the different ways to do automotive data processing.]]></description><link>https://www.thinkautonomous.ai/blog/automotive-data-processing/</link><guid isPermaLink="false">685277c9c8f3bf93bd18732c</guid><category><![CDATA[self-driving cars]]></category><category><![CDATA[deep learning]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 29 Jul 2025 08:16:16 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/07/automotive-data-processing.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/07/automotive-data-processing.jpg" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)"><p><strong>Have you ever heard the story of the iPod?</strong> It started in January 2001, right after Apple had announced a loss of $195 million and missed the shift to digital music. The company was lost, and had one last chance of survival: building an MP3 player to catch up with the competitors.</p><p><strong>If you are old enough to remember the MP3 players back then,</strong> they were&#xA0;confusing to use, overloaded with buttons and menus, and made the experience painful for customers. Apple had been looking for a solution for months, but had no clue how to make it better.<br><br><strong>Until one day, when Apple&apos;s Head of Marketing Phil Schiller suggested using a scroll wheel</strong>. Wheels already existed in mice and dial phones, but had never been used in music players. 
With this, he suggested that the menus should scroll faster the longer the wheel is turned,&#xA0;a stroke of genius&#xA0;that would distinguish the iPod from the agony of using competing players.<br><br>The rest is history: Apple developed the iPod in the greatest secrecy, launched it, and changed the world with &quot;1000 songs in your pocket&quot;.<br><br><strong>What made it so successful?</strong>&#xA0;It&apos;s not that it looked good, or had buttons, or could store more songs. No, the genius was in the <strong><u>smarter</u></strong>&#xA0;experience: the scroll wheel.</p><p><strong>If you&apos;re in the autonomous vehicles market, you&apos;ve probably witnessed a similar pattern:</strong> companies have been collecting more and more data endlessly, building data centers, simulators, hiring people to analyze the data generated, and so on... Until some companies came up with smarter ways, not involving just &quot;collecting more data&quot;, but rethinking the experience to focus on events instead.</p><p>In this article, I would like to tell you about the way automotive data processing works nowadays, and how the AI revolution is going to reshape it.</p><p>We are going to learn about 3 ideas:</p><ol><li><strong>The first part is going to focus on the Manual Era</strong> (where we collect and process it all) and the <strong>Cloud</strong> <strong>Era</strong> (where we use DataLakes)</li><li><strong>The second part will be a case-study provided by an autonomous tech startup, </strong>revealing the 10 biggest problems of the Cloud Era.</li><li><strong>The last part will show you the Edge Intelligence Era &amp; the Autonomous Era</strong>, which, as you&apos;ll see, is a far more intelligent way to do it.</li></ol><p>Let&apos;s begin with point #1.</p><h2 id="1-data-management-how-self-driving-car-companies-collect-and-process-data-in-the-cloud-era">1. 
Data Management: How self-driving car companies collect and process data in the Cloud Era</h2><p><strong>One of the things we heard the most this past decade was that Data is king. </strong>And for a long time, collecting as much data as you can in order to train heavy machine learning models has been the only way to go. Let&apos;s talk about data collection, and then processing.</p><h3 id="how-do-autonomous-vehicles-collect-data">How do autonomous vehicles collect data?</h3><p>We know that when a self-driving car drives, all the data (sensors, images, messages, hardware status, algorithm decision, ...) is being recorded.</p><p><strong>The process is simple, and looks like this:</strong></p><ol><li>You <strong>plug</strong> <strong>your</strong> <strong>sensors</strong> into your system (for example, Robot Operating System/ROS)</li><li>You press <strong>record</strong></li></ol><p><strong>I&apos;m sure somebody out there worked hard to find a more complex process, </strong>but if you&apos;re using a tool like ROS, recording data is as simple as running one command. In the video below, you can see me recording LiDAR point clouds, camera images, GPS positions, algorithm outputs, and almost all the messages passed through the self-driving car while we drive...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/road_data-ezgif.com-optimize.gif" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="480" height="260"><figcaption><span style="white-space: pre-wrap;">Visualizing the live sensor streams of a self-driving car</span></figcaption></figure><p>When I&apos;m done recording, the output is a file with the .<strong><em>bag</em></strong> extension (for ROS 1) that can vary from a few Gb to Terabytes of data. 
Let me show you an example below from the <a href="https://github.com/TIERS/tiers-lidars-dataset" rel="noreferrer">TIERS</a><a href="https://github.com/TIERS/tiers-lidars-dataset" rel="noreferrer"> dataset</a>. Notice the duration and sizes of the recordings below &#x2014; the last one is just <strong>8 minutes long</strong>, and yet weighs <strong>200Gb</strong>. This is 2.4Gb/minute!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1080" height="746" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The size of a single ROS Bag is huge.</span></figcaption></figure><p>The bag is the first element. Then comes what we do with it.</p><h3 id="the-manual-era-how-do-we-process-and-analyze-data">The Manual Era: How do we process and analyze data?</h3><p><strong>The first &quot;era&quot; I&apos;d like to tell you about is the 1.0 era</strong>. Back when I worked on autonomous shuttles, each of our fully autonomous vehicles drove around and recorded data to SSD drives. When the day was over, we had hundreds of Gb to process. So we started coming up with file naming conventions, involving the date, event, and so on...</p><p><strong>Then, once back at the office, we could replay our algorithms on it,</strong> train our models on the data, and so on... 
Below is an example of a <a href="https://www.thinkautonomous.ai/blog/image-segmentation-use-cases/" rel="noopener noreferrer">drivable area segmentation</a> algorithm I&apos;ve been training on the data collected:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/ezgif.com-optimize--11-.gif" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="480" height="270"><figcaption><span style="white-space: pre-wrap;">Example of a &quot;Replay&quot; of a Drivable Area Segmentation Algorithm</span></figcaption></figure><p>This was fine for a small startup of 8 people, and it&apos;s probably still okay for small companies that don&apos;t need extensive processing, but most autonomous vehicle companies have turned to the cloud...</p><h3 id="the-cloud-era-how-advanced-driver-assistance-systems-adas-most-of-the-automotive-industry-is-using-data-lakes">The Cloud Era: How Advanced Driver Assistance Systems (ADAS) &amp; most of the Automotive Industry is using Data Lakes</h3><p><strong>If you record data every day, and each recording is hours long, you&apos;re never going to find the events you need</strong>. 
This is why I&apos;m showing you a more sophisticated, let&apos;s say &apos;1.5&apos; version, which makes data collection part of a pipeline.</p><p>It looks like this:</p><ol><li>You <strong>record</strong> the data</li><li>You <strong>upload</strong> it to AWS/Azure</li><li>The R&amp;D team then <strong>processes</strong> it weeks later, <strong>replaying</strong> all the events, <strong>searching</strong> for 10% possibly interesting scenarios, or events, and so on...</li></ol><p>If you&apos;d like to see real-world concepts, you can see AWS and Azure Data Lakes:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1870" height="628" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 1870w" sizes="(min-width: 1200px) 1200px"><figcaption><a href="https://www.thinkautonomous.ai/blog/medical-image-segmentation/" rel="noreferrer"><span style="white-space: pre-wrap;">AWS Data Lake</span></a><span style="white-space: pre-wrap;"> vs </span><a href="https://learn.microsoft.com/en-us/industry/mobility/architecture/avops-architecture-content" rel="noreferrer"><span style="white-space: pre-wrap;">Azure Data Lakes</span></a></figcaption></figure><p>A lot of companies in the self-driving car market use these &quot;data lakes&quot;. 
Let&apos;s look at the Azure Data Lake in a simplified view:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/data-lake.001.jpeg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1280" height="720" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/data-lake.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/data-lake.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/data-lake.001.jpeg 1280w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The 4 Horsemen of Data Processing: The Bag Recording is just a tiny step</span></figcaption></figure><p>ADAS &amp; fully autonomous cars use it. After recording, the 4 key blocks are:</p><ol><li><strong>DataOps</strong>: Where we analyze data, clean it, label it, augment it, tag it, and so on... Notice the interaction with external labellers; that idea is called &quot;human-in-the-loop&quot;.</li><li><strong>MLOps</strong>: The machine learning algorithms, training, testing, and so on...</li><li><strong>ValidationOps</strong>: The validation part, involving visualization, scenario, and simulation.</li><li><strong>MetaData</strong>: After the DataOps tagged the data, we can search for it.</li></ol><p>You can see how this makes data one element in a chain.</p><p>So what are the problems with this approach? Before telling you about the 2.0 Autonomous Era, let&apos;s look at a case study with real ADAS or artificial intelligence companies using it...</p><h2 id="2-case-study-adas-actors-reveals-their-10-biggest-problems-with-data-driven-approaches">2. 
[Case Study] ADAS Actors reveal their 10 biggest problems with Data-Driven Approaches</h2><p>In this section, before talking about the &apos;2.0&apos; approach, I would like to tell you about the core problems reported by companies that process large volumes of data.</p><p><strong>Before writing this article, I got the opportunity to talk to </strong><a href="https://www.heex.io/en-gb/smarter-data-faster-decisions" rel="noreferrer"><strong>Heex Technologies</strong></a>, a French startup specialized in Event Based Data Management... and I asked them &quot;Which problems do you solve?&quot;. To answer, they shared a 20 page PDF listing all the problems faced by their biggest Advanced Driver Assistance Systems (ADAS), autonomous driving, or robotic clients from the automotive industry.</p><p>In the PDF, I spotted a lot of interesting problems. Let me share the main ones with you:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/heex-data-processing.001.jpeg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/heex-data-processing.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/heex-data-processing.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/06/heex-data-processing.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/06/heex-data-processing.001.jpeg 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Case Study reveals tons of time and money issues in classical data processing</span></figcaption></figure><p><strong>If I were to list down the 10 main problems, you&apos;d see:</strong> <u>Slow</u> access to critical events, <u>Manual</u> data processing, 
<u>Exploding</u> Cloud costs, <u>Fragmented</u> data, <u>Delayed</u> visualization (no real-time), <u>Manual</u> Extraction of scenarios, <u>Useless</u> streaming data, <u>Physical</u> SSD Extraction, <u>Blind</u> Debugging, and <u>Inefficient</u> ROS Bag Processing.</p><p>Notice all these terms I underlined? These are the problems of today&apos;s data management systems.</p><p>Let&apos;s take some examples...</p><ul><li><strong>If you collect the data on Day 1, and process it on Day 3, </strong>you have slow/delayed access to critical events, like a missed pedestrian. So you&apos;re driving, notice something wrong, but you have to wait 2 days to even look at the data, and start searching for that event you noticed...</li><li><strong>Similarly, can you see how the &apos;fragmented&apos; data processing is a problem? </strong>Especially when you work in a team. Engineer A grabs bag A, and makes decisions based on it... Engineer B grabs bag B and makes a different decision based on it... The entire decision cycle happens in <u>silos</u>.</li><li><strong>The Physical SSD extraction is a problem too.</strong> In May 2025, I was at the Stuttgart ADAS &amp; AV Expo, and I met a company that invented a &quot;swap&quot; disk system... All of this is great, but that&apos;s still the same problem of storing, copy/pasting data, etc... 
to a system.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/IMG_3688-ezgif.com-optimize.gif" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="560" height="315"><figcaption><span style="white-space: pre-wrap;">How to &quot;swap&quot; SSD hard drives in self-driving cars (B-PLUS Demo)</span></figcaption></figure><p><strong>In each of these problems, I noticed a waste of <u>time</u> and <u>money</u></strong>.</p><p>For example, the client reporting: &quot;<em>Engineers waited several days to weeks to access specific events due to the <strong><u>time-intensive</u></strong> process of uploading, filtering, and classifying raw data in the cloud.</em>&quot; is clearly facing a <strong><u>time</u></strong> problem, reviewing large amounts of data... The other client who mentioned: &quot;<em>The full data pipeline we built &#x2014;from data capture to processing and storage&#x2014;incurred <strong><u>high cloud costs</u></strong> and consumed engineering resources</em>&quot; faces a <strong><u>money</u></strong> problem...</p><p>In this same report shared by Heex, all the companies reported improvements in their pipeline, whether it was better decision making, more time freed, or money saved. This is why the next part is so important, so let&apos;s now focus on it: Event Driven Data Processing for autonomous cars.</p><h2 id="3-event-driven-data-management-for-autonomous-cars">3. Event Driven Data Management for autonomous cars</h2><p><strong>Back when I started learning autonomous driving algorithms</strong>, I listened to an interview with Sebastian Thrun, acknowledged as the godfather of self-driving cars, who at some point said something that marked me [paraphrased]: &quot;<em>With a team of 2 or 3, you can build a self-driving car that drives 90% of scenarios in a weekend. 
Then to get to 95%, it takes a few weeks, and to complete these last 5%, it takes years.</em>&quot;</p><p>This idea is called the &quot;long-tail&quot; problem.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1080" height="736" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">If you drive 10 minutes, you&apos;ll see 90% of the events. In order to find edge cases, you must drive and record hours and hours of data</span></figcaption></figure><p><strong>When looking at traffic accidents involving autonomous vehicles</strong>, you often see rare events or edge cases as the root cause. The person wearing a stop sign t-shirt, the truck with a donkey on the trailer, the traffic sign burned by Parisian riots: all of these unusual scenes are totally different from the empty highways cars are used to.</p><p>Some companies solve it with data generation, others with simulation, or with End-To-End Learning. Yet, the root of all evil here is data, and thus, this is what we have to change.</p><p><strong>A decade ago, the term &quot;data&quot; became king, and everybody became a Data Scientist</strong>, Data Engineer, Data Ops, Data Something. 
That was the case until recently, when the data revolution passed, and breakthrough innovations happened not thanks to more data, but thanks to smarter training systems (like self-supervised learning), or more powerful architectures (like transformers). &quot;More data&quot; was ultimately not the solution, and thus, we have to switch our thinking...</p><h3 id="the-edge-intelligence-era-from-data-management-to-event-management">The Edge Intelligence Era: From Data Management to Event Management</h3><p><strong>After companies have recorded a few laps of the neighborhood they drive in</strong>, recording more of this same scene doesn&apos;t make sense. Companies record more and more, just to spot the 1% of long-tail events. What if we worked on these events only, from the beginning?</p><p>It can be done by setting up &quot;triggers&quot; in your system that act as a filter and only capture the scene when interesting events happen, such as:</p><ul><li><strong>Objects Missed</strong>: If one camera misses an object that another sensor sees</li><li><strong>Near Pedestrian Collision</strong>: If pedestrians are within 2 meters of our car, and we drive over 30 km/h</li><li><strong>Human Intervention</strong>: If a human driver manually took over</li><li><strong>Shakes</strong>: If the camera physically moved due to a bump or small shock</li><li><strong>Ego Collision</strong>: If a collision with the ego vehicle happened</li><li>and so on...</li></ul><p>All of these are valid events we&apos;d like to record. The rest? When it&apos;s all smooth? 
Well, we already have millions of these.</p><p>Here is an example with Heex Technologies, whose platform lets you set triggers:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mintlify.s3.us-west-1.amazonaws.com/heextechnologies/public/img/welcome-to-heex-smart-data-platform/triggers.png" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="3204" height="1808"><figcaption><span style="white-space: pre-wrap;">The Heex Technologies platform, allowing you to set &quot;triggers&quot;, such as collision, hard brake, and so on...</span></figcaption></figure><p>If I were to show you the 2.0 process, it&apos;d look like this:</p><ul><li>You have the <strong>same</strong> <strong>car</strong> with LiDARs generating the same 10Gb/h of data</li><li>Rather than recording all data available, you <strong>define</strong> <strong>triggers</strong>.</li><li>You <strong>intelligently</strong> record the events, like the near pedestrian collision, and not all the data</li><li>You get <strong>instant notifications</strong>, <strong>labels</strong>, and can do real-time decision making</li></ul><p>Seems smarter, doesn&apos;t it?</p><p>Now that you have this in mind, I&apos;d like to show you the last era...</p><h3 id="the-autonomous-era-ai-does-it-for-you">The Autonomous Era: AI does it for you</h3><p><strong>The next step is to create algorithms to do it automatically for us.</strong> For example, Tesla patented<a href="https://xilhylujaogys6v6dwfuqa5wrtivfkprrhmf6w7eh7zo46hdgvmq.arweave.net/uhZ8LokDjYl6vh2LSAO2jNFSqfGJ2F9b5D_y7njjNVk" rel="noopener noreferrer"><strong> a concept called trigger classifiers</strong></a><strong>. 
</strong>The idea is to train their<strong> </strong><a href="https://www.thinkautonomous.ai/blog/how-tesla-autopilot-works/" rel="noopener noreferrer"><strong>HydraNet</strong></a> backbone to classify whether the general scene it&apos;s learning from contains unusual events or not. If it does, let&apos;s say above a certain confidence score, then the machine learning models will trigger a warning.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1698" height="1232" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 1698w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla&apos;s Trigger Classifiers: The Backbone (which focuses on the general scene) is outputting classification on the nature of the scene it&apos;s looking at</span></figcaption></figure><p><strong>Whether you&apos;re working on spreadsheets or building autonomous vehicle technology, automating manual items like labelling or searching for data makes sense. </strong>In this case, just like the Edge Intelligence Era, you can see the events being captured live while driving, and not after.</p><h4 id="the-20-vision">The 2.0 Vision</h4><p><strong>This is going with the &quot;2.0&quot; vision of self-driving cars that companies now define</strong>. 
It is a vision driven by Deep Learning first, where data matters, but where more data isn&apos;t the solution. In the 2.0 vision, quality is better than quantity; contextual intelligence is needed, learning should be real-time, and training should be done on relevant data.</p><p>If the 1.0 vision involved heavy test vehicles and modular architectures, the 2.0 vision is about AI &amp; efficiency.</p><p>Now, let&apos;s see an example of a company specialized in this...</p><h2 id="example-how-heex-technologies-turns-data-into-event-management">Example: How Heex Technologies turns Data into Event Management</h2><figure class="kg-card kg-image-card"><img src="https://heex.cdn.prismic.io/heex/65cd1d149be9a5b998b5d409_heex-light.svg?rect=0%2C0%2C100%2C36&amp;w=256&amp;fit=max" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="100" height="36"></figure><p><strong>One of the companies that captured this vision the best is </strong><a href="https://www.heex.io/en-gb/smarter-data-faster-decisions" rel="noreferrer"><strong>Heex Technologies</strong></a>. They built a SaaS platform that implements exactly these ideas of &quot;triggers&quot;, and their motto is that rather than focusing on the data, they focus on events. 
Since I already showed you the triggers, let&apos;s see the platform in action:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/analytics--1-.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1080" height="834" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/analytics--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/analytics--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/analytics--1-.jpg 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Heex&apos;s Visualization Platform shows you the critical events happening, where they happened, and gives you full power to solve the long tail problem</span></figcaption></figure><p>Let&apos;s look at their pipeline, which you&apos;ll notice also works backwards, once the bag is generated:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1312" height="739" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp 1312w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">In this example, we take an existing &quot;dumb&quot; ROS Bag and turn it into a &quot;smart&quot; 
bag</span></figcaption></figure><p><strong>From a heavy bag, we get a smart bag</strong>. The data is definitely smarter when it is automatically annotated and categorized, and when relevant events are flagged. We can then re-inject this data into the training pipeline, without having to worry about the rest of the dataset.</p><p><strong>As an entrepreneur myself, I can only admire the focus on one specific and painful problem like this one</strong>. When you can anticipate customer needs, and enable automakers and automotive engineers to move away from a complex process to focus on their core job (<a href="https://courses.thinkautonomous.ai/self-driving-cars" rel="noopener noreferrer">self driving technology</a>)... you win!</p><p>Alright, let&apos;s do a summary and see what to do next:</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>Collecting data is essential to train AI models and develop self-driving vehicles</strong>. Yet, &quot;more data&quot; is not the solution to create breakthroughs and solve the &quot;long tail&quot; problem, which remains a significant challenge.</li><li><strong>The first Era of data processing is the Manual Era,</strong> in which we record and process everything manually. (1.0)</li><li><strong>The second era (1.5) is the cloud version</strong>, in which you work with data lakes and build a real &quot;chain&quot; that contains DataOps, MLOps, ValidationOps, and so on...</li><li><strong>The third era moves to 2.0.</strong> It&apos;s where we stop obsessing over the data, and focus on events. We can use triggers and platforms like Heex Technologies to do it.</li><li><strong>The fourth era is the AI Era</strong>. (2+) This is where we have AI automatically find events, and train itself continuously on these.</li></ul><p>Which solution is right for you? In reality, they can all work. A small startup can work manually, until they find their hard problem to solve, and they have a budget to invest in data lakes... 
Companies can work with data lakes, but for bigger fleets, it&apos;d make much more sense to think in terms of events instead.</p><h3 id="next-steps">Next Steps</h3><p><strong>If you realise you have these problems of recording everything</strong>, having your data stay a bit &quot;dumb&quot;, and would like to know exactly how<u> to stop recording everything</u> by this afternoon (without losing the important information)... </p><p><strong>... Then I&#x2019;d recommend checking out Heex&#x2019;s free discovery quiz</strong>, which will tell you exactly what you&#x2019;re doing wrong today, and (based on your answers) show you what to do this afternoon to save hours of recording, data processing, etc...</p><p>It&#x2019;s free, and you can get access below:</p><div class="kg-card kg-product-card">
            <div class="kg-product-card-container">
                <img src="https://www.thinkautonomous.ai/blog/content/images/2025/07/Screenshot-2025-07-29-at-09.59.36.jpg" width="2720" height="632" class="kg-product-card-image" loading="lazy" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)">
                <div class="kg-product-card-title-container">
                    <h4 class="kg-product-card-title"><span style="white-space: pre-wrap;">Heex Free Discovery Quiz</span></h4>
                </div>
                

                <div class="kg-product-card-description"><p><span style="white-space: pre-wrap;">Is your data strategy a silent obstacle?</span></p></div>
                
                    <a href="https://forms.gle/LimFvrHtb5zjqJUv7" class="kg-product-card-button kg-product-card-btn-accent" target="_blank" rel="noopener noreferrer"><span>Take the Quiz</span></a>
                
            </div>
        </div><p>You can also take a look at Heex&apos;s product here: <a href="https://www.heex.io/en-gb/smarter-data-faster-decisions">https://www.heex.io/en-gb/smarter-data-faster-decisions</a></p>]]></content:encoded></item><item><title><![CDATA[Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones]]></title><description><![CDATA[Discover an exclusive excerpt from my Interview with Shield AI, a US-based company in the autonomous defense industry. You'll learn about infiltration drones, Visual SLAM, ViDARs, and V-BAT VTOL systems.]]></description><link>https://www.thinkautonomous.ai/blog/shield-ai/</link><guid isPermaLink="false">690b2bf9bad329532556f25e</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 23 Jul 2025 22:00:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/11/shield-ai.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/shield-ai.jpg" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"><p><strong>How much impact do you believe your job has?</strong> Is your job saving someone time? Or money? Or... their life? Well, what if you worked on projects that saved people&apos;s lives? Such companies exist, in self-driving cars, in healthcare, and in the case of this article... in <strong>Autonomous Defense!</strong></p><p>This summer, I interviewed Vibhav Ganesh from&#xA0;<a href="https://www.shield.ai/" rel="noreferrer"><strong>Shield</strong> <strong>AI</strong></a>, a U.S.-based defense technology company that develops autonomous systems for military and government use. 
</p><p>I would write a big paragraph here, but let me instead show you a quick sample from the interview...</p><blockquote><strong>Vibhav Ganesh is the&#xA0;Director of Engineering</strong>, past Chief of Staff to the CTO, and Employee #20 of Shield AI.<br><br><strong>Vibhav has played a pivotal role in the company&apos;s growth and innovation. </strong>With a background in visual inertial odometry and SLAM, he has been at the forefront of developing autonomous systems like the Nova 2 quadcopter.</blockquote><p>Let&apos;s read his intro to Shield AI and to their core products: the V-BAT and the ViDAR.</p>
<!--kg-card-begin: html-->
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:ugcPost:7353401032464293888?collapsed=1" height="550" width="504" frameborder="0" allowfullscreen title="Embedded post"></iframe>
<!--kg-card-end: html-->
<p>There are a lot of things to note about their products: <strong>the V-BAT can last 10 hours,</strong> which is a technological achievement in itself, thanks to VTOL (vertical takeoff and landing)... <strong>ViDAR</strong> is also a very interesting product, which stands for Visual Detection And Ranging... and HiveMind (not shown here) is their AI, or as they call it, &quot;The World&apos;s Best AI Pilot&quot;.</p><p>Let me take you to the V-BAT first, as it&apos;s the core product, with the LinkedIn post above that we did together, where Vibhav Ganesh introduces us to Shield AI.</p><p><strong>Together, we recorded an exclusive Fragment of&#xA0;</strong><a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a><strong>, </strong>my community membership experience, in which he takes us through Shield AI.&#xA0;What is autonomous defense? What are the main technologies involved? What is the range of products?</p><p><strong>In this post, I&apos;d like to give you a small sample of that interview</strong>, highlighting a very interesting moment where Vibhav talked about infiltration drones.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Before we begin, do you like field interviews?</strong></b> I am bringing new guests to my membership every single month, and when you join my daily emails, you not only learn when these interviews get released, you also get the opportunity to access the complete training we build for them inside our membership.<br><br>If you&apos;d like to get started, <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">you can receive the emails here</a>.</div></div><hr><h2 id="inside-shield-ais-tactical-infiltration-drones">Inside Shield AI&apos;s Tactical Infiltration Drones</h2>
<!--kg-card-begin: html-->
<iframe src="https://player.vimeo.com/video/1133789552?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="1920" height="1080" frameborder="0" allow="autoplay; fullscreen; picture-in-picture; clipboard-write; encrypted-media; web-share" referrerpolicy="strict-origin-when-cross-origin" title="Shield AI Tactical Infiltration Drones"></iframe>
<!--kg-card-end: html-->
<div class="kg-card kg-toggle-card" data-kg-toggle-state="close">
            <div class="kg-toggle-heading">
                <h4 class="kg-toggle-heading-text"><span style="white-space: pre-wrap;">Read the transcript</span></h4>
                <button class="kg-toggle-card-icon" aria-label="Expand toggle to read content">
                    <svg id="Regular" xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                        <path class="cls-1" d="M23.25,7.311,12.53,18.03a.749.749,0,0,1-1.06,0L.75,7.311"/>
                    </svg>
                </button>
            </div>
            <div class="kg-toggle-content"><p><b><strong style="white-space: pre-wrap;">JEREMY</strong></b><span style="white-space: pre-wrap;">: Okay Vibhav. I&apos;d like to start with the quadcopter. What you call Nova 2. Can you give us an overview of how it works?</span><br><br><b><strong style="white-space: pre-wrap;">VIBHAV</strong></b><span style="white-space: pre-wrap;">:&#xA0;Yeah, I&apos;d love to. So just to kind of understand where we&apos;re coming from, I&apos;ll give a little backstory of Shield, and talk a little bit about how I evolved in it, and then how Shield has evolved over that time as well.</span><br><br><b><strong style="white-space: pre-wrap;">So the entire existence of Shield,&#xA0;our mission has been to protect service members and civilians using intelligent systems. </strong></b><span style="white-space: pre-wrap;">And we do that by providing&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">platforms</span></u><span style="white-space: pre-wrap;">&#xA0;that are capable of operating&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">at the edge</span></u><span style="white-space: pre-wrap;">, providing&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">software</span></u><span style="white-space: pre-wrap;">&#xA0;that allows different kinds of platforms to be resilient to comms and GPS denial and operate in really, really sticky and dangerous environments.</span><br><br><b><strong style="white-space: pre-wrap;">And we believe the greatest victory requires no war, </strong></b><span style="white-space: pre-wrap;">and we achieve this by equipping the US and its allies with the ability to see and act anywhere at any time.</span><br><br><span style="white-space: pre-wrap;">That started back in 2016/2015 in very niche ConOps, specifically indoor ConOps. 
There, we were focused on kind of building&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">clearance.</span></u><br><br><b><strong style="white-space: pre-wrap;">What our founder, Brandon, came back from his deployments and saw was kind of a lack of technology really serving the members that were protecting us</strong></b><span style="white-space: pre-wrap;">, and&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">particularly in areas where they were going in kind of blind to buildings</span></u><span style="white-space: pre-wrap;">, if you can imagine you had, you know, in the Middle East conflicts, there were just these buildings that were there. You had no idea what&apos;s happening inside them.</span><br><br><b><strong style="white-space: pre-wrap;">And in order to kind of make sure the city was safe</strong></b><span style="white-space: pre-wrap;">, you had to go inside and verify that there were no explosives or militants in there. And what they used to do was send people through this because there were no robots or technology capable of doing that.</span><br><br><span style="white-space: pre-wrap;">And if you can imagine yourself doing that, it&apos;s extremely scary going in blind, not knowing what&apos;s going to happen, what&apos;s going to be on the other side of that door.</span><br><br><span style="white-space: pre-wrap;">And so what he wanted to create is a system that could do that for the operator, instead of having the person go do that.</span></p><p><b><strong style="white-space: pre-wrap;">And so the quadcopter Nova 1 was born out of that idea of, how do you provide information before you send a person through?</strong></b><br><br><span style="white-space: pre-wrap;">And the goal there wasn&apos;t necessarily to build a quadcopter, but it was just the first application of autonomy in the defense space that was very, very tangible and very easy for us to apply ourselves to. 
And so we just designed and built a state-of-the-art indoor autonomous surveillance device.</span><br><br><b><strong style="white-space: pre-wrap;">Nova 1 was ahead of its league in many different areas.</strong></b><span style="white-space: pre-wrap;"> One of the things that really stuck out to me from coming from academia, before I was doing a master&apos;s in robotics at CMU, and we saw a lot of really cool applications of autonomy there too, but the hardware systems were not that capable.&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">Like the flight time was 3 to 5 minutes.</span></u><span style="white-space: pre-wrap;">&#xA0;The processing was very slow.&#xA0;You could operate very slowly.&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">Most of these videos you were seeing back then were sped up by 8x or 7x just to make sure they look compelling</span></u><span style="white-space: pre-wrap;">.</span><br><br><span style="white-space: pre-wrap;">But what Shield had accomplished was real-time exploration at staggering speeds.&#xA0;At one point, we did a comparison of how fast a quadcopter can clear an environment compared to six Navy SEALs, and the quadcopter actually finished in a third of the time compared to those.</span><br><br><b><strong style="white-space: pre-wrap;">JEREMY</strong></b><span style="white-space: pre-wrap;">: Wow! Okay, I see!</span><br><br><b><strong style="white-space: pre-wrap;">VIBHAV</strong></b><span style="white-space: pre-wrap;">: Isn&apos;t that crazy, just how fast this thing was operating. And back then we were, you know, at a limited, limited sensor suite. So we had a&#xA0;2D scan LiDAR, we had a&#xA0;camera, we had some&#xA0;sonars&#xA0;and an&#xA0;Intel Neural Compute Stick. 
So it was very limited hardware back then, because it was 2017, but it was able to actually accomplish this mission.</span><br><br><b><strong style="white-space: pre-wrap;">So as long as there was a window or door for it to fly in</strong></b><span style="white-space: pre-wrap;">, a human operator would enter the vicinity and kind of say, this is the building I want to enter. And from then on, it would be fully autonomous, no comms required. It would find an entrance.</span></p></div>
        </div><p><strong>Impressive, isn&apos;t it? </strong>What I really love about it is that it&apos;s <u>down to earth.</u><strong> </strong>I could see myself assembling a drone kit,&#xA0;adding a camera, a 2D LiDAR, and starting to experiment with Visual SLAM projects to map a room. This is basically what Shield AI did, when they got started. Except that their drone was (1) targeted to a specific client and (2) better than all competition.</p><p>There are a couple of insights I&apos;d like to share with you, from Vibhav:</p><h3 id="1-self-driving-car-autonomous-transfer-doesnt-work-as-wed-think">1) Self-Driving Car &gt; Autonomous Transfer doesn&apos;t work as we&apos;d think</h3><p>Now, here is something important to note:</p><blockquote class="kg-blockquote-alt"><strong>A lot of what you learn in autonomous robots CANNOT simply be transferred to drones.</strong></blockquote><p><strong>I did think that it was a matter of copy and paste</strong>. But I understood I got it wrong when making this fragment, especially with Vibhav Ganesh who told me that their&#xA0;drones don&apos;t have LiDARs, fly over seas, or over deserts, no man&apos;s land zones, between mountains, countryside, with 3D constraints, and no map!</p><blockquote>I thought about it for a minute, and I realized...&#xA0;<strong>&quot;Wait, it&apos;s absolutely NOT like autonomous cars!&quot;</strong></blockquote><p><strong>And in fact, when you start looking into autonomous drone architectures</strong>, they absolutely don&apos;t look like self-driving car architectures! For example with Shield AI, they have a Control, a Station, RTOS, a ViDAR, but also a Flight Controller powered with frameworks like PX4 and Maven. 
This is an entire set of libraries to learn.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/unnamed.jpg" class="kg-image" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones" loading="lazy" width="720" height="405" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/11/unnamed.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/11/unnamed.jpg 720w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The external architecture of Shield AI&apos;s components</span></figcaption></figure><p>From there, here is a second insight:</p><h3 id="2-visual-slam-is-mostly-used">2) Visual SLAM is mostly used</h3><p>Coming back to the idea that we are NOT like self-driving cars... the other main difference is that we use no map. So without a map, and with just a camera, they have no choice but to use...<strong>Visual SLAM!</strong> </p><p>And Vibhav explains really well what kind of SLAM they&apos;re using, how they implement the mapping, even though there is 0 starting point, and so on. Here is a sample of a vSLAM project I&apos;ve tested with drones:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/68747470733a2f2f692e696d6775722e636f6d2f554b4c7444374c2e676966-ezgif.com-optimize.gif" class="kg-image" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones" loading="lazy" width="600" height="337" srcset="https://www.thinkautonomous.ai/blog/content/images/2025/11/68747470733a2f2f692e696d6775722e636f6d2f554b4c7444374c2e676966-ezgif.com-optimize.gif 600w"></figure><p>This is the nitty gritty of Shield AI&apos;s work. Once you build a SLAM MAP, you can then feed that map to the Motion Planner, which sends a flight order to the drone. 
If you&apos;d like more insights on this technology, I highly recommend my <a href="https://www.thinkautonomous.ai/blog/visual-slam/" rel="noreferrer">vSLAM</a> article.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/visual-slam"><div class="kg-bookmark-content"><div class="kg-bookmark-title">The 6 Components of a Visual SLAM Algorithm</div><div class="kg-bookmark-description">How does Visual SLAM work? How is it different from normal SLAM? What are the 6 main steps of a Visual SLAM system? Let&#x2019;s find out!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2024/03/visual-slam.jpg" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div></a></figure><p>Okay, would you like to see some samples?</p><h2 id="shield-ai-in-action">Shield AI in Action</h2><p>Let&apos;s take a look at 3 samples here:</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/ScreenRecording2025-07-24at00.09.54-ezgif.com-optimize.gif" width="496" height="294" loading="lazy" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div><div class="kg-gallery-image"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/ScreenRecording2025-07-22at12.46.33-ezgif.com-optimize-1.gif" width="400" height="225" loading="lazy" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration 
Drones"></div><div class="kg-gallery-image"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/f10teaser-ezgif.com-optimize.gif" width="480" height="270" loading="lazy" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div></div></div><figcaption><p><span style="white-space: pre-wrap;">courtesy of </span><a href="https://www.shield.ai/" rel="noreferrer"><b><strong style="white-space: pre-wrap;">Shield</strong></b> <b><strong style="white-space: pre-wrap;">AI</strong></b></a></p></figcaption></figure><ul><li><strong>On the left, you can see drones being launched</strong>. These drones cannot perform a &quot;vertical takeoff and landing&quot;. They are thrown into the air by launchers and then fly like a plane. </li><li><strong>In the middle, you can see the tactical quadcopters we discussed</strong>. Notice how they use vSLAM at the end of the shot.</li><li><strong>On the right, you can see a mission of the v-BAT </strong>searching for a vessel in a sea canal.</li></ul><p>This is really cutting-edge, and it totally applies to what we are building in the autonomous tech space. Alright, time to wrap up!</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>The defense industry is extremely active</strong>. Hundreds of companies work on autonomous generations of drones, anti-missile detectors, infiltration equipment, RADARs, and more...</li><li><strong>Shield AI is an active player in the defense space,</strong> with a range of products such as the v-BAT, the ViDAR, and HiveMind.</li><li><strong>Shield AI started with NOVA 1,</strong> a quadcopter that could infiltrate buildings, build maps, and survey them without the need to send humans into the region. This helped prevent human losses from buildings collapsing or people being trapped.</li><li><strong>Besides being safer, infiltration drones are also more efficient</strong>. 
Shield AI tested their drone against 6 NAVY SEALs clearing a building, and it finished in a third of the time.</li><li><strong>The transfer of autonomous car/robot technology to autonomous drones isn&apos;t as simple as we&apos;d think</strong>. Architectures are different, products are different, regions/environments are different, and even the underlying technologies and algorithms change.</li><li> On the other hand, some technologies apply really well to autonomous drones, such as <a href="https://www.thinkautonomous.ai/blog/visual-slam/" rel="noreferrer">Visual SLAM.</a></li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interested in these interviews?</strong></b> I am bringing new guests to my membership every single month, and when you join my daily emails, you not only learn when these interviews get released, you also get the opportunity to access the complete training we build for them inside our membership.<br><br>If you&apos;d like to get started, <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">you can receive the emails here</a>.</div></div>]]></content:encoded></item><item><title><![CDATA[The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)]]></title><description><![CDATA[Medical Image Segmentation is one of the most important applications of Deep Learning in healthcare. Yet, most people only know 2D chest X-ray segmentation. What about the 3D Scans? What about Foundation Models?

In this article, we're going to dive into it!]]></description><link>https://www.thinkautonomous.ai/blog/medical-image-segmentation/</link><guid isPermaLink="false">67d05c36c8f3bf93bd1872ad</guid><category><![CDATA[deep learning]]></category><category><![CDATA[computer vision]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 12 Mar 2025 11:58:06 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/03/medical-image-segmentation-1.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/medical-image-segmentation-1.webp" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)"><p><strong>On September 23, 1999, NASA&#x2019;s Mars Climate Orbiter&#x2014;a $125 million spacecraft</strong>&#x2014;was set to enter Mars&apos; orbit to study its climate and atmosphere. But just as it approached the planet, something went terribly wrong. Instead of entering a stable orbit, the spacecraft plunged into Mars&#x2019; atmosphere and was destroyed.</p><p><strong>After analysis, NASA found a unit mismatch: the navigation team at the Jet Propulsion Laboratory worked in metric units (newton-seconds), while the ground software supplied by Lockheed Martin reported thruster impulse in imperial units (pound-force seconds). </strong>This caused navigation errors that made the spacecraft descend far too low into the Martian atmosphere, resulting in a $125 million loss.</p><p><strong>Human errors happen every day in all sorts of domains</strong>. 
In 2016, an alarming report from Johns Hopkins estimated that medical errors (including misdiagnoses) cause over 250,000 deaths annually in the U.S., making them the third leading cause of death. Some of these come from errors in the analysis of medical images, such as MRIs, X-Rays, CT Scans, and more.</p><p>In this article, I would like to show you how Medical <a href="https://www.thinkautonomous.ai/blog/image-segmentation-use-cases/" rel="noopener noreferrer">Image Segmentation</a> can be used to counter this problem, and I&apos;ll do it in 3 points:</p><ol><li>2D Medical Image Segmentation</li><li>3D Medical Image Segmentation</li><li>Examples/Demo</li></ol><p>Let&apos;s get started...</p><h2 id="intro-to-2d-medical-image-segmentation">Intro to 2D Medical Image Segmentation</h2><p><strong>In 2019, I hosted the biggest AI Healthcare hackathon ever held</strong>,<strong> happening simultaneously across 20 cities!</strong> The goal at the time was to mix companies, healthcare groups, and engineers to build healthcare solutions using Deep Learning. After the 48 hours of coding, the winning team would win <strong>10,000 USD</strong>, the second <strong>4,000 USD</strong>, and then teams 3, 4, 5, and 6 would win <strong>2,500 USD each</strong>!</p><p><strong>Great computer vision projects happened, </strong>and in fact, Paris (my city) finished second in the competition with <a href="https://www.spotimplant.com/en/" rel="noopener noreferrer"><strong>Spot Implant</strong></a><strong>, </strong>a Shazam for Tooth Implants project that then became a startup. At the time, everybody was working on 2D Images. 
We had projects like Skin Melanoma detection, X-Ray segmentation, Brain Segmentation, and more...</p><p>Let me show you a few <u>tasks</u> in Medical Image Segmentation, and then we&apos;ll look at <u>algorithms</u>.</p><h3 id="2d-medical-image-segmentation-tasks">2D Medical Image Segmentation Tasks</h3><h4 id="x-ray-the-most-common">X-Ray (the most common)</h4><p><strong>First, we have X-Rays. X-Rays are the 2D representation of a body. </strong>We often see bones and organs there, and it&apos;s the most common image you&apos;ll find in Deep Learning x Healthcare. Using medical image segmentation, we can assist doctors in finding <u>bone fractures,</u> <u>lung diseases</u>, and other abnormalities. It can also help in screening large volumes of X-rays for <u>tuberculosis</u>, which is particularly useful in low-income countries with limited access to radiologists.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1182" height="384" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg 1182w" sizes="(min-width: 720px) 720px"></figure><p>This really is the most known among Deep Learning Engineers. I would like to show you other applications of segmentation...</p><h4 id="dermoscopy-segmentation-skin-lesion-segmentation">Dermoscopy Segmentation (skin lesion segmentation)</h4><p><strong>Dermoscopy segmentation was the health hackathon&apos;s top pick</strong>. 
It&apos;s all about using medical image segmentation to spot and separate skin lesions in dermoscopic images. By applying deep learning on medical images, we can quickly and accurately detect skin conditions like melanoma. This helps dermatologists diagnose and treat patients faster and manage large amounts of data more efficiently.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1182" height="384" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg 1182w" sizes="(min-width: 720px) 720px"></figure><p>Let&apos;s see one or two more...</p><h4 id="mammography-segmentation">Mammography Segmentation</h4><p><strong>Mammograms are specialized X-ray images designed to reveal the inner structure of breast tissue. </strong>These images typically come in a flat, 2D format, capturing the breast from multiple angles to ensure a comprehensive view. The details in mammograms can show everything from dense tissue patterns to potential abnormalities like lumps or calcifications.</p><p><strong>Look at the image below: see how the role of a doctor/radiologist is to find these highlighted areas</strong>. 
The role of image segmentation (in this case <a href="https://www.thinkautonomous.ai/blog/instance-segmentation/" rel="noreferrer">instance segmentation</a>) is to assist the doctor, so they are not alone in the high-stakes task of spotting problems (of course, it goes without saying that doctors also do much more than spotting, from understanding how bad a calcification can be, to finding the treatment, and so on...).</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/6726c6b3-6a6b-44b4-9566-de42cdf1c1f6.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="909" height="427" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/6726c6b3-6a6b-44b4-9566-de42cdf1c1f6.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/6726c6b3-6a6b-44b4-9566-de42cdf1c1f6.jpg 909w" sizes="(min-width: 720px) 720px"></figure><h4 id="other-types-ultrasound-%F0%9F%91%B6%F0%9F%8F%BD-endoscopy-%F0%9F%A4%A2-and-more">Other Types: Ultrasound &#x1F476;&#x1F3FD;, Endoscopy &#x1F922;, and more...</h4><p>We just saw 3 types: X-Rays, Dermoscopy, and Mammography. There are other types, such as ultrasound images (of babies, for example), which can be 2D or 3D; or endoscopy, and more... 
The image below shows many 2D segmentation applications:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee1-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="435" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee1-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee1-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>So how do you build the segmentation results? What do you use? Let&apos;s take a look...</p><h3 id="2d-medical-image-segmentation-models">2D Medical Image Segmentation Models</h3><p><strong>Ever heard of UNet? </strong>You know, that 2015 model subtitled &quot;Convolutional Networks for Biomedical Image Segmentation&quot;. Well, it may be from 2015, but it&apos;s a great way to start! 
In fact, there have been lots of improvements of <a href="https://arxiv.org/pdf/1505.04597" rel="noopener noreferrer"><strong>UNet</strong></a>, to <a href="https://arxiv.org/pdf/1807.10165v1" rel="noopener noreferrer">UNet++,</a> <a href="https://arxiv.org/pdf/2102.04306" rel="noopener noreferrer">Trans-UNet</a>, <a href="https://arxiv.org/pdf/2105.05537" rel="noopener noreferrer">Swin-UNet,</a> all keeping that &quot;U&quot; shape, but using different pattern recognition techniques like Swin Transformers, CNNs, etc...</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/UNEt-Family.001.jpeg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/UNEt-Family.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/UNEt-Family.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/UNEt-Family.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/UNEt-Family.001.jpeg 1920w" sizes="(min-width: 720px) 720px"></figure><p>This is one family of semantic image segmentation algorithms, and here is what the results look like:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/image--1---1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="966" height="770" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/image--1---1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/image--1---1-.jpg 966w" sizes="(min-width: 720px) 720px"></figure><p>To get true numbers, Dice-Similarity coefficient (DSC) and average Hausdorff Distance (HD) are used as evaluation metric to evaluate these 
algorithms.</p><p><strong>These are great, but what happens when you don&apos;t have millions of labeled samples?</strong> In healthcare, getting access to labeled, free-to-use data isn&apos;t easy, especially for certain types of diseases that are specific to certain hospitals, and so on... In these cases, you can use more &quot;foundational&quot; segmentation models such as <strong>SAM (Segment Anything) or SAM2</strong>. These have been trained on very large and diverse mask datasets (SA-1B, used for SAM, contains over a billion masks), and thus are supposed to generalize better to images they have never seen.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1324" height="846" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1324w" sizes="(min-width: 720px) 720px"></figure><p><strong>For example, </strong><a href="https://github.com/bowang-lab/MedSAM" rel="noopener noreferrer"><strong>MedSAM</strong></a><strong>, a Medical SAM (Segment Anything), is what I used for the images above</strong>. It&apos;s the regular SAM, but fine-tuned on medical images to boost segmentation performance. The model performance is quite high, and we get a top-notch <a href="https://www.thinkautonomous.ai/blog/computer-vision-applications-in-self-driving-cars/" rel="noopener noreferrer">Computer Vision</a> project using image segmentation... 
It can even take a prompt, in the form of a region-of-interest bounding box, and return the segmented mask:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="259" height="244"></figure><p>So that&apos;s it for the first part on 2D images... Now what are 3D images?</p><h2 id="3d-medical-image-segmentation-ct-scans-mris">3D Medical Image Segmentation: CT Scans &amp; MRIs</h2><p>Now come 3D images! For this part, I&apos;ll talk about the two use cases (CT Scans &amp; MRIs) and discuss the algorithms together.</p><h3 id="ct-scans-use-cases-algorithms">CT Scans: Use Cases &amp; Algorithms</h3><h4 id="use-cases-for-ct-scans-3d-representation">Use Cases for CT Scans &amp; 3D Representation</h4><p>In the <a href="https://flare22.grand-challenge.org/Dataset/" rel="noopener noreferrer"><strong>FLARE 2022 dataset</strong></a> (Fast and Low-resource semi-supervised Abdominal oRgan sEgmentation), we get access to a few hundred labeled and unlabeled cases with liver, kidney, spleen, or pancreas diseases as well as examples of uterine corpus endometrial, urothelial bladder, stomach, sarcomas, or ovarian diseases.</p><p>Hey, relax. I&apos;m just scaring you. I didn&apos;t have a clue what that meant either. Except that:</p><p><strong>These are <u>CT SCANS </u>(Computed Tomography Scans)</strong>. A CT scan uses X-rays to create detailed, cross-sectional (slice-by-slice) images of the inside of the body. 
They&apos;re more detailed than traditional X-rays because they produce 3D images by taking multiple X-ray images from different angles and combining them using a computer.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/20220309-FLARE22-Pictures-2.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1254" height="780" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/20220309-FLARE22-Pictures-2.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/20220309-FLARE22-Pictures-2.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/20220309-FLARE22-Pictures-2.jpg 1254w" sizes="(min-width: 720px) 720px"></figure><p><strong>So what&apos;s the &quot;3D&quot; output like? </strong><a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noopener noreferrer"><strong>Voxels</strong></a><strong>? </strong><a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noopener noreferrer"><strong>Point Clouds</strong></a><strong>? </strong>Not exactly. As I said, these are images taken with multiple &quot;layers&quot; (or dimensions). So your input image dimension isn&apos;t (512, 512, 3) but (512, 512, 129) or something like this. 
You have a multi-dimensional image on which you can apply image segmentation to each of the 2D slices:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/5a9fac4a-e2c4-43dd-82b4-87b760384634.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="836" height="418" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/5a9fac4a-e2c4-43dd-82b4-87b760384634.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/5a9fac4a-e2c4-43dd-82b4-87b760384634.jpg 836w" sizes="(min-width: 720px) 720px"></figure><p><strong>In this example, I used MedSAM to process individual 2D images.</strong> If you do it on the entire 3D CT Scan, you get something like this:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.48.13-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="396" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ScreenRecording2025-03-11at16.48.13-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.48.13-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>If you get it, you understand that from these images, we can put that into a software that is going to reconstruct the scan to 3D:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1638" height="1088" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 600w, 
https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 1638w" sizes="(min-width: 720px) 720px"></figure><p>From there, people go absolutely nuts and even try to make it into a point cloud (I&apos;m not sure why, but this is cool, shoutout to <a href="https://www.youtube.com/watch?v=3apDWJWe_jg" rel="noopener noreferrer">Beau Seymour&apos;s video</a>).</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at19.17.27-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="560" height="401"></figure><h3 id="mri-scans-advanced-medical-image-computing">MRI Scans: Advanced Medical Image Computing</h3><p><strong>Magnetic Resonance Imaging (MRI) Scans are another powerful tool in medical imaging.</strong> Unlike CT scans, MRIs use powerful magnets and radio waves to create detailed images of organs and tissues within the body. This technique is particularly great for soft tissue contrast, making it ideal for brain, spinal cord, and joint imaging. 
By leveraging medical image segmentation, MRI scans can aid in the precise identification of tumors, neurological disorders, and musculoskeletal issues.</p><p><strong>Here is an example of MRI scan and its segmentation task:</strong></p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.42.07-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="591" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ScreenRecording2025-03-11at16.42.07-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.42.07-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>So now, let&apos;s see how to process that...</p><h3 id="algorithms-in-the-3d-medical-image-segmentation-domain">Algorithms in the 3D Medical Image Segmentation Domain</h3><p>We already discussed SAM (Segment Anything) and how it can work on individual slices. The reality is, medical image segmentation involves a lot of complex &quot;job&quot; knowledge; and it would probably be better to use a specialized artificial intelligence model for optimal model performance. 
Today, in AI, we have two types of models:</p><ul><li>Foundation Models, which are very general and can handle almost anything</li><li>Specific Models, trained on labeled data, which can only process the kind of images they have been trained on</li></ul><p>I would like to show you two models, one of each kind: TotalSegmentator &amp; Vista-3D.</p><h4 id="total-segmentator-a-specific-model-for-2d-and-3d-segmentation">TotalSegmentator: A specific model for 2D and 3D Segmentation</h4><p>Perhaps one of the most used and best known &quot;frameworks&quot; for image segmentation of both 2D and 3D data is <a href="https://arxiv.org/pdf/2208.05868" rel="noopener noreferrer"><strong>TotalSegmentator</strong></a>. Rather than being a simple machine learning model, it&apos;s a complete framework that does the automatic labelling.</p><p>The number of classes for CT and MRI data it can segment is gigantic:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/overview_classes_v2--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="2000" height="1127" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/overview_classes_v2--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/overview_classes_v2--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/overview_classes_v2--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/overview_classes_v2--1-.jpg 2388w" sizes="(min-width: 720px) 720px"></figure><p>And the model is based on the <a href="https://arxiv.org/pdf/1809.10486" rel="noopener noreferrer"><strong>nn-UNet architecture</strong></a><strong>,</strong> which builds on UNet but automatically configures its preprocessing, architecture, and training for each dataset and imaging modality.</p><figure class="kg-card kg-image-card"><img 
src="https://www.thinkautonomous.ai/blog/content/images/2025/03/nnU-Net_overview--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1392" height="1065" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/nnU-Net_overview--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/nnU-Net_overview--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/nnU-Net_overview--1-.jpg 1392w" sizes="(min-width: 720px) 720px"></figure><h4 id="vista-3d-foundation-model-for-3d-medical-image-segmentation">VISTA-3D: Foundation Model for 3D Medical Image Segmentation</h4><p><strong>VISTA-3D</strong> is a 2024 &quot;Foundation model&quot; from Nvidia that works on the 3D patch directly. While being named &quot;foundation&quot; model, it&apos;s incredibly specific to the medical image segmentation tasks. Here, we are PURELY in <a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noopener noreferrer">3D Deep Learning</a>.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1234" height="622" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg 1234w" sizes="(min-width: 720px) 720px"></figure><p>So we&apos;ve seen a lot:</p><ul><li>2D Segmentation can be done with models like UNet, UNet++, etc... 
(specific), or SAM (foundation)</li><li>3D Segmentation can be done with models like nnUNet/TotalSegmentator (specific), or Vista-3D &amp; SAM (foundation)</li></ul><p>Let&apos;s see examples now...</p><h2 id="example-1-ct-scan-segmentation-with-vista-3d">Example 1: CT Scan Segmentation with Vista-3D</h2><p>In <a href="https://build.nvidia.com/nvidia/vista-3d" rel="noopener noreferrer"><strong>this platform</strong></a><strong> from Nvidia</strong>, I am able to select a CT Scan and call Vista-3D to process it.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ezgif.com-optimize--1-.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="453" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ezgif.com-optimize--1-.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ezgif.com-optimize--1-.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>Notice how we can select an Abdomen, and then pick all the organs we want to segment. Finally, we can get the view from 3 different &quot;angles&quot; and process that too!</p><h2 id="example-2-ct-scan-segmentation-with-totalsegmentator">Example 2: CT Scan Segmentation with TotalSegmentator</h2><p>On <a href="https://totalsegmentator.com/" rel="noopener noreferrer"><strong>totalsegmentator.com,</strong></a> we can upload images, and ask for complete segmentation. Here, I am going to upload a scan from the FLARE2022 dataset I mentioned above. 
The platform returns hundreds of organs, all in a weird-looking &apos;nii.gz&apos; format (compressed NIfTI, a common file format for medical volumes):</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="2000" height="1854" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 2274w" sizes="(min-width: 720px) 720px"></figure><p>I can visualize some of these, and see what the output is like:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1800" height="400" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 1800w" sizes="(min-width: 720px) 720px"></figure><p>Alright! So this is our second example, and both have playable demos! 
Let&apos;s now do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>Medical image segmentation helps reduce human errors</strong> by processing 2D and 3D medical images like MRIs and X-rays.</li><li><strong>2D medical image segmentation tasks </strong>include X-Rays, dermoscopy (skin lesion analysis), endoscopy, mammography segmentation (breast), and more...</li><li><strong>UNet and its variants are popular models for 2D medical image analysis,</strong> utilizing CNNs or Transformer approaches. Foundation models like SAM (Segment Anything Model) can also be fine-tuned on medical images, as with MedSAM.</li><li><strong>3D medical image segmentation involves CT (computed tomography) and MRI (magnetic resonance imaging) scans</strong>. They&apos;re called 3D images because they stack many 2D slices of the same body region, taken at successive depths.</li><li><strong>MedSAM can process 2D slices of 3D scans</strong>, allowing individual segmentation of each slice. We can then feed the results into software that reconstructs a complete 3D image.</li><li><strong>For 3D processing, TotalSegmentator and Vista-3D are solid solutions,</strong> the first being a specific model and the second a foundation model.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Next Step?</strong></b><br>Receive my Daily Emails, and get continuous training on Computer Vision &amp; Autonomous Tech. 
Each day, you&apos;ll receive one new email, sharing some information from the field, whether it&apos;s technical content, a story from the inside, or tips to break into this world; we got you.<br><br><a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">You can receive the emails here</a>.</div></div>]]></content:encoded></item><item><title><![CDATA[Video Segmentation: Why the shift from image to video processing is essential in Computer Vision]]></title><description><![CDATA[<p><strong>In 1897, French police faced a difficult problem:</strong> a serial killer named Joseph Vacher was stealing and murdering shepherds, and remained impossible to catch. Every time he was arrested, he gave a different name, changed his appearance, used fake mustaches, wigs, and different clothing styles... and got to disappear without</p>]]></description><link>https://www.thinkautonomous.ai/blog/video-segmentation/</link><guid isPermaLink="false">67b46fe9eaa12c28321be825</guid><category><![CDATA[computer vision]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 18 Feb 2025 15:51:15 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/02/video-segmentation.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/video-segmentation.jpeg" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision"><p><strong>In 1897, French police faced a difficult problem:</strong> a serial killer named Joseph Vacher was stealing and murdering shepherds, and remained impossible to catch. Every time he was arrested, he gave a different name, changed his appearance, used fake mustaches, wigs, and different clothing styles... 
and got to disappear without the police realizing they had just stopped France&apos;s most wanted man.</p><p><strong>At the time, France had no national ID system</strong>, <strong>and no way to prove that the man they caught today was the same man they arrested months ago</strong>. That was until an officer named Alphonse Bertillon introduced a revolutionary method: <u>anthropometry</u>. It&apos;s a system that labeled criminals based on 12 unchangeable physical measurements, like ear shapes, skull sizes, and limb lengths, that could not be faked.</p><p><strong>One day, Vacher was caught for attacking a woman, and this time, the police used Bertillon&apos;s system to compare his measurements to what they had in their records</strong>: they discovered they had just caught France&apos;s most wanted criminal. This time, he could not escape with a warning, and got sent to... yeah &#x2014; the guillotine &#x1F937;&#x1F3FB;&#x200D;&#x2642;&#xFE0F;&#x1F1EB;&#x1F1F7;</p><p>What got Vacher executed wasn&#x2019;t just this one-time capture, but <strong>the ability to analyze a series of events and not just a one-time event.</strong> And this is exactly what this article is about: the shift from frame-by-frame to sequence processing, here in Computer Vision with videos. And this is done via something called <strong>video segmentation.</strong></p><p>So let&apos;s get started:</p><h2 id="what-is-video-segmentation">What is Video Segmentation?</h2><p><strong>Most Computer Vision Engineers spend time learning about image processing,</strong> <strong>but never consider what happens when you use a video.</strong> Yet, tons of architectures today, whether in surveillance, retail, sports analysis, healthcare, or even robotics and self-driving cars &#x2014;&#xA0;now process videos instead of images. 
The sequence brings something individual images don&apos;t, just like in the Vacher story, where he could finally be judged for all the murders he committed.</p><p><strong>So let&apos;s take a less deadly scene &#x2014;&#xA0;shoplifting detection in retail</strong>. There is a startup I once interviewed for named <a href="https://www.veesion.io" rel="noopener noreferrer"><strong>Veesion</strong></a> &#x2014;&#xA0;that has this amazing video on their homepage:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="322" height="322"><figcaption><span style="white-space: pre-wrap;">Shoplifting Demo by Veesion</span></figcaption></figure><p><strong>Can you see everything happening here?</strong></p><ul><li>We have the <a href="https://www.thinkautonomous.ai/blog/object-tracking/" rel="noopener noreferrer"><strong>object tracking</strong></a><strong> </strong>(the second man is moving from aisle 1 to aisle 2)</li><li>The <strong>event</strong> <strong>detection</strong> (at 00:03, a man puts an item in a pocket)</li><li>The <strong>action</strong> <strong>classification</strong> (of putting something in a pocket)</li><li>The <strong>video</strong> <strong>decomposition</strong> (shoplifting from 00:02 to 00:03 &#x2014;&#xA0;standing from 00:03 to 00:06)</li><li>The <strong>people</strong> <strong>counting</strong> (2 people in the video, one is obstructing the other)</li><li>And more...</li></ul><p>Among these, there is the idea of &quot;<u>segmenting</u>&quot; the scene to track the shoplifters through the video. You can see the hands being in red, consistently from frame to frame. 
So this is the idea of Video Processing, and Video Segmentation is a sub-branch of it focused on the task of segmenting a scene.</p><p><strong>There are two types of Video Segmentation tasks:</strong></p><ul><li>Video <strong><u>Object</u></strong> Segmentation (VOS)</li><li>Video <strong><u>Semantic</u></strong> Segmentation (VSS)</li></ul><h3 id="video-object-segmentation">Video Object Segmentation</h3><p><strong>In Video Object Segmentation, we are doing exactly what I did in this video</strong>. I define an object to track, send the video to the model, which tracks the object consistently across frames. It&apos;s purely &quot;object&quot; based, and is NOT used in a fully supervised way. For example, you can use semi-supervised video object segmentation, where you define an object on Frame 1, and let the model track it across the next frames... Or you can use totally unsupervised video object segmentation, where you won&apos;t even mention the objects to track.</p><p>Let me show you an example where I am shoplifting (muahahah):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879848204-ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/sam2_masked_video_1739879848204-ezgif.com-optiwebp.webp 600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879848204-ezgif.com-optiwebp.webp 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Video Object Segmentation tracks objects consistently over time</span></figcaption></figure><p>See? We are able to track my head &amp; hands in blue, and the phone in yellow! That is the idea we&apos;re interested in... 
And even more when we can do this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879741562-ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/sam2_masked_video_1739879741562-ezgif.com-optiwebp.webp 600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879741562-ezgif.com-optiwebp.webp 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Floating head demo for no other purpose than getting you out of your exhausting boredom at work</span></figcaption></figure><p>Now, to be fair, the floating head experiment may NOT be the most useful thing in this example, but the stolen phone is. Now think of everything we can do: keep track of cells in health-related videos, keep track of a player when analysing a football match, and a lot more...</p><h3 id="video-semantic-segmentation">Video Semantic Segmentation</h3><p><strong>In Video Semantic Segmentation, we&apos;ll really go down to the pixel level, and rather than focusing on segmenting objects, we focus on the scene</strong>. 
The output is going to look extremely similar to a normal image segmentation task.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/ScreenRecording2025-02-18at13.47.29-ezgif.com-optimize.gif" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="560" height="315"><figcaption><span style="white-space: pre-wrap;">Video Semantic Segmentation</span></figcaption></figure><p>Just like image segmentation, you can also use video <a href="https://www.thinkautonomous.ai/blog/instance-segmentation/" rel="noopener noreferrer"><strong>instance segmentation</strong></a>, video panoptic segmentation, video semantic segmentation, and so on... And of course, there is the benefit of doing background extraction, to then process uniquely what&apos;s been segmented, for example in a case like <a href="https://www.thinkautonomous.ai/blog/lane-detection/" rel="noopener noreferrer"><strong>lane detection</strong></a><strong> </strong>in self-driving cars:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/ScreenRecording2025-02-18at14.04.01-ezgif.com-optimize.gif" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="632" height="302" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/ScreenRecording2025-02-18at14.04.01-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/ScreenRecording2025-02-18at14.04.01-ezgif.com-optimize.gif 632w"><figcaption><span style="white-space: pre-wrap;">What could we do with this lane line information? 
Or these cars?</span></figcaption></figure><p>But by now, you may have a question:</p><h2 id="how-is-video-segmentation-different-than-image-segmentation">How is Video Segmentation different than Image Segmentation?</h2><p><strong>I mean, is it really different?</strong> It kinda looks similar to image segmentation, right? And yes, while it may be the case for some examples, like the one I just gave with video semantic segmentation, most of the tasks will be different and give different outputs.</p><p><strong>To put it simply: Video Segmentation is about processing videos.</strong> You don&apos;t process image per image, you process the video frames together. And this has several advantages:</p><ul><li><strong>The model can track multiple objects</strong> even when they&apos;re occluded (similar to what object tracking would do, but using video sequences)</li><li><strong>The model can segment specific scenes</strong> you&apos;re looking for (a blood cell changing sizes, a car entering a scene, a man stealing something)</li><li><strong>It ensures temporal consistency</strong>, meaning an object that appears in one frame keeps the same identity/color across the entire video, enabling tracking at the same time.</li><li><strong>It understands object motion</strong>, meaning it can predict where an object will be in the next frame instead of treating every frame as an isolated image (thanks mainly to video instance segmentation)</li><li><strong>For some models, it can be more efficient</strong>, since instead of running image segmentation on each frame separately, the model processes a video sequence, leveraging temporal information to process frames together, reducing redundant computations.</li></ul><p><strong>So, how does that work?</strong> What type of model does this? 
I do NOT have a specific &quot;do this do that&quot; template to share with you, but by studying examples, we could probably understand what&apos;s required to make a Video Segmentation algorithm work...</p><h2 id="example-1-vistr-video-instance-segmentation-transformer">Example 1: <strong>VisTR (Video Instance Segmentation Transformer)</strong></h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1114" height="504" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg 1114w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Video Instance Segmentation with Transformers (</em></i><a href="https://openaccess.thecvf.com//content/CVPR2021/papers/Wang_End-to-End_Video_Instance_Segmentation_With_Transformers_CVPR_2021_paper.pdf" target="_blank" rel="noopener noreferrer"><i><b><strong class="italic" style="white-space: pre-wrap;">source</strong></b></i></a><i><em class="italic" style="white-space: pre-wrap;">)</em></i></figcaption></figure><p>The first paper looks terribly simple. Let&apos;s try to understand the different blocks:</p><ul><li><strong>Input</strong>: First, we process raw video data, it&apos;s purely a sequence of images sent to the CNN</li><li><strong>Backbone</strong>: Then a normal 2D CNN processes each frame independently before concatenating the feature maps</li><li><strong>Video Processing:</strong> This is fed to a Transformer, known to process sequences quite well. 
However, we modify this transformer a bit to not just receive a positional encoding, but also a <em>temporal encoding</em>.</li><li><strong>Output:</strong> Finally, the output of the decoder predicts instances for each pixel, with a sequence matching strategy</li></ul><p>The training is done after obtaining labeled data from the <a href="https://youtube-vos.org/dataset/vis/" rel="noopener noreferrer"><strong>YoutubeVIS dataset</strong></a>, and the backbone is initialized with the weights of<strong> </strong><a href="https://arxiv.org/abs/2005.12872" rel="noopener noreferrer"><strong>DETR</strong></a>.</p><p>The detailed version looks like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1756" height="614" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 1756w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">VisTR detailed</span></figcaption></figure><p>As you can see, we have a video processing pipeline, where the transformer is actually aware of the frames. The segmentation process ends by matching pixels with instances. This is done using Bipartite Matching (<a href="https://www.thinkautonomous.ai/blog/hungarian-algorithm/" rel="noopener noreferrer"><strong>the Hungarian Algorithm</strong></a>). 
More subtle blocks exist, and I invite you to read the paper for more...</p><h2 id="example-2-sam-2-segment-anything-2">Example 2: SAM 2 (Segment Anything 2)</h2><p>If you didn&apos;t live in a cave around 2023, you probably heard of Segment Anything&#xA0;&#x2014; the segmentation model that could find <strong><em>any</em></strong> object in an image. Recently, it got an upgraded version called <a href="https://scontent.fcdg3-1.fna.fbcdn.net/v/t39.2365-6/464917098_581932941165933_4465312900778079623_n.pdf?_nc_cat=105&amp;ccb=1-7&amp;_nc_sid=3c67a6&amp;_nc_ohc=Mn0M6N9O9K4Q7kNvgHsDXZ8&amp;_nc_oc=AdiskhA1_LoHfyJs-eCrqi0Ff4_AhWlmF71ArIj0MOtfkVFvl0S3CBlghheMqNnFj7A&amp;_nc_zt=14&amp;_nc_ht=scontent.fcdg3-1.fna&amp;_nc_gid=AowO5fmUshA8NSDSOp9SkAs&amp;oh=00_AYDPLXLOi0edVnOB48aBIjiWzvYPIFrIwWkimA0rxel2Dg&amp;oe=67BA6932" rel="noopener noreferrer"><strong>SAM2</strong></a>, which is designed to process videos. Let&apos;s take a look:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1324" height="846" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1324w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Image vs Video Segment Anything</span></figcaption></figure><p><strong>As you can see, SAM 2 differs from SAM by the addition of a <u>memory block</u>,</strong> made of a memory attention module, a memory encoder, and a memory bank that stores the past frames, and helps with temporal consistency.</p><p>If 
you played with <a href="https://sam2.metademolab.com/demo" rel="noopener noreferrer"><strong>the online demo</strong></a>, you will find that the model starts by asking you to click on an object, so it can keep tracking it. So at frame 0, you click the object you want to track, and then the model tracks it across the entire sequence...</p><p><strong>This is called &quot;Promptable&quot; Visual Segmentation.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1616" height="614" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 1616w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">On frame 1, we click on the dog&apos;s tongue. On the next frame, the tongue is tracked consistently. When the model fails, we manually click on it to restart the tracking</span></figcaption></figure><p>This is no different than the original SAM model, and in fact, it&apos;s using the same &quot;prompt encoder&quot;. 
So let&apos;s see the details of the model:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="2000" height="701" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Detailed SAM2 graph</span></figcaption></figure><ul><li><strong>Prompt Encoder</strong>: As expected, we begin by clicking objects, which generates a &quot;prompt&quot;, and send this to the same encoder as Segment Anything to track the object across each image</li><li><strong>Image Encoder</strong>: We then send the entire video to the image encoder, which is pre-trained as a masked autoencoder (MAE)</li><li><strong>Memory Attention</strong>: Uses vanilla attention to condition the current frame features on the past frames features and predictions as well as on any new prompts</li><li><strong>Memory Bank: </strong>It retains information about past predictions for the target object in the video by maintaining a FIFO (first in first out) queue of memories of up to N recent frames.</li><li><strong>Mask Decoder (prediction)</strong>: Similar to SAM, but accounting for previous memory information</li></ul><p>So, you saw a second way to build a video segmentation algorithm. 
The first way was fully transformer based; and this second way has the somewhat robotic &quot;memory bank&quot;; and this is because this model is a &quot;hybrid&quot; between 100% video processing, and frame-by-frame processing.</p><h2 id="image-vs-video-segmentation-worth-the-trouble">Image vs Video Segmentation: Worth the trouble?</h2><p>I would say yes, especially considering all the use cases that can benefit from video segmentation. For example, <strong><em>surveillance with massive occlusions </em></strong>(in a crowd, with walls, trees, ...) where standard object tracking would be limited, <strong><em>video editing</em></strong>, where for example, we want to remove an object not from one frame, but from an entire scene, <strong><em>sports analytics</em></strong>, entirely based on motion, <strong><em>cell tracking</em></strong> (for example, division of cells, which can only be seen via videos), <em><strong>shoplifting</strong> <strong>detection</strong></em> (which can&apos;t really be seen in an image), <strong><em>fire spreading</em></strong>, and more...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/top-ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="294" height="224"><figcaption><span style="white-space: pre-wrap;">Examples of Video Segmentation when image isn&apos;t sufficient</span></figcaption></figure><p>You can see this article for the normal <a href="https://www.thinkautonomous.ai/blog/image-segmentation-use-cases/" rel="noopener noreferrer"><strong>image segmentation use cases</strong></a>, and I highly recommend you augment it in your mind with these video examples I provided. 
So as a rule:</p><ul><li>For most cases, don&apos;t replace all your image segmentation pipelines with video pipelines</li><li>But for the cases where segmentation fails because you need to understand video, do it!</li></ul><p>Alright, we&apos;ve seen a lot, let&apos;s do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><p>Congratulations on getting this far! Let&apos;s summarize what we learned:</p><ul><li><strong>In many cases, analyzing one event fails</strong>. When video is essential, you have to use Video Computer Vision models.</li><li><strong>Video segmentation is segmentation applied to video processing</strong>, it&apos;s used in various fields like surveillance, retail, sports analysis, shoplifting detection (or detecting suspicious behavior of any kind) and healthcare.</li><li><strong>Video Segmentation splits into two categories</strong>: Video Object Segmentation and Video Semantic Segmentation.</li><li><strong>Video Object Segmentation (VOS) focuses on tracking defined objects across video frames.</strong> Many applications like SAM2 are semi-supervised, because you give the model a prompt and an initial object to track.</li><li><strong>Video Semantic Segmentation focuses on pixel-level scene segmentation</strong>, it can also be instance or panoptic based, and the output may resemble the one of standard image segmentation.</li><li><strong>Some models like VisTR can be 100% video processing based</strong>. This model uses transformers for video instance segmentation.</li><li><strong>Other models can process frames one by one</strong>, but rely on a memory bank. 
In the case of SAM2, frames are processed both as a video and one by one (to keep track of the same object)</li></ul><h3 id="next-steps">Next Steps</h3><p>A few articles you can read:</p><ul><li><a href="https://www.thinkautonomous.ai/blog/computer-vision-from-image-to-video-analysis/" rel="noreferrer"><strong>Introduction to Video Processing</strong></a> &#x2014;&#xA0;an old post (you can see my writing style is much different) but a good overview of Video Processing.</li><li><a href="https://www.thinkautonomous.ai/blog/object-tracking/" rel="noreferrer"><strong>A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars</strong></a><strong> -</strong> very related to video object processing, but without the segmentation part (bounding boxes).</li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about video computer vision</strong></b>... I have an App full of Computer Vision models and videos. Inside, I&apos;m showing you how to do lane detection, how Waymo&apos;s algorithms work (self-driving cars), and a lot more!<br><br><a href="https://www.thinkautonomous.ai/sdc-app/" rel="noreferrer">It&apos;s all in my App, along with 5+ hours of advanced Computer Vision content &#x2014; available when you join my daily emails. Here is where you can learn more.</a></div></div>]]></content:encoded></item><item><title><![CDATA[Functional Safety Engineer: The Job that 'certifies' self-driving cars]]></title><description><![CDATA[What is functional safety in self-driving cars? What does a functional safety engineer do? 
In this post, we'll try to understand how to certify self-driving car code, and make it safe to drive in the streets]]></description><link>https://www.thinkautonomous.ai/blog/functional-safety/</link><guid isPermaLink="false">67a0a0a55b2944097abedb32</guid><category><![CDATA[self-driving cars]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 04 Feb 2025 19:46:04 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars"><p><strong>In 2019, I was an Autonomous Shuttle Engineer, working for a company that got a thrilling opportunity: </strong>to equip Paris&apos; transportation system with our autonomous shuttles. This was a golden opportunity many don&apos;t have, but the client was known to be a ruthless selector. Many others perished while trying to be &quot;approved&quot;.</p><p><strong>With high hopes, our team prepared for the demo day for months.</strong> We meticulously reviewed the client&apos;s 100+-point checklist, ensuring our shuttle met all requirements from real-time operations to autonomy measures. One day, a team of 5 was called to begin the process in a secret underground site. It was going to begin.</p><p><strong>The evaluation lasted days, during which each of the items was reviewed. Then came the final test: Cyber-Security.</strong> The client made a phone call, and within 30 seconds, an engineer with a ThinkPad came and entered the shuttle. &quot;Oh great! We can charge our phones!&quot; he said, amused. &quot;What a mistake!&quot; My colleagues were <u>sweating</u>, horrified at the vision of what this young man could do... 
and they were right: In just five minutes, using only a USB stick, he had taken control of the vehicle and got it to drive all across the room. The room went silent, as everyone realized our chance had slipped away.</p><p>Checkmate.</p><p><strong>Many engineers join the self-driving car world for the same reasons I did</strong>: it&apos;s exciting, it&apos;s interesting, it&apos;s fascinating, it&apos;s impactful, it&apos;s just... wow. Yet, nearly all the engineers who are in the &quot;learning&quot; group and have never joined a real self-driving car company have absolutely zero visibility into what it takes to certify a vehicle. We could talk cyber security, but there is also the automotive level, the software level, and more...</p><p>So in this post, I will try to introduce you to the concept of safety, from an autonomous tech engineer&apos;s point of view. This means&#xA0;&#x2014; this post won&apos;t be for expert functional safety engineers, but for those who want an introduction.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text">Speaking of safety, one of the most vital elements in a safety system is <b><strong style="white-space: pre-wrap;">redundancy</strong></b>. This article does not focus on it, but I built a full video breaking down <b><strong style="white-space: pre-wrap;">Mobileye&apos;s redundancy system to achieve functional safety</strong></b>. It&apos;s only for those subscribed to my private emails. 
<b><strong style="white-space: pre-wrap;">Go </strong></b><a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-mobileyes-true-redundancy-system" target="_blank" rel="noopener noreferrer"><u><b><strong class="underline" style="white-space: pre-wrap;">here</strong></b></u></a><b><strong style="white-space: pre-wrap;"> to get access!</strong></b></div></div><p>Let&apos;s begin with the fundamentals:</p><h2 id="what-is-functional-safety">What is Functional Safety?</h2><p><strong>Functional Safety is about making sure machines and systems stay safe, even if something goes wrong</strong>. For example, in self-driving cars, it means making sure the car can still drive safely if a part stops working. It can mean verifying that an algorithm works under all conditions, but also that it&apos;s never going to crash, and that if it does, the system has a backup.</p><p>To make it work, we use functional safety standards that determine what is safe to include in a self-driving car by evaluating the potential risks associated with each function and scenario. 
You can therefore understand the entire point of functional safety:</p><p><strong><u>To reduce risk to an acceptable level</u>.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1706" height="780" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 1706w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The goal of functional safety is to make sure autonomous cars are at an acceptable level of risk</span></figcaption></figure><p>Okay, but this shouldn&apos;t be your job, right? It&apos;s someone else&apos;s problem! So you may wonder...</p><h2 id="why-should-i-bother-learning-about-functional-safety">Why should I bother learning about Functional Safety?</h2><p><strong>Let&apos;s say you decide to build an autonomous tech startup and run your algorithms</strong>. Some are open source, some are designed by you. You decide that these are good algorithms, the accuracy is near perfect, and you&apos;re a brutal C++ coder. There is no way you missed anything. Let&apos;s even pretend you really ARE a super-hero and really, the system is perfect...</p><p><strong>You convinced me... but can you convince recruiters? Or your management? Or the suits giving your startup a self-driving permit? </strong>Hey &#x2014; you can&apos;t test without the permit. 
No matter how good your system looks, you will need to convince the state to issue you a permit. It can be the State of California, or the Ministry of Transport, or whoever issues authorizations.</p><p><strong>The problem?</strong> They are NOT experts in safety or self-driving cars. So they will ask you to go through independent organizations that run functional safety certification programs. Organizations like <em>T&#xDC;V Rheinland</em> and <em>T&#xDC;V S&#xDC;D</em> (Germany) are the ones &apos;certifying&apos; you. They&apos;re verifying your safety functions, even the safety-critical functions (emergency braking), and doing all kinds of silly tests before issuing you the certification.</p><p>Their job is to verify you are compliant with the industry norms.</p><p>But which norms are we talking about?</p><h2 id="what-are-the-different-functional-safety-norms-used">What are the different functional safety norms used?</h2><p><strong>When we say we want to &quot;reduce risk to an acceptable level&quot;... What is an acceptable level?</strong> Are you the one defining it? If an object detector works at 95%... is this okay? No? Yes? Who defines it? If your blinkers fail once every 300,000 miles... is this fine? Or is it every 3 million miles?</p><p><strong>You can&apos;t be the deciding entity; this is what norms and industry standards are for. </strong>For example, ISO 26262 is a norm. It focuses on <u>electronics</u> (buttons, A/C, windows, sensors, computers, ...), and defines a complete process to develop &amp; test your cars. It also tells you how to test scenarios, how to grade the risk of any event, and how to reduce that risk.<br><br>Let me share some norms we use in the industry:</p><ul><li>&#x2705; <strong>ISO 26262 is the norm that focuses on <u>failures</u> in electronic and software systems.</strong>&#xA0;It&apos;s going to deal with the question &quot;What happens if the object detector crashes mid-drive? 
Is there any backup?&quot; Based on how your system is implemented, you will comply more or less with the norm.</li><li>&#x2705; <strong>ISO 21448 <u>verifies</u> the&#xA0;Safety of the Intended Function (SOTIF)</strong>. It ensures perception systems like <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer"><strong>LiDAR</strong></a>, cameras, and <a href="https://www.thinkautonomous.ai/blog/faster-rcnn/" rel="noopener noreferrer"><strong>object detection</strong></a> perform safely in all conditions.&#xA0;&quot;Is your object detector working on all pedestrians? Really? Even in the dark?&quot;</li><li>&#x2705; <strong>ISO 21434</strong> <strong>is the norm focused on <u>cyber-security</u> of the system</strong>. It solves my USB-stick story. And it tells you everything you need to do to ensure your model is protected against cyber attacks.</li><li>&#x2705; <strong>A-SPICE is focused on how your project is <u>coded</u>, tested, and maintained.</strong>&#xA0;This means the requirements, the modular and maintainable code, the coding standards &amp; reviews, the software testing, software versions and revisions, bug fixing, lifecycle of the product, etc...</li><li>&#x2705; <strong>UNECE WP.29 Regulations are about <u>compliance</u> with EU autonomous driving laws</strong>. You need at least this one to be allowed to drive autonomously.</li><li>and more... depending on what you want to certify.</li></ul><p>While these are not mandatory, the more of these norms you check, the safer you&apos;ll look. </p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4F1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about self-driving cars in production</strong></b>... I am doing a full breakdown of Mobileye&apos;s True Redundancy System. 
Inside, I show you all the different algorithms they test and how their safety guardian fallback works, and I discuss their End-To-End algorithm.<br><br><a href="https://www.thinkautonomous.ai/sdc-app/" rel="noreferrer">It&apos;s all in my App, along with 5+ hours of self-driving car content &#x2014; available when you join my daily emails. Here is where you can learn more.</a></div></div><p>So here comes a question:</p><h2 id="how-to-know-if-your-robot-complies-with-functional-safety-norms">How to know if your robot complies with Functional Safety norms?</h2><p>There are TONS of ways to do this, and it&apos;s really a profession, but let me share with you 2 important functional safety concepts:</p><ol><li>The V-Model</li><li>The Functional Safety Process to &quot;certify&quot; a function</li></ol><h3 id="the-v-model">The V-Model</h3><p><strong>The V-Model is a widely used framework in functional safety management and software development</strong>. You will find it when trying to comply with ISO 26262, but also A-SPICE for example. 
It is structured like a &quot;V,&quot; where the left side represents the concepts/requirements/design phase, the bottom part is the coding phase, and the right side corresponds to the validation/integration/testing phase.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/image-3--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1912" height="1286" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/image-3--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/image-3--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/image-3--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/image-3--1-.jpg 1912w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The V-Model is heavily used across all industries in software</span></figcaption></figure><p><strong>You can see it as a continuous process,</strong> where you continuously verify that your system behaves as intended in the concept phase. If not, you rework it. It&apos;s evolving, it&apos;s alive, promoting a systematic approach to achieving functional safety in safety-related systems.</p><p>In most companies that seriously want to comply with the ISO norms and get the functional safety accreditation, using the V-Model is the best starting point.</p><p>Next:</p><h3 id="the-functional-safety-process-to-certify-a-function">The Functional Safety Process to &quot;certify&quot; a function</h3><p>As we said, we have ISO 26262 focusing on electronics, SOTIF focusing on algorithms, and A-SPICE focusing on code/software. Each of these uses the V-Model. Then, to comply with these norms, you&apos;ll need a &quot;process&quot;. 
This means clearly defining what each of these phases is.</p><p>Here is a 7-Step process:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1878" height="812" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 1878w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The 7 Steps to make a system compliant with ISO norms</span></figcaption></figure><p><strong>The job of a functional safety engineer is to implement this.</strong> This is the &quot;bridge&quot; between systems and production I was telling you about earlier.</p><p>Let me briefly define each element: (credit to a client of Think Autonomous named <a href="https://www.linkedin.com/in/mayur-waghchoure-a5aba5ab/" rel="noopener noreferrer"><strong>Mayur Wagchoure</strong></a> for helping me write this one)</p><h4 id="1-define-the-system"><strong>1. Define the System</strong></h4><p>First, we want to define the system we&apos;re testing. For example, <a href="https://www.thinkautonomous.ai/blog/lane-detection/" rel="noopener noreferrer"><strong>lane detection</strong></a>. We want to define the purpose, the scope, the dependencies, and even the normal and edge cases.</p><h4 id="2-hara-hazard-analysis-and-risk-assessment"><strong>2. 
HARA: Hazard Analysis and Risk Assessment</strong></h4><p>The second step is HARA, which has two parts:</p><ul><li><strong>HA &#x2014;&#xA0;H</strong>azard <strong>A</strong>nalysis (what could go wrong?)</li><li><strong>RA &#x2014;&#xA0;R</strong>isk <strong>A</strong>ssessment (how bad would that be if it went wrong?)</li></ul><p><em>Hazard Analysis</em></p><p>If you want to comply with functional safety standards, the first thing you&apos;ll need to do is account for the different scenarios. I group them into 4 main categories: <strong><em>Car Status</em></strong>, <strong><em>Scenario</em></strong>, <strong><em>Environment</em></strong>, <strong><em>Driving Status.</em></strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/image--2-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="2000" height="976" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/image--2-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/image--2-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/image--2-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/image--2-.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Example of all possible environments your car may be in (this may vary based on your testing site)</span></figcaption></figure><p>Your car could be turned on, driving on a country road, in rainy conditions, at low speed. Or you could drive at high speed, and accelerate. Or suddenly brake. Or drive on dry roads. Or wet roads. 
Sorting every scenario into these categories is a way to avoid the summer/winter rookie mistake.</p><p><em>Risk Assessment</em></p><p>To &quot;grade&quot; each function, you then use the formula defined by ISO 26262: <strong>Risk = Severity * Exposure * Controllability.</strong></p><p>For example:</p><ul><li>I am testing the emergency braking function, and the risk that it doesn&apos;t activate (Severity = S3)</li><li>I&apos;m driving in an urban environment, at 30-60 km/h, which happens all the time (Exposure = E4)</li><li>Urban areas have many pedestrians, so the situation is very hard to control (Controllability = C3)</li></ul><p>Then what?</p><p><strong>ISO 26262 provides what&apos;s called the ASIL (Automotive Safety Integrity Level) Table</strong>:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.aptiv.com/images/default-source/feature-stories/asil-diagram-v01.png?sfvrsn=d47cbf3e_4" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="2442" height="1537"><figcaption><span style="white-space: pre-wrap;">The ASIL Table &#x2014;&#xA0;This attributes a grading based on your Severity, Exposure, and Controllability. If you have C1, E1, S1, it means you don&apos;t need to go through millions of tests.</span></figcaption></figure><p>I am NOT going to describe how we do it in this article, but the &quot;RA&quot; phase is about assigning, for every single function and every single scenario, what&apos;s called an <em>ASIL</em> level. These can be A (safe), B (safe), C (risky), or D (risky). We&apos;re trying to see, for each function, is it risky or safe?</p><p>For example:</p><p>If you&apos;re testing an emergency braking system, in a highway scenario, with wet road, snow, and fog... you can imagine it&apos;s an ASIL-D score. Now if you&apos;re in the same scenario, but testing the radio, it&apos;s probably A or B.</p><h4 id="3-set-safety-goals"><strong>3. 
Set Safety Goals</strong></h4><p><strong>We want to turn every potential hazard and risk we found into a safety goal.</strong> Basically, turn the failure into an opportunity to design a better system. If I have just one LiDAR, and it works badly in snow, could I have a better <a href="https://www.thinkautonomous.ai/blog/lidar-and-camera-sensor-fusion-in-self-driving-cars/" rel="noopener noreferrer">LiDAR and a camera</a> instead?</p><p><strong>Here, we will create a list of requirements for the new system</strong>. It&apos;s still the &quot;concept&quot; phase, where we identify the breaking points, and turn this into a better solution. This is the <u>work</u> where you try to think about reducing risk to an acceptable level.</p><h4 id="4-functional-safety-analysis"><strong>4. Functional Safety Analysis</strong></h4><p>Then, we implement things like <strong>FMEA (Failure Mode and Effects Analysis)</strong>&#xA0;to assess potential failure causes, effects, and mitigation strategies. We can also run <strong>FTA (Fault Tree Analysis)</strong>&#xA0;to explore how faults propagate and lead to hazards. We want to identify all causes of errors.</p><h4 id="5-design-safety-mechanisms"><strong>5. Design Safety Mechanisms</strong></h4><p>Then, we&apos;re introducing mechanisms to detect, isolate, or prevent failures (e.g., redundancy, diagnostics, fail-safe systems). This can be watchdog timers, dual-channel systems, degraded operational modes, ...</p><p><strong>For example, one of the Functional Safety Methods is to implement redundancy.</strong> If you have an ASIL-D component (unsafe), you could turn it into 2 ASIL-B ones (somewhat safe). 
This way, your overall ASIL score is better, and you become compliant.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1610" height="704" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 1610w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A Functional Safety task called ASIL Decomposition, used to decrease risk</span></figcaption></figure><p>In this example, we could imagine that the second LiDAR is different, or that the algorithms behind it are more &quot;deterministic&quot;, don&apos;t use AI, and therefore are safer. The goal of functional safety is to try and reduce as many components to ASIL-A and ASIL-B as possible. This is the acceptable level.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">But how do the companies </strong></b><i><b><strong class="italic" style="white-space: pre-wrap;">that actually deploy vehicles</strong></b></i><b><strong style="white-space: pre-wrap;"> solve this?</strong></b> I interviewed LOXO, a Swiss startup deploying fully autonomous delivery robots powered by End-to-End Learning. Interested? 
<b><strong style="white-space: pre-wrap;">It&apos;s in this </strong></b><a href="https://www.linkedin.com/posts/jeremycohen2626_selfdrivingcars-robotics-deeplearning-activity-7295048405435727872-ULjK/?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAA1gjMgB2UeumB1uFo-it1cN7J4OxYZJIDI" target="_blank" rel="noopener noreferrer"><u><b><strong class="underline" style="white-space: pre-wrap;">post</strong></b></u></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div><h4 id="6-validation-and-verification"><strong>6. Validation and Verification</strong></h4><p>How do we test? This can be field tests, but also simulations, hardware-in-the-loop (HIL), and fault injection testing. Here, you can also test the Safety of the Intended Functionality (SOTIF) &#x2014; how performant is your algorithm? Is it really THAT good?</p><p>Finally:</p><h4 id="7-iterate-validate-and-document"><strong>7. Iterate, Validate, and Document</strong></h4><p>You want to iterate, improve, and document your safety analysis results. In the end, it&apos;s a very technical job, but one that comes with a lot of paperwork, documentation, diagrams, schematics, and grading, because these are the papers giving you authorizations.</p><p>We have now seen:</p><ul><li>What is functional safety?</li><li>What are the different norms we should comply with?</li><li>How do we comply with these norms? (overview)</li></ul><p>Let&apos;s see an example:</p><h2 id="example-mobileyes-primary-guardian-fallback-true-redundancy-system">Example: Mobileye&apos;s Primary Guardian Fallback / &quot;True Redundancy&quot; System</h2><p><a href="https://www.thinkautonomous.ai/blog/mobileye-end-to-end/" rel="noopener noreferrer"><strong>Mobileye</strong></a><strong>, Intel&apos;s self-driving car company, has a very strong functional safety focus</strong>. 
Their algorithm has 3 distinct channels that are completely different:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1892" height="752" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 1892w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Mobileye&apos;s True Redundancy System (</span><a href="https://www.thinkautonomous.ai/sdc-app" rel="noreferrer"><span style="white-space: pre-wrap;">you can learn more by watching the full video in my app &#x2014; available when you join my daily emails</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>The lane detection is the <u>main</u> channel used to find lane lines</strong>. This can work for example with <a href="https://www.thinkautonomous.ai/blog/lane-detection/" rel="noopener noreferrer">modular deep lane detection</a>. It is <strong><u>verified</u></strong> with <a href="https://www.thinkautonomous.ai/blog/robot-mapping/" rel="noopener noreferrer"><strong>HD Map</strong></a> Extraction &amp; Localization. 
If they agree, then we&apos;re good, but if they don&apos;t, they&apos;ll extract the lanes from a parallel <a href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/" rel="noopener noreferrer"><strong>end-to-end deep learning</strong></a> algorithm that will act as the &quot;judge&quot; or guardian.</p><p><strong>Do you realize how many algorithms are running in parallel? </strong>They implemented these automatic protection functions in case of failure. They also implemented these safety requirements across the entire system, meaning the electronic systems, the software components, and so on...</p><p><strong>When doing something like this, it&apos;s very important that each function is run using a separate method</strong>, possibly with a separate computer, separate sensors, etc... so that there cannot be a single point of failure (for example, if everything uses the same camera, and this one fails, it&apos;s not functionally safe).</p><h2 id="wait-does-everybody-really-do-all-of-this">Wait... Does everybody really do all of this?</h2><p>No.</p><p><strong>In fact, many startups don&apos;t have a functional safety team</strong>, <strong>or even a safety system in place</strong>. In this case, they try to apply it to the safety-critical systems, while waiting for the certification process. Some are also in a more favorable state/country that gives permits more easily (to encourage innovation and let startups work on the technology).</p><p><strong>It&apos;s important to understand that complying with ISO norms is NOT mandatory</strong>. In the European Union, you need to comply with the UNECE WP.29 Regulations (traffic laws) but I don&apos;t think the ISO norms are mandatory.</p><p><strong>In fact, Tesla doesn&apos;t comply with the norms, and they are approved to drive on the streets</strong>. They sell cars, and they even sell autonomous cars all across the world. 
But you&apos;ll note that some of their functions, like FSD (Full Self-Driving), are currently (early 2025) NOT authorized everywhere, for example in Europe, because they don&apos;t comply with all the norms.</p><p>Okay, okay, I think we have ENOUGH! Let&apos;s do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>Functional safety makes sure robots and algorithms operate safely</strong>, even when something goes wrong, by reducing risks to an acceptable level.</li><li><strong>Every engineer working in the field should be introduced to safety. </strong>This defines how you code, but also whether a startup gets authorizations to drive or not.</li><li><strong>Key functional safety norms include ISO 26262</strong> for electronics, <strong>ISO 21448</strong> for algorithms, <strong>ISO 21434</strong> for cybersecurity, and <strong>UNECE WP.29 </strong>for EU compliance.</li><li><strong>The V-Model is a structured approach in functional safety management,</strong> covering concept, coding, and validation phases to achieve compliance. 
It has a V shape that goes from Concept to Coding to Testing.</li><li><strong>Functional Safety is a 7-step process that includes defining systems</strong>, hazard analysis, setting safety goals, and implementing safety mechanisms to ensure compliance.</li><li><strong>The ISO 26262 norm defines risk as <em>Severity * Exposure * Controllability</em>.</strong> An ASIL table then defines, for each function, which grade it has.</li><li><strong>When something is risky (ASIL-C, ASIL-D),</strong> we introduce redundancy, diagnostics, and fail-safe systems to detect, isolate, or prevent failures, enhancing the overall safety integrity level.</li><li><strong>We want to test through simulations</strong>, field tests, and fault injection to ensure safety functions perform under all conditions, meeting the required safety standards.</li></ul><p>Alright, I think we are good!</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about self-driving cars in production</strong></b>... I am doing a full breakdown of Mobileye&apos;s True Redundancy System. Inside, I show you all the different algorithms they test and how their safety guardian fallback works, and I discuss their End-To-End algorithm.<br><br><a href="https://www.thinkautonomous.ai/sdc-app/" rel="noreferrer">It&apos;s all in my App, along with 5+ hours of self-driving car content &#x2014; available when you join my daily emails. Here is where you can learn more.</a></div></div>]]></content:encoded></item></channel></rss>