<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Read from the most advanced autonomous tech blog]]></title><description><![CDATA[Step 1 — Join 10,000+ Cutting-Edge Engineers who receive my Private daily Emails on Self-Driving Cars, Computer Vision, and Deep Learning.]]></description><link>https://www.thinkautonomous.ai/blog/</link><image><url>https://www.thinkautonomous.ai/blog/favicon.png</url><title>Read from the most advanced autonomous tech blog</title><link>https://www.thinkautonomous.ai/blog/</link></image><generator>Ghost 5.85</generator><lastBuildDate>Sun, 08 Mar 2026 11:10:42 GMT</lastBuildDate><atom:link href="https://www.thinkautonomous.ai/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Perciv AI: The Power of RADAR Deep Learning with Andras Palffy]]></title><description><![CDATA[Perciv AI is building Deep Learning for RADAR algorithms. We could call this 4D/3D Deep Learning. I have recently visited their HQ, and in this post, I'm revealing what I learned...]]></description><link>https://www.thinkautonomous.ai/blog/perciv-ai/</link><guid isPermaLink="false">699439e379f2601e412fb625</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 17 Feb 2026 11:43:43 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/02/perciv-ai-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/perciv-ai-1.jpg" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy"><p><strong>Ever done a &quot;house swap&quot;?</strong> Recently, one of my mentors in Canada told me he was swapping homes with someone in the Netherlands. Sounds unreal... Yet it isn&#x2019;t. Platforms like Home Exchange apparently have 100,000+ members doing exactly this.</p><p><strong>House swapping is one of those things that could never have worked a decade ago</strong>. Not because the idea was bad (I think it is, but that&apos;s different), but because trust, norms, and infrastructure weren&#x2019;t there.</p><p>And RADAR Deep Learning follows the same pattern.</p><p><strong>RADAR has existed for over 100 years.</strong> Most RADAR algorithmic is still traditional signal processing. As a result, RADAR engineers have long been a small, almost outcast group of &quot;freaks&quot; (sorry) working on systems few people truly understood.</p><p><strong>Why? Because for decades, RADARs were treated as a secondary sensor</strong>. Too noisy. Too low-resolution. 
Useful only as an auxiliary input in sensor fusion, under the assumption that <em>even noisy measurements are better than nothing</em>.</p><p>That assumption is now breaking.</p><p><strong>RADARs are moving into a primary sensor role</strong>:</p><ul><li>high-resolution RADARs exist</li><li>imaging 4D RADARs are spreading (<a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noreferrer">see my article here</a>)</li><li>And more importantly, DEEP LEARNING is now so capable that processing even noisy point clouds can be done!</li></ul><p><strong>This is why in this episode, I am boarding a train to Rotterdam, </strong>where I am meeting with Andras Palffy from <a href="https://www.perciv.ai" rel="noreferrer">Perciv</a>, a startup focused on RADAR Deep Learning.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F935;&#x200D;&#x2642;&#xFE0F;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Andras Who?</strong></b><br>The name is Palffy. Andras Palffy. This machine perception and AI specialist co-founded <b><strong style="white-space: pre-wrap;">Perciv</strong></b>, a Rotterdam based startup focused on AI for RADARs. He wrote multiple 3D Deep Learning papers, and got his Ph.D at the TU Delft (Netherlands).</div></div><p>He&apos;s today running Perciv, and I&apos;m going to show you an amazing video of his work...</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
  <a class="yt-thumb" data-src="SKMIrKBd7sY" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=SKMIrKBd7sY">
  <img src="https://i.ytimg.com/vi/SKMIrKBd7sY/hqdefault.jpg" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy" loading="lazy">
  <span class="yt-play" aria-hidden="true"></span>
  </a>
</div>
<!--kg-card-end: html-->
<p>WOW!!! So cool, isn&apos;t it? Now, in this post, I will cover 2 ideas:</p><ol><li>The <strong>process</strong> of Deep Learning for RADARs (how it works)</li><li>The <strong>applications</strong> you can build when leveraging 4D Deep Learning</li></ol><p>Let&apos;s begin with the process:</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F3AB;</div><div class="kg-callout-text">Grab your Ticket for the Perciv AI Discovery Tour: <a href="https://www.thinkautonomous.ai/perciv-ai">https://www.thinkautonomous.ai/perciv-ai</a></div></div><h2 id="how-to-make-deep-learning-for-radar-work">How to make Deep Learning for RADAR work</h2><p>Let me begin by showing you a demo of Perciv AI&apos;s algorithm:</p>
<!--kg-card-begin: html-->
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:ugcPost:7374749794465968129?collapsed=1" height="770" width="504" frameborder="0" allowfullscreen title="Embedded post"></iframe>
<!--kg-card-end: html-->
<p><strong>Can you feel the power? </strong>This video shows object detection, but what&apos;s very interesting is how <em>noisy</em> the input is! The points are &quot;dancing&quot;, unlike most <a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noreferrer">LiDAR point clouds</a>, which are much more robust and accurate.</p><p>Yet, RADARs provide direct velocity estimation, via the <a href="https://www.thinkautonomous.ai/blog/how-radars-work/" rel="noreferrer">Doppler Effect</a>, making them very interesting sensors to use.</p><p>So how does it work? It&apos;s really 3 steps:</p><ol><li>A RADAR outputs a&#xA0;<u>raw&#xA0;signal</u>.</li><li>This signal is often converted to a 2D or&#xA0;3D&#xA0;<u>point cloud</u>&#xA0;to be processed.</li><li>3D&#xA0;Deep Learning&#xA0;algorithms&#xA0;are working on the point clouds with <a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noreferrer">points or voxel approaches</a>, just like for LiDARs.</li></ol><p>Now the interesting element:</p><p><strong>Most traditional RADAR algorithms skip step 2</strong>, because they process the RADAR signal directly (you can see how <a href="https://www.thinkautonomous.ai/blog/how-radars-work/" rel="noreferrer">in this article</a>). In the case of Deep Learning, we have the option to either convert to a point cloud OR process the raw signal directly. This means that step 2 (signal &#x2192; point cloud conversion) can be skipped, which avoids losing data during conversion.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg" class="kg-image" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy" loading="lazy" width="2000" height="466" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/02/Screenshot-2026-02-17-at-11.40.56--1-.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Process from RADAR signal to output</span></figcaption></figure><p><strong>We now get the general idea:</strong> Thanks to Deep Learning, we can make noisy RADAR data useful. The next question is, what exactly can we do?</p><h2 id="applications-of-deep-learning-for-radars-by-perciv">Applications of Deep Learning for RADARs (By Perciv)</h2><p>Here is a 30 second clip I recorded at Perciv going in-depth of the <strong>sensors</strong>, <strong>algorithms</strong>, and <strong>end</strong>-<strong>user</strong> interface.</p><figure class="kg-card kg-video-card kg-width-regular kg-card-hascaption" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2026/02/11-Panels-Music_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2026/02/11-Panels-Music.mp4" poster="https://img.spacergif.org/v1/1920x1080/0a/spacer.png" width="1920" height="1080" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2026/02/11-Panels-Music_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">0:26</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            <figcaption><p><span style="white-space: pre-wrap;">What&apos;s possible using Deep RADARs</span></p></figcaption>
        </figure><p><strong>Let&apos;s begin with the sensors</strong>. Did you count how many there were? I see 1 camera, 2 LiDARs, and one RADAR that has 2 views: <u>a point cloud view</u>, and a <u>range-doppler view</u>. If you zoom in, you&apos;ll see that the RADAR point clouds are absolutely chaotic. There is no way you&apos;d make sense of it. </p><p><strong>And yet, when you see the blue part, in the middle of the video, you see what the Deep RADAR algorithms are capable of</strong>. The algorithmic panel is ALL based on the RADAR input only. And notice how awesome they are, we have:</p><ul><li>LiDAR + RADAR Accumulator</li><li>RADAR Heatmap</li><li>Freespace Detection</li><li>3D/4D Object Detection and Perception</li></ul><p>Seriously...</p><blockquote class="kg-blockquote-alt">A freespace detector... on a RADAR!</blockquote><p>This is really impressive, isn&apos;t it? And it&apos;s not ALL, because later on, Perciv AI showed me a side-by-side comparison of SLAM with RADAR and LiDARs. Can you guess which one was superior? </p><p>Here&apos;s the answer:</p><p>While the RADAR Odometry uses the velocity information and can accurately spot moving points, LiDAR doesn&apos;t, and as a result, overshoots!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg" class="kg-image" alt="Perciv AI: The Power of RADAR Deep Learning with Andras Palffy" loading="lazy" width="1800" height="919" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/02/embeddable_377f0e2d-e9f9-418f-8fa1-d82c2d5fa822.jpg 1800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">RADAR vs LiDAR Odometry &#x2014;&#xA0;RADAR direct speed provides a superior accuracy</span></figcaption></figure><p>This is a very good example of how Deep Learning for RADAR can be used for advanced applications.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F3AB;</div><div class="kg-callout-text">Interested in how it works? Grab your Ticket for the Perciv AI Discovery Tour: <a href="https://www.thinkautonomous.ai/perciv-ai">https://www.thinkautonomous.ai/perciv-ai</a></div></div><h2 id="summary">Summary</h2><ul><li><strong>Perciv AI builds Deep Learning for RADAR algorithms and they are awesome</strong>. I&apos;ve been following Perciv since 2023, even interviewed them when they were only 3, and their dedication to this field is unmatched.</li><li><strong>In RADAR processing, you can either process raw signal, or convert it to a point cloud</strong> the same way you&apos;d do with LiDARs. A heavier pre-processing step is usually done to reduce noise.</li><li><strong>The RADAR processing pipeline therefore becomes:</strong> signal &#x2192; point cloud &#x2192; 3D Deep Learning algorithms &#x2192; output</li><li><strong>There are many algorithms you can run on RADARs</strong>, from object detection to SLAM. 
In some cases, RADAR&apos;s velocity information can even provide BETTER results than LiDARs.</li></ul><h2 id="infiltrate-perciv-ai-with-me">Infiltrate Perciv AI with me?</h2><p>The last time I visited Perciv AI, I got a complete tour of their facility, team, 4D Deep RADAR algorithms, and even self-driving car. I got to live as an intern on his first day of a self-driving car startup. </p><p><strong>I&apos;m thinking...Wanna see what it&apos;s like? </strong>I mean, what I&apos;ll record there will obviously be top secret, guarded and accessible ONLY to the Edgeneer&apos;s Land citizens (my community membership)....BUT the show?</p><p><strong>This is a show they just did at IAAA Munich to everybod</strong>y. And I see no reason why everybody shouldn&apos;t discover it. This is why I&apos;m creating a special 2-day Virtual Tour,&#xA0;in which you&apos;ll be able to come with me in Rotterdam, be a fly on the wall, and get to live your first day as a self-driving car intern...You will see things like:</p><ul><li>&#x2705; Their self-driving car&#xA0;&#x2014; if you never saw a self-driving car before, this will be the closest you&apos;ll ever get, we&apos;ll see the sensors, wires, everything </li><li>&#x2705; Their 4D Deep RADAR demo&#xA0;&#x2014; where they will demo their algorithms on me! </li><li>&#x2705; Their RADAR tour &#x2014;&#xA0;where they&apos;ll show you what is a RADAR, and give you a tour of the different types in the market</li><li>&#x2705; The RADAR vs LiDAR SLAM video &#x2014; explaining the differences in Odometry estimation and how to do a clean one using RADARs</li></ul><p>As I said, this is the public stuff you normally CAN&apos;T see unless you physically move to where they are. For 99% of people reading this, this is a unique chance to see it. Interested?</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F3AB;</div><div class="kg-callout-text">Grab your Ticket for the Perciv AI Discovery Tour: <a href="https://www.thinkautonomous.ai/perciv-ai">https://www.thinkautonomous.ai/perciv-ai</a></div></div>]]></content:encoded></item><item><title><![CDATA[How the Solid-State LiDAR works (and why everyone bets on it)]]></title><description><![CDATA[The LiDAR industry is changing. The 100k$ mechanical LiDAR is gone; and we currently see incredible a solid-state LiDAR mass-produced for 1,000$ or less. How do these new-gen LiDARs work?]]></description><link>https://www.thinkautonomous.ai/blog/solid-state-lidar/</link><guid isPermaLink="false">697a327cd1ce7c5171ff3592</guid><category><![CDATA[lidar]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 28 Jan 2026 16:59:40 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/01/solid-state-lidar.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/solid-state-lidar.jpg" alt="How the Solid-State LiDAR works (and why everyone bets on it)"><p><strong>In 1607, the Jamestown colony was in a critical situation</strong>. English settlers founded it and declared it their first permanent colony in North America. They arrived with total confidence: they knew how to build a town. So they built wooden houses, palisades, shallow foundations, just the English way. But there was a problem: Jamestown was built on a swamp.</p><p><strong>Within weeks, houses collapsed, mosquitos propagated malaria</strong>, <strong>and the water they were drinking caused fever and poisoning</strong>. 
Within months, half of the settlers died. Yet, the remaining settlers didn&apos;t figure out a better plan, and too much was already decided. It&apos;s only after enduring famine, diseases, and war with locals that they found the right approach, the one that turned Jamestown into the first American colony.</p><p><strong>Solid-state LiDARs are that final method</strong>. In the LiDAR industry, many have experimented with all sorts of sensors, until mutually agreeing on an &quot;ideal&quot; solution: the solid-state LiDAR. Not only could it reduce cost, but it could also significantly improve performance.</p><p><strong>In this article, I am going to explain what a solid-state LiDAR is</strong>, how it works, and more importantly, why it&apos;s a better choice than most other sensors. To truly understand solid-state, we&apos;ll need to also understand mechanical LiDARs, and all their moving parts.</p><p>This will be our first point...</p><h2 id="the-components-of-a-lidar-sensor">The Components of a LiDAR sensor</h2><p>If you want to understand mechanical and solid-state LiDARs, you&apos;ll first need to see the internal components of a LiDAR. Then, we&apos;ll figure out how to classify a solid-state LiDAR based on how these parts move.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1589" height="1258" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/04466d8e-5dbc-4bc5-9f00-6e1804415cae--1-.jpg 1589w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The different components that can exist in a LiDAR</span></figcaption></figure><p>I am NOT going to describe these one by one, because I would like to instead show you how they all work together. </p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">This article shows a classification by scanning system. I have <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noreferrer">a complete article breaking down all the different types of LiDARs here</a>. </div></div><p>Keep these in mind, and let&apos;s take a look at...</p><h2 id="from-mechanical-to-solid-state-lidar">From Mechanical to Solid-State LiDAR</h2><h3 id="the-mechanical-360%C2%B0-lidar">The Mechanical 360&#xB0; LiDAR</h3><p><strong>Back in 2017, I took my first LiDAR class.</strong> It featured a Velodyne 64, which is a mechanical LiDAR (Light Detection And Ranging) that became the most famous LiDAR in the autonomous vehicle industry. 
At this time, it was costing over 100,000$, and promised to transform several use cases (indoor, outdoor robotics, SLAM, ...).</p><p>The principle of this LiDAR is simple; multiple lasers are stacked vertically on mechanical rotating components that spin really fast.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/640-4-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="550" height="309"><figcaption><span style="white-space: pre-wrap;">Fantastic animation from Hesai LiDARs (</span><a href="https://www.thinkautonomous.ai/blog/loxo/" rel="noreferrer"><span style="white-space: pre-wrap;">source, recommended</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>From here, you start identifying the advantages</strong> (accuracy, 360&#xB0;), but also the drawbacks: it&apos;s terribly <u>costly</u> (100k or so in 2017), and better 3D requires more channels - <u>hence more lasers</u> (bigger sensors).</p><p>This is how we started introducing the second types...</p><h3 id="the-mechanical-mirror-lidars">The Mechanical Mirror LiDARs</h3><p><strong>In this evolution, we no longer rotate the entire sensor, nor use multiple laser pulses, but instead, use mirrors and polygons. </strong>Here is an animation explaining how the next 2 work, that I found in <a href="https://www.youtube.com/watch?v=3EehCU3csJQ" rel="noopener noreferrer">this fantastic video again from Hesai</a>:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.08.46-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/ScreenRecording2026-01-28at15.08.46-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.08.46-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Left: A single laser is sent to a mirror which sends it to a polygon. Right: Several lasers are sent to a 1D mirror.</span></figcaption></figure><ul><li><strong>1D Rotating Mirror</strong>: <strong>The first alternative could be a single mirror that deflects the laser.</strong> Think about it, this is genius! We can use a mirror that spins horizontally to recreate that 3D shape. Of course, we&apos;d need multiple lasers stacked, but we fix the problem of having a rotating platform, which can break.</li><li><strong>Polygon-Mirror: Another alternative is to use ONE laser, and deflect it via the use of mirrors and polygons</strong>. In this case, the mirror swings vertically, and the polygon spins horizontally. This creates a 3D representation, which is narrower, can&apos;t spin 360&#xB0;, but produces a functional point cloud.</li></ul><p>These two are great, but still require you to use polygons and mirrors. In a way, it&apos;s still mechanical. So let&apos;s now talk about the true definition of solid-state...</p><h3 id="solid-state-lidars-no-moving-parts">Solid-State LiDARs = &quot;No Moving Parts&quot;</h3><p>The first time I learned about it was around 2021 when a company asked me to help them choose between multiple LiDARs. 
At the time, solid-state technology was emerging, and many were saying it was the future of self-driving cars. The definition was repeated by everyone everywhere;</p><blockquote class="kg-blockquote-alt"><strong>&quot;No Moving Parts&quot;</strong></blockquote><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1484" height="754" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/aa115a41-3c08-4c82-bf89-dc052687b95a--1-.jpeg 1484w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The purest definition of a solid-state LiDAR is that it has no moving part</span></figcaption></figure><p>Huh. What&apos;s so problematic with moving parts? Is that so terrible? Well, yes, because when used all day for weeks and weeks, these parts will simply... break!</p><p>If we compare solid-state to mechanical LiDARs, we can also see that in 100% of the cases, solid-state is a directional sensor. This means you cannot use it on the roof of your car; <u>you have to orient it very strategically, and you must use several of these sensors if you want a 360&#xB0; view</u>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1588" height="692" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/34516df1-b33d-4b2d-a6c2-829374a54e46--1-.jpeg 1588w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">By definition, solid-state LiDARs are directional and can&apos;t rotate to achieve 360&#xB0;</span></figcaption></figure><p>Now, let&apos;s try to understand the differences, and how we can get a 3D point cloud without moving lasers.</p><p>For this, I&apos;ll use the matrix below, which shows the different types of LiDARs based on the components moving. 
(realize you already covered the first 3 dark rows).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="2000" height="1101" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/a388303b-6fca-43c0-8976-e12fa2448d83--1-.jpg 2229w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The second part of the matrix: Solid-State is defined by what moves, and how.</span></figcaption></figure><p>Let&apos;s see these, one by one:</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Our 3D Deep Learning Course opens on February 5, 2026</strong></b>. Join the Waitlist now and instantly receive x3 Deep Learning Goodies; an article on voxels vs point based approaches; a 3D Segmentation Map; and a 3D Deep Learning Engineer Survey. <a href="https://www.thinkautonomous.ai/deep-point-clouds-waitlist" rel="noreferrer"><b><strong style="white-space: pre-wrap;">Get Access here</strong></b></a>.</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/deep-point-clouds-waitlist"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Deep Point Clouds Waitlist</div><div class="kg-bookmark-description">Deep Point Clouds Waitlist</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://statics.myclickfunnels.com/workspace/jkOnBQ/image/17505373/file/d75ff10b6ec59cdd182050243b59b7e8.png" alt="How the Solid-State LiDAR works (and why everyone bets on it)"></div></div><div class="kg-bookmark-thumbnail"><img src="https://statics.myclickfunnels.com/workspace/jkOnBQ/image/17945016/file/d70c4de9c26e10f34ad8513bd4e1166c.jpeg" alt="How the Solid-State LiDAR works (and why everyone bets on it)"></div></a></figure><h4 id="mems-micro-electromechanical-system"><strong>MEMS (Micro-electromechanical system)</strong></h4><p><strong>In a MEMS LiDAR, you&apos;re projecting one laser to a MEMS mirror that oscillates both horizontally and vertically.</strong> It mimics the LiDAR + mirror rotation, but it&apos;s now an oscillation at the micro level.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.28.44-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/ScreenRecording2026-01-28at15.28.44-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at15.28.44-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">You can learn more about this on </span><a 
href="https://www.youtube.com/watch?v=g7gHm-38t_s" target="_blank" rel="noopener noreferrer"><span style="white-space: pre-wrap;">the Fraunhofer IPMS video where this animation is from</span></a><span style="white-space: pre-wrap;">.</span></figcaption></figure><p><strong>MEMS mirrors still move, so MEMS LiDARs are not &quot;true&quot; solid-state</strong>. Yet, they are excellent alternatives to the mirrors, more resistant to vibrations, and shocks. When looking in more details, LiDAR makes either use a 2D MEMS mirror, or two 1D MEMS Mirror, oscillating horizontally and vertically.</p><h4 id="opa-optical-phased-array"><strong>OPA (Optical Phased Array)</strong></h4><p><strong>What is a LiDAR?</strong> It&apos;s a device that sends a <u>light wave</u>. Correct? Well, a light wave is a... wave. Yes? And a wave is something we understand. It has an amplitude, a phase, a frequency, and a wavelength! In an OPA LiDAR, we use a <u>phase shifter</u> to electronically steer the light wave. This sounds crazy, but it works. This is really modern, new generation, and a &quot;true&quot; solid-state system, since no part is moving.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at14.30.52-ezgif.com-optimize.gif" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="640" height="283" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/ScreenRecording2026-01-28at14.30.52-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/ScreenRecording2026-01-28at14.30.52-ezgif.com-optimize.gif 640w"><figcaption><span style="white-space: pre-wrap;">OPA LiDAR (</span><a href="https://www.youtube.com/watch?v=xEqV879qDNE" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h4 id="flash-lidars"><strong>Flash</strong> <strong>LiDARs</strong></h4><p><strong>In a Flash LiDAR, a diffuser projects a wide, diffused laser illumination which comes back to an array detector,</strong> creating a full 3D image in a single exposure. <u>This is a non-scanning technology; everything is illuminated at once</u>.</p><p>Was that clear? 
Well, imagine being in the dark, and trying to illuminate the room.</p><ul><li>You can either agitate a red laser all over the place (scanning devices - MEMS, OPA, ...)</li><li>Or you can use a torch, which instantly illuminates the room.</li></ul><p><strong>This is what a Flash LiDAR does</strong>,<strong> it&apos;s a laser torch.</strong></p><h4 id="solid-state-summary">Solid-State Summary</h4><p>Cool, a quick summary of the last 3?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1790" height="654" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/f5a11054-46b0-4260-853a-c10349daf147--1-.jpeg 1790w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The different types of solid-state LiDARs</span></figcaption></figure><p>We now have a good understanding of Solid-State. The question I want to continue with is...</p><h2 id="how-is-solid-state-better-than-mechanical-lidar-technology">How is Solid-State better than Mechanical LiDAR technology?</h2><p>There are several aspects that you can already guess, but I&apos;d like to take these one by one anyway.</p><h3 id="better-durability-no-moving-parts"><strong>Better durability (no moving parts)</strong></h3><p><strong>Mechanical LiDARs <u>have moving parts</u></strong>, which wear out over time and increase the risk of failure in automotive environments (vibration, heat, dust). This risk is real for MEMS (which we saw is partly mechanical), but completely reduced for OPAs and Flash LiDARs. <u>The #1 advantage of using a solid-state LiDAR is this.</u></p><h3 id="compact-lightweight-design">Compact &amp; lightweight Design</h3><p><strong>A mechanical LiDAR HAS to be on the roof of a vehicle. </strong>This is not only ugly, but also impractical. On the other hand, a solid-state LiDAR can be nicely integrated in the front of a vehicle. This makes Mechanical LiDAR not such a good option. When you look at the ADAS (Advanced Driver Assistance System) industry, most companies like BMW, Mercedes-Benz, etc... include MEMS LiDARs in the front. Its small size makes it ideal for integration into space-constrained platforms like drones and autonomous vehicles.</p><p>Let&apos;s continue:</p><h3 id="mass-production-capability">Mass Production Capability</h3><p><strong>Manufactured using semiconductor processes</strong>, solid-state LiDARs can be mass produced with lower costs. MEMS are currently the cheapest, but OPAs promise to reach incredible costs (100$ or less). 
The math makes sense, we got lower size and lower cost, which is always the direction we want to go towards in hardware.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/IDTechEx_Lidar_chart.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="500" height="317"><figcaption><span style="white-space: pre-wrap;">The cost of LiDAR based on their types </span><a href="https://www.idtechex.com/en/research-report/lidar-2024-2034/995" rel="noreferrer"><span style="white-space: pre-wrap;">(source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h3 id="point-cloud-resolution-high-performance">Point Cloud Resolution &amp; High Performance</h3><p><strong>A mechanical LiDAR solution based on spinning mechanics often provides sparser point clouds</strong>, especially vertically, with gaps in coverage compared to dense sensors like cameras. This can lead to blind spots for low or small obstacles. On the other hand, a solid-state LiDAR can capture hundreds of thousands of points per second, and has a higher angular resolution, which is very good for tasks like 3D mapping or obstacle detection.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1024" height="342" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/c8bb9ea792d69ebb06c349da85d46b15.jpg 1024w" sizes="(min-width: 720px) 720px"></figure><ul><li>With this, a solid-state LiDAR has lower power consumption (good when using drones for example), could resist environmental conditions better, scan faster, and have a flexible field of view.</li><li>Other than the field of view, the modulation itself is very much manageable; most FMCW (frequency modulated continuous wave) LiDARs are for example based on Solid-State, and NOT mechanical.</li></ul><p><strong>In industries like self-driving cars,</strong> smart cities, industrial automation, robotics, using something with high resolution, high accuracy, good enough distance/range, and potentially a wide field of view makes total sense.</p><h2 id="range-resolution-performance">Range, Resolution, Performance?</h2><p>The following is to take with a pinch of salt, because it varies very often and some companies have crazy claims. Yet, I also looked at studies like <a href="https://www.idtechex.com/en/research-report/lidar-2024-2034/995" rel="noopener noreferrer">this one from IDtechEx</a>, <a href="https://www.mdpi.com/2072-666X/11/5/456" rel="noopener noreferrer">this one on MEMS mirrors</a><strong> </strong>, and <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/lpor.202100511" rel="noopener noreferrer">that one on OPAs</a>. 
Here is an overview:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg" class="kg-image" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy" width="1884" height="964" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/73c6419a-b20f-42fc-bced-dbd772e89eb3--1-.jpeg 1884w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Comparing the different types of sensors based on range, field of view, cost, and resolution. It&apos;s highly incomplete, but gives you an idea.</span></figcaption></figure><p>Can you see why MEMS, even though it is not really solid-state, is the BEST compromise? It&apos;s the only one that can currently be mass-produced at a low price, while keeping good range and high resolution.</p><p><strong>You can therefore see how MEMS and Mechanical LiDARs are still the ones being used the most in the industry. </strong>True solid-state is a crazy dream, with incredible claims (an OPA LiDAR could reach a cost of 100$). For now, we aren&apos;t there yet.</p><h2 id="example-1-innoviz-technologies">Example 1: Innoviz Technologies</h2><p>At CES 2026, I explored solid-state LiDARs with Seyond &amp; Innoviz. On the one hand, Seyond, which you already saw, is doing Flash LiDARs, which is &quot;true&quot; solid-state. On the other, Innoviz is very likely doing MEMS, which is... hybrid (still following?).</p><p>I would like to start with Innoviz Technologies&apos; latest demo:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
    <a class="yt-thumb" data-src="JF8rhmANxJM" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=JF8rhmANxJM">
    <img src="https://i.ytimg.com/vi/JF8rhmANxJM/hqdefault.jpg" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy">
    <span class="yt-play" aria-hidden="true"></span>
    </a>
</div>
<!--kg-card-end: html-->
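<p>By the way, if you want to picture how a MEMS LiDAR &quot;paints&quot; the scene, here is a tiny toy sketch I wrote (my own illustration, not Innoviz&apos;s actual drive scheme; the frequencies and amplitudes are made up) of how a two-axis oscillating mirror traces a scan pattern:</p>
<pre><code class="language-python">import numpy as np

# Toy model of a 2D MEMS mirror: each axis oscillates sinusoidally.
# Driving the two axes at different frequencies traces a Lissajous-like
# scan pattern. All numbers below are illustrative, not from a datasheet.
t = np.linspace(0.0, 0.1, 50_000)                 # 100 ms of scanning
theta_x = 10.0 * np.sin(2 * np.pi * 1500 * t)     # fast axis tilt, degrees
theta_y = 5.0 * np.sin(2 * np.pi * 110 * t)       # slow axis tilt, degrees

# A mirror deflects the reflected beam by twice its own tilt angle.
beam_az = 2 * theta_x
beam_el = 2 * theta_y

print("Horizontal field of view:", beam_az.max() - beam_az.min(), "degrees")
print("Vertical field of view:", beam_el.max() - beam_el.min(), "degrees")
# Plotting beam_az against beam_el would show the dense scan pattern that
# replaces the spinning assembly of a mechanical LiDAR.
</code></pre>
<p>The same idea applies to the two-1D-mirror variant, except each mirror handles a single axis.</p>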
<p>Did you see how awesome the Innoviz demo looks? Notice how the benefits here are related to cost reduction, size shrinking, and heat/power reduction. Now, let&apos;s see a demo of a Flash LiDAR:</p><h2 id="example-2-seyond-flash-lidars">Example 2: Seyond Flash LiDARs</h2><p>Here is the second example, where <a href="https://www.seyond.com/" rel="noreferrer">Seyond</a> gives you an amazing overview of a Flash LiDAR (Hummingbird). This video is originally from my membership The Edgeneer&apos;s Land - make sure to <strong>be in my daily emails to learn more</strong>.</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
    <a class="yt-thumb" data-src="-71Cb5V3nfI" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=-71Cb5V3nfI">
    <img src="https://i.ytimg.com/vi/-71Cb5V3nfI/hqdefault.jpg" alt="How the Solid-State LiDAR works (and why everyone bets on it)" loading="lazy">
    <span class="yt-play" aria-hidden="true"></span>
    </a>
</div>
<!--kg-card-end: html-->
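<p>Since a Flash LiDAR is essentially a time-of-flight camera, here is a small sketch of my own (simulated numbers and an assumed pinhole model, not Seyond&apos;s pipeline) showing how per-pixel travel times become a full 3D point cloud in a single exposure:</p>
<pre><code class="language-python">import numpy as np

# Simulated Flash LiDAR frame: every pixel of the detector array measures
# the round-trip time of the same single light pulse. The resolution and
# intrinsics below are assumptions for illustration only.
H, W = 120, 160                      # detector array size (pixels)
fx = fy = 200.0                      # assumed focal lengths (pixels)
cx, cy = W / 2.0, H / 2.0            # assumed principal point

c = 299_792_458.0                                      # speed of light (m/s)
rng = np.random.default_rng(0)
round_trip_s = rng.uniform(1e-8, 4e-7, size=(H, W))   # 10 ns to 400 ns

# Time of flight: the pulse travels to the surface and back, so divide by 2.
depth = c * round_trip_s / 2.0                         # roughly 1.5 m to 60 m

# Back-project every pixel with a pinhole model: one 3D point per pixel.
u, v = np.meshgrid(np.arange(W), np.arange(H))
X = (u - cx) * depth / fx
Y = (v - cy) * depth / fy
points = np.stack([X, Y, depth], axis=-1).reshape(-1, 3)

print(points.shape)   # (19200, 3): a whole point cloud from ONE exposure
</code></pre>
<p>No mirror, no motor: the &quot;scanning&quot; is done by the detector array itself, which is exactly why this design counts as true solid-state.</p>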
<p>Alright, let&apos;s do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><p>Here is a bullet point summary of the article:</p><ul><li><strong>The robotics &amp; LiDAR industry tends to use 2 types of LiDARs</strong>: Mechanical and Solid-state. While the former has moving parts, the later doesn&apos;t.</li><li><strong>Solid-State LiDARs come in 3 categories: </strong>MEMS (with moving mirrors), OPA (true solid-state with no moving parts), and Flash LiDAR (projects laser arrays for instantaneous scene capture). They are all directional, lower power, higher resolution, but shorter range and lower reliability than those with mechanical movement.</li><li><strong>LiDAR technology is about sending a laser</strong> to the world and measuring the time a wave takes to hit a surface and come back. Yet, this can be done via several processes.</li><li><strong>The semiconductor manufacturing process allows solid-state LiDAR to be mass-produced at lower cost,</strong> making it more accessible for automotive and industrial applications.</li><li><strong>Solid-state LiDAR technology is advancing rapidly and is becoming the default choice</strong> for applications requiring high performance, compactness, and reliability, including self-driving cars and smart cities.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Our 3D Deep Learning Course opens on February 5, 2026</strong></b>. Join the Waitlist now and instantly receive x3 Deep Learning Goodies; an article on voxels vs point based approaches; a 3D Segmentation Map; and a 3D Deep Learning Engineer Survey. <a href="https://www.thinkautonomous.ai/deep-point-clouds-waitlist" rel="noreferrer"><b><strong style="white-space: pre-wrap;">Get Access here</strong></b></a>.</div></div><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/deep-point-clouds-waitlist"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Deep Point Clouds Waitlist</div><div class="kg-bookmark-description">Deep Point Clouds Waitlist</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://statics.myclickfunnels.com/workspace/jkOnBQ/image/17505373/file/d75ff10b6ec59cdd182050243b59b7e8.png" alt="How the Solid-State LiDAR works (and why everyone bets on it)"></div></div><div class="kg-bookmark-thumbnail"><img src="https://statics.myclickfunnels.com/workspace/jkOnBQ/image/17945016/file/d70c4de9c26e10f34ad8513bd4e1166c.jpeg" alt="How the Solid-State LiDAR works (and why everyone bets on it)"></div></a></figure>]]></content:encoded></item><item><title><![CDATA[LOXO: How to certify End-To-End algorithms in production with Jonathan Péclat]]></title><description><![CDATA[How do you make end-to-end deep learning algorithms certified in production? When you have no way to grade each block individually? 
Jonathan Péclat from Loxo explains that to us.]]></description><link>https://www.thinkautonomous.ai/blog/loxo/</link><guid isPermaLink="false">62e120112ee42fb76dbfe4e2</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 20 Jan 2026 10:43:40 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo.jpg" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"><p><strong>On June 4, 1996, the Ariane 5 rocket was ready to be launched after years of work</strong>, public funding, and political pressure. The stress was at its maximum, but after just 40 seconds, the rocket exploded, causing a loss of over 370M$.</p><p><strong>This event is one of the best known in software engineering</strong>, in particular because of the cause of the crash:&#xA0;<u>a float-to-int conversion</u>.<strong> </strong>Indeed, the engineers reused the code from Ariane 4 to launch Ariane 5, but forgot that a&#xA0;<em>float64</em>&#xA0;storing the horizontal velocity would be converted to a signed&#xA0;<em>int16</em>. 40 seconds into launch, the conversion failed and <strong><em>crashed</em></strong> the rocket.</p><p><strong>I think this story can be a perfect introduction to the domain of autonomous vehicle safety,</strong> which we&apos;ll cover today with our guest Jonathan P&#xE9;clat from Loxo.</p><p>A quick intro:</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><a href="https://www.linkedin.com/in/jonathan-p%C3%A9clat-40bb678a/" target="_blank" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">Jonathan P&#xE9;clat</strong></b></a> is the Head of Software Architecture at <a href="https://www.loxo.ch/en/" target="_blank" rel="noopener noreferrer">LOXO</a>. He provided me with fantastic insights on their redundancy approach to make vehicles compliant while using cutting-edge algorithms like End-To-End Deep Planners.</div></div><p><a href="https://www.loxo.ch/en/" rel="noreferrer">Loxo</a> is a Swiss-based company started in 2022, when they built a first prototype of an autonomous shuttle. Since then, it evolved into this vehicle that now operates in Germany &amp; Switzerland.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/54ceae5c-58bd-495c-b66b-bd0675ee59a9.gif" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="479" height="307"><figcaption><span style="white-space: pre-wrap;">Loxo&apos;s autonomous driver in the streets</span></figcaption></figure><p>These robots navigate real streets, interact with real traffic, and do so using an architecture powered by End-to-End Deep Learning.</p><p>I find this incredible, because End-To-End Learning is purely AI based. It&apos;s data-based: you don&apos;t explicitly program the vehicle to stop at a red light, but show it via examples from the dataset.</p>
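<p>To make that difference concrete, here is a deliberately simplified sketch of my own (illustrative Python, not LOXO&apos;s code): in a modular stack the rule is written by an engineer, while in an End-To-End stack the &quot;rule&quot; only lives inside the weights of a trained network:</p>
<pre><code class="language-python">def modular_planner(detections):
    # Modular stack: an engineer wrote the rule explicitly, so a certifier
    # can inspect and test this block in isolation.
    if "red_light" in detections:
        return "brake"
    return "keep_speed"


def end_to_end_planner(sensor_frame, policy_net):
    # End-To-End stack: no explicit rule anywhere. The behavior "stop at a
    # red light" only exists implicitly in the trained weights, which is
    # exactly what makes it hard to grade block by block.
    return policy_net(sensor_frame)


def fake_policy_net(frame):
    # Hypothetical stand-in for a trained network, so the sketch runs.
    return "brake"


print(modular_planner({"red_light"}))                        # brake
print(end_to_end_planner("camera_frame", fake_policy_net))   # brake
</code></pre>
<p>Keep that contrast in mind when looking at the diagram below:</p>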
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="1438" height="488" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2026/01/Screenshot-2026-01-20-at-11.23.55--1-.jpg 1438w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Modular vs End-To-End</span></figcaption></figure><p>While a modular approach is pretty straightforward, and certification is about evaluating each individual block (is the lane detection safe? is the obstacle detection safe?)...</p><p>... <strong>End-To-End approaches are much more complex to evaluate</strong>, because they only output the final driving decision. I have <a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer">an entire article covering the differences here</a>.</p><p>So I asked Jonathan:</p><h3 id="how-do-you-make-end-to-end-learning-safe"><strong>&quot;How do you make End-To-End Learning safe?&quot;</strong></h3><p>Here is what he explained:</p><figure class="kg-card kg-video-card kg-width-regular kg-card-hascaption" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet01b_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet01b.mp4" poster="https://img.spacergif.org/v1/1280x720/0a/spacer.png" width="1280" height="720" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet01b_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">1:46</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            <figcaption><p><span style="white-space: pre-wrap;"> LOXO uses End-To-End Learning in Production</span></p></figcaption>
        </figure><p>As Jonathan pointed out: </p><blockquote class="kg-blockquote-alt">&#x201C;You cannot really prove that AI is safe, not today. So we run our AI system in parallel with another component that verifies the trajectory. If the AI violates any predefined rule, we switch to a deterministic safe path.&#x201D;</blockquote><p>This point is crucial, because several self-driving car companies use exactly the same approach. LOXO does not rely on a single neural network but on <strong>four independent channels</strong> (two AI channels, and two deterministic channels) running in parallel, each serving a different role in verifying, supervising, or backing up the End-to-End planner.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/embeddable_b2a2de66-b368-4269-9018-38f1058df12d.png" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="770" height="303" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/embeddable_b2a2de66-b368-4269-9018-38f1058df12d.png 600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/embeddable_b2a2de66-b368-4269-9018-38f1058df12d.png 770w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The LOXO Architecture isn&apos;t just ONE neural network, but 4 separate channels</span></figcaption></figure><p><strong>LOXO&#x2019;s architecture is a clear illustration of the principle of redundancy:</strong> multiple algorithms and points of view, and, instead of a single point of failure, a structure that catches, compensates for, and, if needed, overrides failures.<strong> </strong><u>This is how an End-to-End system becomes certifiable and safer.</u></p><p>The key point to understand is that companies relying on End-To-End do not use just that one approach; they run multiple algorithms in parallel that verify and contradict each other. <a href="https://www.thinkautonomous.ai/sdc-app" rel="noreferrer"><strong>I have a complete breakdown of how Mobileye does it with their own End-To-End approach here, if you&apos;re interested</strong></a>.</p><p>Still, a question remains: </p><h4 id="what-exactly-do-you-make-redundant"><strong>What exactly do you make redundant?</strong> </h4><p>The sensors? The algorithms? What is even redundancy? This is my next question for Jonathan, who then explains the safety fundamentals of ASIL scoring and decomposition, using among other things a grading from A (least critical) to D (most critical):</p><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet04-Asil_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet04-Asil.mp4" poster="https://img.spacergif.org/v1/1280x720/0a/spacer.png" width="1280" height="720" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2026/01/TAmember_LoxoInterview_Snippet04-Asil_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">2:32</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p>The entire principle relies on the concept of Functional Safety with ASIL Decomposition. This is a job of its own, often built around ISO standards, but if you&apos;d like to explore it, I have a complete article covering how it works here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/functional-safety/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Functional Safety Engineer: The Job that &#x2018;certifies&#x2019; self-driving cars</div><div class="kg-bookmark-description">What is functional safety in self-driving cars? What does a functional safety engineer do? In this post, we&#x2019;ll try to understand how to certify a self-driving car code, and make it safe to drive in the streets</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"></div></a></figure><p><strong>Realize that this doesn&apos;t stop here. </strong>In my interview with Jonathan, he explains the step-by-step framework LOXO implements, along with the internal documents they use to grade a function, evaluate its risk, and decide whether or not to make it redundant.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo-process.001.jpeg" class="kg-image" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2026/01/loxo-process.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2026/01/loxo-process.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2026/01/loxo-process.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2026/01/loxo-process.001.jpeg 1920w" sizes="(min-width: 720px) 720px"></figure><p>It&apos;s a complete masterclass we have inside <a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer">The Edgeneer&apos;s Land</a>, our community membership experience.</p><p>But for now, let&apos;s do a brief summary:</p><h2 id="summary">Summary</h2><ul><li><strong>When a self-driving car company uses End-To-End Learning</strong>, a single machine-learning model directly maps raw sensor data to driving actions or trajectories, without any manually written rules.</li><li><strong>While this can simplify system design and improve performance</strong>, it also makes the system harder to interpret, verify, and certify, especially in safety-critical and regulated environments.</li><li><strong>Companies like LOXO often use redundant channels</strong> built the opposite way from the End-To-End channel: point cloud processing, clustering, extraction, and very deterministic approaches that try to validate what the AI says.</li><li><strong>Functional Safety Systems like ASIL Decomposition</strong> still apply to End-To-End, and there are many processes used to certify self-driving car
algorithms.</li></ul><p><strong>Next Steps?</strong> <br>If you want to go deeper into how safety is formally addressed in the autonomous driving industry (how risks are identified, graded, reduced, and documented), I detail the full process in this <a href="https://www.thinkautonomous.ai/blog/functional-safety/" rel="noopener noreferrer">blog post</a> about functional safety.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/functional-safety/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Functional Safety Engineer: The Job that &#x2018;certifies&#x2019; self-driving cars</div><div class="kg-bookmark-description">What is functional safety in self-driving cars? What does a functional safety engineer do? In this post, we&#x2019;ll try to understand how to certify a self-driving car code, and make it safe to drive in the streets</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" alt="LOXO: How to certify End-To-End algorithms in production with Jonathan P&#xE9;clat"></div></a></figure><p>If you&apos;d like to get the complete masterclass from Jonathan, I recommend checking out <a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer">The Edgeneer&apos;s Land</a>.</p>]]></content:encoded></item><item><title><![CDATA[LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry]]></title><description><![CDATA[Since the beginning of the self-driving car era, many people wanted to compare LiDAR vs RADAR. It didn't make sense: these sensors were complementary back then. Today, in the age of 4D, the LiDAR vs RADAR comparison makes real sense, let's see...]]></description><link>https://www.thinkautonomous.ai/blog/fmcw-lidars-vs-imaging-radars/</link><guid isPermaLink="false">62a25f550f1a5e26a580b87a</guid><category><![CDATA[lidar]]></category><category><![CDATA[robotics]]></category><category><![CDATA[sensor fusion]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 29 Oct 2025 11:02:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2023/09/lidar-vs-radar--1-.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/lidar-vs-radar--1-.webp" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry"><p><strong>Back in 2020, a company contacted me because they needed my opinion on a robotic sensor stack they were working on.</strong> They had 2 days to finalize the decision on a sensor suite that would equip their autonomous delivery pods. Like many, they were considering using a combination of all sensors: cameras, LiDARs, RADARs, and even ultrasonic sensors. But they also had concerns, and were wondering whether something better was available.</p><p><strong>But in 2020, the combination of a LiDAR, a camera, and a RADAR was what made the most sense. </strong>&quot;These sensors are complementary&quot; I would reply. 
&quot;The LiDAR is the most accurate sensor to detect a distance, the camera is best for scene understanding, and the RADAR can see through objects and directly estimate velocities&quot;.</p><p><strong>Is this still true?</strong> Don&apos;t we have camera-only systems today? Don&apos;t we have LiDAR-only systems that bypass RADARs? And don&apos;t we have RADARs that are getting as good as, if not better than, LiDARs? I think the idea of &quot;complementarity&quot; is changing. Today, sensors are getting more capable. FMCW LiDARs can detect speed, and Imaging RADARs can create accurate point cloud representations.</p><p>So, what is true and what isn&apos;t?</p><p><strong>Let&apos;s take a look via this article in 3 points:</strong></p><ul><li>The Traditional LiDAR vs RADAR comparison</li><li>The new LiDAR and RADAR sensors in self-driving cars</li><li>LiDARs vs RADARs: The Modern Comparison</li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text">Warning Graphic Content: <b><strong style="white-space: pre-wrap;">Ever gutted out a LiDAR?!</strong></b> What does it look like inside? I recorded a video explaining how an emitter and a receiver work &amp; how it all fits together internally. <br>Watch it <a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-slamtechs-rp-lidar-ungutting" rel="noopener noreferrer">here in my private app.</a></div></div><h2 id="traditional-lidar-and-radar-technology-comparison">Traditional<strong> LiDAR and RADAR technology comparison</strong></h2><p>I believe the following no longer makes sense, but I am going to show it to you anyway, because this is what you&apos;ll see in 99% of other posts about the topic. Here is the idea in 3 points:</p><h3 id="1lidars-are-great-for-distance-estimation">1 - LiDARs are great for distance estimation</h3><p><strong>LiDAR</strong> <strong>(Light Detection and Ranging) is a technology that leverages laser light to measure distances and create detailed 3D maps of objects and environments.</strong> When you look at distance estimators today, the LiDAR is often used as the <u>&quot;ground truth&quot;</u>. LiDAR systems operate by emitting laser pulses (waves) and calculating the time it takes for the light to come back. This idea is called &quot;Time of Flight&quot; - and although there are multiple <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer">types of LiDARs</a>, this is the overall idea.</p><p>Here is an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/tof-lidar.webp" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="800" height="358" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/02/tof-lidar.webp 600w, https://www.thinkautonomous.ai/blog/content/images/2023/02/tof-lidar.webp 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">How a Time Of Flight LiDAR works</span></figcaption></figure><p>Now, what does it produce? The answer is a <a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noopener noreferrer">point cloud</a> of the environment. But not all point clouds look the same.</p><h4 id="2d-vs-3d-lidars">2D vs 3D LiDARs</h4><p>Because I&apos;m going to talk about 4D LiDARs, I have to explain the idea of a 2D and a 3D LiDAR first. 
The idea is well explained in my post &quot;<a href="https://www.thinkautonomous.ai/blog/2d-lidar/" rel="noopener noreferrer"><strong>2D LiDARs: Too Weak for Self-Driving Cars?</strong></a>&quot;, in which I explain that LiDARs use vertical &quot;channels&quot; or layers, and that the more channels you have, the better your 3D resolution.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="1120" height="792" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/Screenshot-2024-11-04-at-17.47.34--1-.jpg 1120w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">LiDAR Resolution depends on the number of channels - 1 layer means your LiDAR only sees in 2D. (</span><a href="https://www.thinkautonomous.ai/blog/2d-lidar/" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h4 id="what-more-channels-bring">What more channels bring</h4><p><strong>A LiDAR sends out laser pulses</strong> to measure <u>distances</u> and create detailed 3D maps. But the drawback is that if you want to measure a velocity, you need to compute the difference between 2 consecutive timestamps. How has the point cloud moved in the last second? At low speed, this is good enough, but at high speed, waiting for 2 frames can mean several meters travelled before you even start braking.</p><p>This is why we also like to combine it with a RADAR. Let&apos;s see it:</p><h3 id="2radars-are-great-velocity-estimators">2 - RADARs are great velocity estimators</h3><p><strong>RADAR stands for Radio Detection And Ranging</strong>. It works by emitting electromagnetic waves that reflect when they meet an obstacle. Unlike cameras or LiDARs, RADAR relies on radio waves that work under any weather condition and can even see underneath obstacles. It uses the &quot;Doppler Effect&quot; to measure the velocity of obstacles<em>.</em></p><p><strong>RADAR technology is very mature </strong>(&gt;100 years old), and is used in various industries, including aviation, where it is crucial for air traffic control, cars, missile detection, and even weather forecasting. 
<u>However, most RADARs work in 2D.</u> Haaaaa - yes, this is what we got: <strong>X and Y, but no Z</strong>, exactly like a one-channel LiDAR.</p><p>Should I show you a sample point cloud from a RADAR?</p><h4 id="output-from-a-radar-system">Output from a RADAR system</h4><p>Let me show you the real output from a RADAR sensor:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/ezgif.com-gif-to-webp.webp" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="345" height="265"><figcaption><span style="white-space: pre-wrap;">(</span><a href="https://www.youtube.com/watch?v=N_8ONE9WqXw" rel="noopener noreferrer"><u><span class="underline" style="white-space: pre-wrap;">source</span></u></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>I mean, can you tell where there is a vehicle? </strong>Whether we should stop or not? It&apos;s complete garbag&#x2014; wait, if people use it, it&apos;s gotta be useful, right? And yes, it is, because while we only have a noisy 2D point cloud, each of these points also provides 1D velocity information. RADARs tell us whether the points are going away from us, or towards us, and how fast.</p><p>Using Point Cloud Processing, Deep Learning (often trained on LiDAR data), or even <a href="https://thinkautonomous.ai/blog/introduction-to-radar-camera-fusion" rel="noopener noreferrer"><u>RADAR/Camera Fusion</u></a>, we can even get a result like this:</p>
<!--kg-card-begin: html-->
<figure class="kg-card kg-image-card kg-card-hascaption">
<video class="lazy" style="max-width:100%" controls poster="https://www.thinkautonomous.ai/blog/content/images/2023/04/radarcamera.webp" preload="none" muted loop playsinline>
<source src="https://www.thinkautonomous.ai/blog/content/media/2023/04/radarcamera.mp4" type="video/mp4">
</video>
<figcaption>A RADAR fused with a camera (<a href="https://www.youtube.com/watch?v=Xk5xbxHTt00" rel="noopener noreferrer"><u>source</u></a>)</figcaption>
</figure>
<!--kg-card-end: html-->
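<p>What makes this kind of result possible is that every RADAR return carries a radial (Doppler) velocity on top of its position. Here is a minimal sketch of how that extra channel can be used to separate movers from static clutter; the numbers, the threshold, and the ego-motion handling are made up for illustration and are not the pipeline behind the clip above:</p>
<pre><code class="language-python">import numpy as np

# Each RADAR return: x [m], y [m], radial velocity [m/s] (negative = approaching).
# Values below are invented for the example.
points = np.array([
    [12.0,  0.5,  -7.9],   # guardrail: "approaches" only because we drive towards it
    [25.0, -1.2, -15.0],   # oncoming car, closing fast
    [18.0,  3.0,   4.2],   # car ahead, pulling away
])
ego_speed = 8.0            # our own speed [m/s], assumed known from odometry

xy = points[:, :2]
radial = points[:, 2]
ray = xy / np.linalg.norm(xy, axis=1, keepdims=True)   # unit vector from the sensor to each point

# Assuming the ego vehicle moves along +x: a static point ahead of us appears
# to approach at ego_speed projected onto its ray.
expected_static = -ego_speed * ray[:, 0]
compensated = radial - expected_static

is_moving = np.abs(compensated) &gt; 1.0   # anything faster than ~1 m/s gets its own color
print(is_moving)                        # [False  True  True]
</code></pre>
<p>Because each return already carries that speed, no frame differencing is needed. With a classic LiDAR you would compare two consecutive frames, and at 10 Hz a car driving at 30 m/s has already moved 3 meters between them.</p>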
<p>Notice how the yellow dot changes to a green color as soon as the car moves, and how each static object stays orange, while moving objects get their own color. This is because the RADAR is really good at measuring velocities.</p><h3 id="3lidars-and-radars-are-complementary-and-still-need-eachother">3 - LiDARs and RADARs are complementary and still need each other</h3><p><strong>As a little summary, I&apos;d say that LiDARs are good</strong>, but most of the time need cameras for context, and at high speed, need RADARs. RADARs are great, but could NOT work as a standalone system. So let&apos;s do a quick overview:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/camera-lidar-radar--1-.png" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="2000" height="1169" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/02/camera-lidar-radar--1-.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2023/02/camera-lidar-radar--1-.png 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2023/02/camera-lidar-radar--1-.png 1600w, https://www.thinkautonomous.ai/blog/content/images/size/w2400/2023/02/camera-lidar-radar--1-.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Camera vs LiDAR vs RADAR comparison</span></figcaption></figure><p>If you want to be green everywhere, you need to combine all 3. Use the camera for scene understanding, use the RADAR for weather conditions and velocity measurement, and the LiDAR for distance estimation.</p><p><strong>This brings a problem. </strong>A random startup must invest in 3 sensors, co-calibrate 3 sensors, train their team on all these sensor types, and the more sensors we use, the more confusion we risk introducing. You may wonder... can&apos;t we use just one? Or two?</p><p>Let&#x2019;s see how:</p><h2 id="fmcw-lidars-imaging-radars-the-future-of-perception">FMCW LiDARs &amp; Imaging RADARs: The Future of Perception</h2><p><strong>Back in January 2023, I was at CES in Las Vegas for the first time.</strong> It was a big show, really incredible, and while walking there, I met a startup named &apos;Aeva&apos;. Aeva is a LiDAR startup specialized in 4D technology. &quot;What&apos;s 4D?&quot; I asked. It turns out 4D meant that their LiDARs could do direct velocity estimation.</p><p><strong>The next day, I walked to a different area and stumbled across a Korean startup called bitsensing</strong>. &quot;Bitsensing is creating a 4D Imaging RADAR&quot; said the presenter. I was in shock. It was a normal RADAR, but providing an incredible resolution, with Z-elevation, an accurate 3D view, no noise, and still the Doppler velocity measurement.</p><p>It sounded like these startups were working on fixing the weaknesses of classical technologies.</p><p>Let me introduce them to you.</p><h3 id="1fmcw-lidar-frequency-modulated-continuous-wave-lidar-4d-lidar"><strong>1 - FMCW LiDAR (Frequency Modulated Continuous Wave LiDAR): 4D LiDAR</strong></h3><blockquote><em>An FMCW LiDAR (or 4D LiDAR, or Doppler LiDAR) is a LiDAR that can return the depth information, but also <u>directly measure the speed of an object</u>. What happens behind the scenes is that they borrow the RADAR Doppler technology and adapt it to a light sensor.</em></blockquote><p>Here&apos;s what the startup <strong>Aurora</strong> is doing on LiDARs... 
notice how moving objects are colored while others aren&apos;t:</p>
<!--kg-card-begin: html-->
<figure class="kg-card kg-image-card kg-card-hascaption">
<video class="lazy" style="max-width:100%" controls poster="https://www.thinkautonomous.ai/blog/content/images/2023/04/FMCWlidar.webp" preload="none" muted loop playsinline>
<source src="https://www.thinkautonomous.ai/blog/content/media/2023/04/FMCWlidar.mp4" type="video/mp4">
</video>
<figcaption><a href="https://www.aeva.com">Aeva&apos;s</a> FMCW LiDAR that can estimate velocities and predict trajectories (blue: approaching | red: receding)</figcaption>
</figure>
<!--kg-card-end: html-->
<p><strong>The FMCW LiDAR uses the Doppler Effect, similar to RADAR technology, to get this 4D view</strong>. The main idea can be seen in this image, where we play with the frequency of the returned wave to measure the velocity.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://io.dropinblog.com/uploaded/blogs/34241363/files/radar_11.png" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="1200" height="626"><figcaption><span style="white-space: pre-wrap;">If a wave is reflected at a higher frequency, the object is approaching. If lower, it&apos;s going away from us. (</span><a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/"><span style="white-space: pre-wrap;">see it on the FMCW LiDAR post</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>The Doppler Effect is exactly about measuring this frequency shift.</strong> And this has now been adopted in FMCW LiDAR technology, but with light waves instead of radio waves. I highly recommend checking out my complete post called &quot;<a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/" rel="noopener noreferrer">Understanding the magnificent FMCW LiDAR</a>&quot;.</p><h3 id="2imaging-radar-4d-radar">2 - Imaging RADAR: 4D RADAR</h3><p><strong>In 2024, Mobileye, which had been working on its own FMCW LiDAR for years, announced it would be shutting down its entire FMCW LiDAR division to focus on proprietary</strong> <strong>4D Imaging RADAR</strong>. What happened? Why the shift? Well, let&apos;s first try to understand what Imaging RADARs are. I like to call these...</p><blockquote class="kg-blockquote-alt"><strong>RADAR on steroids!</strong></blockquote><p>To better understand how it works, I&apos;d like to show you the bitsensing demo they showed me at CES.</p><h4 id="bitsensing-imaging-radar-demo"><strong>bitsensing Imaging RADAR Demo</strong></h4><p>The Imaging RADAR has an incredible resolution. It provides a very accurate point cloud, sees through adverse weather conditions, detects obstacles, AND measures velocity directly! Under the hood, it uses a set of MIMO antennas to get a much better resolution, range, and precision. We could even detect occupants inside a vehicle, and tell a child apart from an adult.</p><p>See the demo:</p>
<!--kg-card-begin: html-->
<iframe src="https://player.vimeo.com/video/807852889?h=dfaf463bd4&amp;badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen title="3363091915"></iframe>
<!--kg-card-end: html-->
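<p>The reason both sensors can pull this off is that they share the same Doppler arithmetic, just at very different wavelengths. Here is a back-of-the-envelope sketch comparing a typical 77 GHz automotive RADAR with a 1550 nm FMCW LiDAR; the numbers are generic, not tied to any of the sensors shown here:</p>
<pre><code class="language-python"># Two-way Doppler shift of the echo: f_d = 2 * v / wavelength
C = 299_792_458.0                       # speed of light [m/s]

def doppler_shift_hz(radial_velocity_mps, wavelength_m):
    """Frequency shift of the returned wave for a target at the given radial speed."""
    return 2.0 * radial_velocity_mps / wavelength_m

radar_wavelength = C / 77e9             # 77 GHz RADAR, wavelength of about 3.9 mm
lidar_wavelength = 1550e-9              # 1550 nm FMCW LiDAR

v = 20.0                                # a car closing in at 20 m/s (72 km/h)
print(doppler_shift_hz(v, radar_wavelength))   # about 10.3 kHz
print(doppler_shift_hz(v, lidar_wavelength))   # about 25.8 MHz
</code></pre>
<p>Same equation, same velocity; the optical wavelength simply makes the measured shift roughly 2,500 times larger.</p>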
<p><strong>Can you notice how similar it looks to the FMCW LiDAR? We have in both cases:</strong></p><ul><li>A 3D Point Cloud</li><li>That can directly measure velocity</li></ul><h4 id="other-examples-from-self-driving-cars">Other Examples from Self-Driving Cars</h4><p>Frankly, many actors from the autonomous driving industry are switching to Imaging RADARs. Mobileye has a great demo, so does Waymo. Let&apos;s see these 2 examples.</p><p>Here&apos;s the Waymo Imaging RADAR Demo:</p>
<!--kg-card-begin: html-->
<figure class="kg-card kg-image-card kg-card-hascaption">
<video class="lazy" style="max-width:100%" controls poster="https://www.thinkautonomous.ai/blog/content/images/2023/04/ImagingRadar.webp" preload="none" muted loop playsinline>
<source src="https://www.thinkautonomous.ai/blog/content/media/2023/04/ImagingRadar.mp4" type="video/mp4">
</video>
<figcaption>View of the Waymo&apos;s Imaging RADAR (<a href="https://blog.waymo.com/2021/11/a-fog-blog.html?__s=xxxxxxx" rel="noopener noreferrer"><u>source</u></a>)</figcaption>
</figure>
<!--kg-card-end: html-->
<p>And now Mobileye:</p>
<!--kg-card-begin: html-->
<div class="yt-lite">
    <a class="yt-thumb" data-src="b3WSAYguMaY" target="_blank" rel="noopener noreferrer" href="https://www.youtube.com/watch?v=b3WSAYguMaY">
    <img src="https://i.ytimg.com/vi/b3WSAYguMaY/hqdefault.jpg" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy">
    <span class="yt-play" aria-hidden="true"></span>
    </a>
</div>
<!--kg-card-end: html-->
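<p>A quick word on where all that resolution comes from: imaging RADARs rely on MIMO antenna arrays, where every transmit/receive pair behaves like one element of a much larger virtual array. The arithmetic below is only an illustration with made-up antenna counts, not the configuration of the Waymo or Mobileye sensors:</p>
<pre><code class="language-python">import math

def virtual_channels(n_tx, n_rx):
    """A MIMO RADAR with n_tx transmitters and n_rx receivers acts like n_tx * n_rx virtual antennas."""
    return n_tx * n_rx

def azimuth_resolution_deg(n_virtual):
    """Rule-of-thumb beamwidth of a uniform, half-wavelength-spaced array: about 2/N radians."""
    return math.degrees(2.0 / n_virtual)

legacy  = virtual_channels(2, 4)      # classic automotive RADAR: 8 virtual channels
imaging = virtual_channels(12, 16)    # imaging RADAR: 192 virtual channels

print(round(azimuth_resolution_deg(legacy), 1))    # ~14.3 degrees: blobs, not shapes
print(round(azimuth_resolution_deg(imaging), 1))   # ~0.6 degrees: enough to outline a car
</code></pre>
<p>More virtual channels means a finer angular grid, which is how a sensor that used to output a handful of noisy points now produces the dense clouds you just saw.</p>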
<p>See? We are in the middle of a <u>transition</u>... but why are people using Imaging RADARs over FMCW LiDARs? And are they really moving away from LiDARs? Let&apos;s find out in the final point...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text">Learning from theory is one thing, but opening a LiDAR teaches you more than any diagram ever could: from how the emitter &amp; receiver system works, to how raw points become 3D data.<b><strong style="white-space: pre-wrap;"> Watch how I literally opened a LiDAR </strong></b><a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-slamtechs-rp-lidar-ungutting" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">here</strong></b></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div><h2 id="lidars-vs-radars-the-modern-comparison">LiDARs vs RADARs: The Modern Comparison</h2><p>There are 2 ideas I&apos;d like to talk about here:</p><ol><li>The Future of RADARs IS Imaging based</li><li>The Future of LiDARs may NOT be FMCW based</li></ol><h3 id="1the-future-of-radars-is-imaging-based">1 - <strong>The Future of RADARs IS Imaging based</strong></h3><p><strong>We have clearly seen how a good RADAR system can bring incredible benefits</strong>. We can now do tasks like object detection using purely an imaging RADAR. Recently, we&apos;ve seen Deep Learning models, like the ones from <a href="https://www.perciv.ai" rel="noopener noreferrer"><strong>Perciv AI</strong></a>, work on RADAR data (radar signals, radar point clouds, radar waves, ...) directly.</p><p><strong>Back in the day, any comparison between a LiDAR and a RADAR didn&#x2019;t really make sense</strong> because the sensors were highly complementary. <u>But today, these sensors can be in competition</u>, and if there is one, Imaging RADARs are winning it! 
Here is the new comparison table:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="2000" height="1162" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 1600w, https://www.thinkautonomous.ai/blog/content/images/size/w2400/2023/02/4e5fe9dc-673e-437e-b1dd-7b524857a8e4--1-.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Camera vs FMCW LiDAR vs Imaging RADAR &#x2014; blue: Improved, red: Worse</span></figcaption></figure><p><strong>We are BLUE almost everywhere, but the cost of Imaging RADAR stays lower than FMCW LiDARs.</strong> In addition to this, the Imaging RADAR can nicely fit under a bumper, since RADAR employs radio waves that go through objects.</p><p><strong>When looking at remote sensing technology, RADAR has always been a great choice</strong>: synthetic aperture radar systems in the military field, environmental monitoring using the radio frequency spectrum, and now the adoption in autonomous vehicles.</p><p><strong>In the self-driving space, RADARs were never good enough to be a standalone</strong>. Has someone ever told you you weren&apos;t good enough? Well, there is a lesson here, because you can see a massive adoption trend for Imaging RADARs - and I believe the future of RADARs is imaging.</p><h3 id="2the-future-of-lidars-is-not-fmcw-based">2 - <strong>The Future of LiDARs IS NOT FMCW based</strong></h3><p><strong>Now, here is the incredible discovery:</strong> <u>Nobody is abandoning LiDARs for FMCW LiDARs</u>. Self-driving car companies have NOT adopted FMCW LiDAR technology en masse (for now), and I predict they&apos;ll just stick to solid-state.</p><p><strong>Back in 2023, I went to Innoviz Technologies headquarters in Israel</strong>. Innoviz is a LiDAR manufacturer providing LiDAR devices to companies like BMW. I asked them: &quot;Why are you NOT building FMCW LiDARs?&quot; Their answer was that their LiDARs were good enough, and that there was no real need for FMCW. It really surprised me, but I guess they know what they&apos;re talking about. 
They could solve the drawbacks of LiDARs by building better LiDARs, for example here:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/lidars-evolution.jpg" class="kg-image" alt="LiDAR vs RADAR: How 4D Imaging RADARs and FMCW LiDARs disrupt the Autonomous Tech Industry" loading="lazy" width="1590" height="550" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/lidars-evolution.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/lidars-evolution.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/lidars-evolution.jpg 1590w" sizes="(min-width: 720px) 720px"><figcaption><a href="https://innoviz.tech" rel="noreferrer"><span style="white-space: pre-wrap;">Innoviz Technologies</span></a><span style="white-space: pre-wrap;"> provides incredible resolution in their LiDAR sensors</span></figcaption></figure><p><strong>In many fields, LiDAR sensors are at the core.</strong> We have airborne lidar systems building elevation maps, we have HD Maps built entirely from LiDARs, and even drones equipped with LiDARs today... this technology is here to stay. Plus, today, EVERYONE uses LiDARs! No wait, that was wrong: Tesla doesn&apos;t, and a few others have bet on vision-only... but the majority of startups do, except that they use <u>BETTER LiDARs</u>. Not necessarily 4D, but LiDARs that provide better resolution, focusing more on solid-state technology.</p><p>This is the key message I have for you, and now that we&apos;ve seen it, let&apos;s go through a summary, and see some next steps.</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>LiDAR uses laser light to measure distances and create detailed 3D maps of objects and environments</strong>. Their key strength is distance estimation; their key weaknesses are velocity estimation and weather conditions.</li><li><strong>RADAR emits radio waves and measures their reflections </strong>to detect objects and calculate their speed, even in bad weather. Their key strength is velocity estimation; their key weaknesses are noise, lack of context, and 3D estimation (most are only 2D).</li><li><strong>Traditional setups combine LiDAR, RADAR, and cameras</strong> because each sensor complements the others&apos; strengths and weaknesses. It&apos;s nearly unthinkable to use one as a standalone.</li><li><strong>Recently, technologies like 4D FMCW LiDAR and Imaging RADAR have emerged</strong>, offering both high resolution and velocity measurement. FMCW LiDARs use the Doppler effect, and Imaging RADARs use more antennas.</li><li><strong>While I believe the future of RADAR is imaging</strong>, the future of LiDARs may be solid-state based, and not necessarily FMCW/4D based.</li></ul><h3 id="next-steps">Next Steps</h3><ul><li>Learn about the FMCW LiDAR <a href="https://www.thinkautonomous.ai/blog/fmcw-lidar" rel="noopener noreferrer">here</a>.</li><li>Learn about the Imaging RADAR <a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noreferrer">here</a>.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text">If you want to learn more about LiDARs and cutting-edge technology, I&apos;m sending emails every day about these technologies, and they&apos;re read by over 10,000 Engineers. 
You should join the daily emails <a href="https://www.thinkautonomous.ai/private-emails" rel="noopener noreferrer">here</a>.</div></div>]]></content:encoded></item><item><title><![CDATA[3 Insights from Autoware's Transition to End-To-End Learning with Samet Kütük]]></title><description><![CDATA[Autoware is transitioning to End-To-End Learning. When? And how exactly will this happen? This is what we'll find out this month, in this exclusive interview with Samet Kütük.]]></description><link>https://www.thinkautonomous.ai/blog/autoware-end-to-end/</link><guid isPermaLink="false">68f67f87bad329532556f144</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Thu, 23 Oct 2025 08:47:17 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-end-to-end.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-end-to-end.jpeg" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"><p><strong>Did you ever wonder... why are self-driving cars taking so long to come?</strong> I had that question too when starting, and my first answer came from Sebastian Thrun, godfather of self-driving cars, who talked about reaching 90% of use cases easily, but then finding huge difficulty in going from <strong>90%</strong> to <strong>100%</strong>. Recently, Andrej Karpathy, former Lead of Tesla Autopilot, described something similar as &quot;the march of 9s&quot;:</p><blockquote>&quot;When you get a demo and something works 90% of the time, that&apos;s just the first 9 and then you need the second 9 and third 9, fourth 9, fifth 9...&quot;</blockquote><p><strong>This is what&apos;s taking so long, but instead of focusing on this, tons of companies lose time focusing on the first 0-90%. </strong>Back in 2017 or so, we were all trying to get to 90%, and for this, we were re-developing all the software, algorithms, and so on... At some point, probably 30 startups were all spending millions developing the exact same algorithms.</p><p><strong>This is where Autoware comes into play</strong>. Started by <a href="https://tier4.jp/en/" rel="noreferrer">Tier IV</a>, Autoware is an open-source self-driving car software that allows you to achieve the first <strong>9</strong> in just a few weeks. Rather than re-developing yet another version of the same code, you <em>jumpstart</em> from the existing state of the art, and finetune it for your needs.</p><p><strong>This month, our membership </strong><a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a><strong> is welcoming Samet K&#xFC;t&#xFC;k from </strong><a href="https://www.autoware.org" rel="noreferrer"><strong>The Autoware Foundation</strong></a><strong>. </strong></p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x2712;&#xFE0F;</div><div class="kg-callout-text">Samet is currently the Community Advocate and Head of Marketing at the Autoware Foundation. Before that, Samet co-founded a company in Istanbul called <a href="https://www.leodrive.ai/" rel="noreferrer">Leo Drive</a>, where he worked for a decade on implementing Autoware in various vehicle platforms, including retrofitting a Volkswagen Golf for autonomous operation. 
<br><br><b><strong style="white-space: pre-wrap;">Now based in Zurich, he is fully dedicated to the Autoware Foundation</strong></b>, focusing on marketing, member recruitment, and participating in technical workgroups, particularly in software-defined vehicles and cloud-native development.</div></div><p>And let me start with a small snippet about how he defines Autoware:</p><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet1d_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet1d.mp4" poster="https://img.spacergif.org/v1/1920x1080/0a/spacer.png" width="1920" height="1080" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet1d_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">1:20</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p>Together, we recorded a new Fragment of <a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a>, my community membership experience, in which he takes us through the building and management of Autoware.<strong> </strong>How does it work? How do you build a self-driving car with a fully remote team? This is everything Samet teaches in our new fragment...</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F50F;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interested in getting access? </strong></b>Apply here <u>before October 25, 2025</u> (use the direct link if you&apos;re already a pre-approved client).</div></div><p>In this post, I&apos;d like to give you a small sample of that interview, highlighting a very interesting moment where Samet talked about End-To-End learning...</p><hr><h2 id="3-insights-from-autowares-end-to-end-learning-transition">3 insights from Autoware&apos;s End-To-End Learning Transition</h2><p><strong>Since its creation, Autoware has been implementing a &quot;robotic&quot; architecture,</strong> meaning the traditional &quot;4 pillars&quot;: Perception &#x2192; Localization &#x2192; Planning &#x2192; Control.</p><p><strong>But recently, Autoware announced a new plan to evolve to an End-To-End architecture,</strong> a single neural network that takes in the sensor data and directly outputs the steering angle and acceleration values. I have a complete article explaining the differences with detailed examples <a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer">here</a>.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">4 Pillars vs End To End: How to pick an autonomous vehicle architecture</div><div class="kg-bookmark-description">How to design an autonomous vehicle architecture? Should you implement an End-To-End solution, or a more traditional one? Let&#x2019;s see&#x2026;</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/autonomous-vehicle-architecture--1-.webp" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"></div></a></figure><p>So here is the sample I&apos;d like to share:</p><figure class="kg-card kg-video-card kg-width-regular" data-kg-thumbnail="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet2v3_thumb.jpg" data-kg-custom-thumbnail>
            <div class="kg-video-container">
                <video src="https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet2v3.mp4" poster="https://img.spacergif.org/v1/1920x1080/0a/spacer.png" width="1920" height="1080" playsinline preload="metadata" style="background: transparent url(&apos;https://www.thinkautonomous.ai/blog/content/media/2025/10/TAmember_Autoware_snippet2v3_thumb.jpg&apos;) 50% 50% / cover no-repeat;"></video>
                <div class="kg-video-overlay">
                    <button class="kg-video-large-play-icon" aria-label="Play video">
                        <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                            <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                        </svg>
                    </button>
                </div>
                <div class="kg-video-player-container">
                    <div class="kg-video-player">
                        <button class="kg-video-play-icon" aria-label="Play video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-pause-icon kg-video-hide" aria-label="Pause video">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                                <rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/>
                            </svg>
                        </button>
                        <span class="kg-video-current-time">0:00</span>
                        <div class="kg-video-time">
                            /<span class="kg-video-duration">1:16</span>
                        </div>
                        <input type="range" class="kg-video-seek-slider" max="100" value="0">
                        <button class="kg-video-playback-rate" aria-label="Adjust playback speed">1&#xD7;</button>
                        <button class="kg-video-unmute-icon" aria-label="Unmute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/>
                            </svg>
                        </button>
                        <button class="kg-video-mute-icon kg-video-hide" aria-label="Mute">
                            <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                                <path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/>
                            </svg>
                        </button>
                        <input type="range" class="kg-video-volume-slider" max="100" value="100">
                    </div>
                </div>
            </div>
            
        </figure><p><strong>As you can see, there is a lot to uncover from just one minute. Let me share 3 highlights from that:</strong></p><ol><li>Level 5 is NOT easy to reach, and may not even be possible, which is why Autoware focuses on Level 4+, in which humans aren&apos;t asked to take over the car, but could still be asked to drive in regions or conditions that aren&apos;t appropriate for autonomy. </li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-level-4-.jpg" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1182" height="738" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/autoware-level-4-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/autoware-level-4-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-level-4-.jpg 1182w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Autoware doesn&apos;t claim to reach Level 5, but a very good Level 4+</span></figcaption></figure><ol start="2"><li>The Transition will NOT be achieved immediately, but rather be the result of several steps:<ol><li><strong>Current &#x2014;</strong>&#xA0;Starting from a traditional Robotic stack</li><li><strong>Step 1 &#x2014;</strong>&#xA0;Learned Planning</li><li><strong>Step 2 &#x2014;&#xA0;</strong>Deep Perception &amp; Learned Planning</li><li><strong>Step 3 &#x2014;&#xA0;</strong>Monolithic End-to-End (single network)</li><li><strong>Step 4 </strong>&#x2014; Hybrid End-To-End using a &quot;guardian&quot; for redundancy</li></ol></li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e.gif" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1080" height="608" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/autoware-e2e.gif 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/autoware-e2e.gif 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e.gif 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Autoware&apos;s 4 Step Transition to End-To-End Learning</span></figcaption></figure><p>This can feel similar to how Tesla did their own transition to End-To-End (which I cover in the article below):</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Breakdown: How Tesla will transition from Modular to End-To-End Deep Learning</div><div class="kg-bookmark-description">It&#x2019;s no secret, Tesla is going to use End-To-End Deep Learning. But how? What will it look like? Will the Occupancy Network and HydraNet stay? 
Here&#x2019;s a full breakdown&#x2026;</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/09/tesla-end-to-end.png" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k"></div></a></figure><p>The difference between Modular and Monolithic End-To-End is explained in their <a href="http://github.com/tier4/new_planning_framework/wiki" rel="noreferrer">GitHub repository</a> about the new planning framework:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://media1-production-mightynetworks.imgix.net/asset/f284a2ad-0c52-4342-8e80-1b60665c524d/70581e7ae5e4e7d8.png?ixlib=rails-4.3.1&amp;fm=jpg&amp;q=75&amp;auto=format&amp;w=4096&amp;h=4096&amp;fit=max&amp;impolicy=ResizeCrop&amp;aspect=fit" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1286" height="496"><figcaption><span style="white-space: pre-wrap;">Modular versus Monolithic End-To-End. Originally, everybody tried monolithic, then reverted to modular, and is now trying monolithic again with safety guardians.</span></figcaption></figure><p>Alright, let&apos;s continue with a third and final idea:</p><ol start="3"><li><strong>The algorithms for End-To-End have already been built.</strong></li></ol><p>We are not talking about a distant future: according to Autoware, it&apos;s possible to achieve End-To-End with today&apos;s algorithms, including (but not limited to) <a href="https://autowarefoundation.github.io/autoware_universe/main/perception/autoware_lidar_centerpoint/" rel="noreferrer"><strong>CenterPoint</strong></a> as the 3D Deep Learning algorithm for LiDAR Detection, <a href="https://github.com/autowarefoundation/autoware.privately-owned-vehicles/tree/main/AutoSeg" rel="noreferrer"><strong>AutoSeg</strong></a> as the Foundation Model in Perception, <strong>AutoSteer</strong> and <a href="https://github.com/ZhengYinan-AIR/Diffusion-Planner" rel="noreferrer"><strong>Diffusion Planner</strong></a> for the Learned Planning approaches.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e-.jpg" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="2000" height="947" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/autoware-e2e-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/autoware-e2e-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/10/autoware-e2e-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/10/autoware-e2e-.jpg 2176w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Autoware Modular End-To-End Architecture will feature these 4 core algorithms</span></figcaption></figure><p>See? 
They are already there, and even though they may evolve and get replaced 2, 3, or 5 years from now, the <strong><em>logic</em></strong> of Modular End-To-End (Step 2) has been implemented.</p><p>For example, the <strong>AutoSeg</strong> algorithm is a &quot;<a href="https://www.thinkautonomous.ai/blog/how-tesla-autopilot-works/" rel="noreferrer">HydraNet</a>&quot; that has a single backbone that splits into several heads for lane lines, ego path, free space, segmentation, objects, and 3D. The outputs of these heads are then passed to the deep planner.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/AutoSeg--1-.jpg" class="kg-image" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/10/AutoSeg--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/10/AutoSeg--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/10/AutoSeg--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/10/AutoSeg--1-.jpg 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A look at AutoSeg, the HydraNet used by Autoware</span></figcaption></figure><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F500;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interested in End-To-End?</strong></b> Autoware has published a detailed PDF about their transition to End-To-End; you can download it on <a href="https://autoware.org/" rel="noreferrer">this page</a>.</div></div><h3 id="summary-next-steps">Summary &amp; Next Steps</h3><ul><li><strong>Autoware is an open source self-driving car organization</strong> that builds self-driving car software used all over the world by thousands of engineers and teams</li><li><strong>Autoware is the solution I recommend</strong> to get started in self-driving cars; rather than building software from scratch, get Autoware working quickly, and then finetune and customize it for your applications.</li><li><strong>Autoware is transitioning</strong> from a robotic architecture to an End-To-End Learning architecture, and there are 3 highlights from it:<ul><li>It won&apos;t reach Level 5, but a <strong>Level 4+</strong> that can drive almost anywhere</li><li>The transition will happen in <strong>4 steps</strong>, adding planning, perception, then turning into a monolithic architecture, and finally hybrid.</li><li>The algorithms and modular logic have already been implemented and are working, such as <strong>CenterPoint</strong>, <strong>AutoSeg</strong>, or <strong>Diffusion</strong> <strong>Planner</strong>.</li></ul></li></ul><h3 id="next-steps">Next steps</h3><p><strong>Interested in getting access to our Autoware Fragment? </strong>It&apos;s going to be very cool, featuring several things, such as:</p><ul><li>The Full-Length interview with Samet on Autoware</li><li>An even deeper dive on Autoware&apos;s End-To-End Transition (this was just a 1-minute video; in the Fragment, we go through the full section on End-To-End).</li><li>A complete breakdown on many algorithms used by Autoware, and a near plug &amp; play solution to start running Autoware&apos;s software on your computer by tonight</li></ul><p>Interested? 
<div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F500;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interested in End-To-End?</strong></b> Autoware has published a detailed PDF about their transition to End-To-End; you can download it on <a href="https://autoware.org/" rel="noreferrer">this page</a>.</div></div><h3 id="summary-next-steps">Summary &amp; Next Steps</h3><ul><li><strong>Autoware is an open source self-driving car organization</strong> that builds self-driving car software used all over the world by thousands of engineers and teams</li><li><strong>Autoware is the solution I recommend</strong> to get started in self-driving cars; rather than building software from scratch, get Autoware working quickly, and then fine-tune and customize it for your applications.</li><li><strong>Autoware is transitioning</strong> from a robotic architecture to an End-To-End Learning architecture, and there are 3 highlights from it:<ul><li>It won&apos;t reach Level 5, but a <strong>Level 4+</strong> that can drive almost anywhere</li><li>The transition will happen in <strong>4 steps</strong>, adding planning, perception, then turning into a monolithic architecture, and finally hybrid.</li><li>The algorithms and modular logic have already been implemented and are working, such as <strong>CenterPoint</strong>, <strong>AutoSeg</strong>, or <strong>Diffusion</strong> <strong>Planner</strong>.</li></ul></li></ul><h3 id="next-steps">Next steps</h3><p><strong>Interested in getting access to our Autoware Fragment? </strong>It&apos;s going to be very cool and feature several things, such as:</p><ul><li>The Full-Length interview with Samet on Autoware</li><li>An even deeper dive into Autoware&apos;s End-To-End Transition (this post only included a 1-minute clip; the Fragment covers the full End-To-End section).</li><li>A complete breakdown of many algorithms used by Autoware, and a near plug &amp; play solution to start running Autoware&apos;s software on your computer by tonight</li></ul><p>Interested? Apply to the Edgeneer&apos;s Land at the button below <u>before October 25, 2025</u> (use the direct link if you&apos;re already a pre-approved client).</p><div class="kg-card kg-product-card">
            <div class="kg-product-card-container">
                <img src="https://www.thinkautonomous.ai/blog/content/images/2025/10/ChatGPT-Image-21-oct.-2025--17_21_08--1-.jpg" width="1024" height="1024" class="kg-product-card-image" loading="lazy" alt="3 Insights from Autoware&apos;s Transition to End-To-End Learning with Samet K&#xFC;t&#xFC;k">
                <div class="kg-product-card-title-container">
                    <h4 class="kg-product-card-title"><span style="white-space: pre-wrap;">Fragment #13: Open Source AV Secrets</span></h4>
                </div>
                
                    <div class="kg-product-card-rating">
                        <span class="kg-product-card-rating-active kg-product-card-rating-star"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M12.729,1.2l3.346,6.629,6.44.638a.805.805,0,0,1,.5,1.374l-5.3,5.253,1.965,7.138a.813.813,0,0,1-1.151.935L12,19.934,5.48,23.163a.813.813,0,0,1-1.151-.935L6.294,15.09.99,9.837a.805.805,0,0,1,.5-1.374l6.44-.638L11.271,1.2A.819.819,0,0,1,12.729,1.2Z"/></svg></span>
                        <span class="kg-product-card-rating-active kg-product-card-rating-star"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M12.729,1.2l3.346,6.629,6.44.638a.805.805,0,0,1,.5,1.374l-5.3,5.253,1.965,7.138a.813.813,0,0,1-1.151.935L12,19.934,5.48,23.163a.813.813,0,0,1-1.151-.935L6.294,15.09.99,9.837a.805.805,0,0,1,.5-1.374l6.44-.638L11.271,1.2A.819.819,0,0,1,12.729,1.2Z"/></svg></span>
                        <span class="kg-product-card-rating-active kg-product-card-rating-star"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M12.729,1.2l3.346,6.629,6.44.638a.805.805,0,0,1,.5,1.374l-5.3,5.253,1.965,7.138a.813.813,0,0,1-1.151.935L12,19.934,5.48,23.163a.813.813,0,0,1-1.151-.935L6.294,15.09.99,9.837a.805.805,0,0,1,.5-1.374l6.44-.638L11.271,1.2A.819.819,0,0,1,12.729,1.2Z"/></svg></span>
                        <span class="kg-product-card-rating-active kg-product-card-rating-star"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M12.729,1.2l3.346,6.629,6.44.638a.805.805,0,0,1,.5,1.374l-5.3,5.253,1.965,7.138a.813.813,0,0,1-1.151.935L12,19.934,5.48,23.163a.813.813,0,0,1-1.151-.935L6.294,15.09.99,9.837a.805.805,0,0,1,.5-1.374l6.44-.638L11.271,1.2A.819.819,0,0,1,12.729,1.2Z"/></svg></span>
                        <span class="kg-product-card-rating-active kg-product-card-rating-star"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M12.729,1.2l3.346,6.629,6.44.638a.805.805,0,0,1,.5,1.374l-5.3,5.253,1.965,7.138a.813.813,0,0,1-1.151.935L12,19.934,5.48,23.163a.813.813,0,0,1-1.151-.935L6.294,15.09.99,9.837a.805.805,0,0,1,.5-1.374l6.44-.638L11.271,1.2A.819.819,0,0,1,12.729,1.2Z"/></svg></span>
                    </div>
                

                <div class="kg-product-card-description"><p><span style="white-space: pre-wrap;">Exclusive Interview with Autoware - Autoware E2E Transition Full Breakdown - Autoware Platform Run &amp; Algorithm Dive</span></p></div>
                
                    <a href="https://www.thinkautonomous.ai/fragment-13" class="kg-product-card-button kg-product-card-btn-accent" target="_blank" rel="noopener noreferrer"><span>Learn More about Fragment 13</span></a>
                
            </div>
        </div>]]></content:encoded></item><item><title><![CDATA[Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know]]></title><description><![CDATA[Let's reveal it all: What are point clouds? What are 3 Ways to create them? How to process them? How do we detect 3D objects inside a point cloud?]]></description><link>https://www.thinkautonomous.ai/blog/point-clouds/</link><guid isPermaLink="false">640f9074fa7e0be47b3d9d33</guid><category><![CDATA[lidar]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Mon, 29 Sep 2025 15:40:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/09/point-clouds.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/point-clouds.jpg" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know"><p><strong>In September 1519, an expedition of five ships and 270 men,</strong> led by Ferdinand Magellan, left Spain to reach the Spice Islands by sailing west. At the time, maps were crude sketches, full of blank spaces, and sometimes decorated with warnings: <em>&#x201C;Here be dragons.&#x201D;</em> Yet Magellan pressed on, steering his fleet into the unknown, through storms and across oceans no European had crossed before.</p><p><strong>The challenge was harsher than anyone had imagined</strong>. Supplies ran out, men starved, and mutiny spread. One ship deserted, another wrecked. After nearly two years, Magellan reached the Philippines, where he was killed in the Battle of Mactan. His fleet, once five strong, was reduced to four&#x2026; then three&#x2026; then two.</p><p><strong>3 years later, only one ship returned to Spain.</strong> The Victoria carried just 18 survivors, but also one of the greatest accomplishments of the time. For the first time, humanity had proof that the Earth could be circumnavigated by sea, a discovery that forever reshaped navigation, trade, and commerce.</p><p><strong>For centuries, people believed the old maps. </strong>They trusted the flat drawings, the empty warnings, the <em>&#x201C;here be dragons&#x201D;</em>. All it took was one expedition to open a new world nobody could see. And today, I believe Computer Vision Engineers live in a similar situation.</p><p><strong>The world provides Computer Vision algorithms</strong>, image processing techniques, 2D object detectors, and segmentation approaches... yet, the world is a sphere, in 3D. And this is why, I think something of much greater importance should be mastered by Computer Vision and ALL robotics/autonomous tech engineers: <strong>Point Clouds</strong>.</p><p><strong>The goal of a point cloud is to create a 3D model</strong>. 3D points are a data representation used today in autonomous vehicles, robotics, AR/VR, and even in everyday objects like unlocking your phone with Face ID.</p><p>So what are point clouds? How do you get them? And how do you process them using AI? These are the 3 things I think most perception engineers should know, that we&apos;ll cover in this article.</p><p>Let&apos;s begin:</p><h2 id="9-examples-of-point-cloud-data">9 Examples of Point Cloud Data</h2><p>A Point Cloud&quot; is a set of points in 3D space &#x2014; a cloud of points. Inside, each point holds the 3D location of a surface in the real world. It can be a person, a wall, a tree, anything. You probably know what a point cloud looks like already, but you may not know the multiple types of point clouds... 
So let me introduce you to 9 of them!</p><h3 id="xyz-point-clouds">XYZ Point Clouds</h3><p><strong>In an XYZ point cloud, each point has a specific X, Y, and Z value</strong>. You could think of it as the equivalent of a pixel, but in 3D. Rather than just X and Y, we have X, Y, and Z (in most cases, because some point clouds are 2D, see this article).</p><p>Here&apos;s an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/xyz-point-cloud.png" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1528" height="842" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/xyz-point-cloud.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/xyz-point-cloud.png 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/xyz-point-cloud.png 1528w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">In this very basic point cloud, each point contains the X, Y, Z information</span></figcaption></figure><p>See? Each point has an XYZ value. But why are the colors different? Simply here because our visualizer is a gradient based on the height of the point (the Z dimension). The higher the Z value, the more red it&apos;ll be. On the above Waymo video, you could see a different visualization, based on the distance to the vehicle. So this is one type:</p><ul><li>Point clouds can contain the XYZ information</li></ul><p>Next:</p><h3 id="xyz-i-point-clouds">XYZ-I Point Clouds</h3><p>Now, this is just an example, but let me show you something else...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1510" height="862" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-14.35.46--1-.jpg 1510w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Intensity is computed by almost every modern LiDARs and can help process the cloud</span></figcaption></figure><p>This is another point cloud, but what do you notice about the colors? Yes, two things:</p><ul><li>It&apos;s all &quot;RED&quot;</li><li>But not all points have exactly the same &quot;red&quot; value. Some are brighter than others</li></ul><p>And this is because here, we are no longer visualizing the distance, but the &quot;intensity&quot; of the points. Point Clouds are often produced by LiDARs that send a ray and measure the time it takes to bounce back. This calculation measures the distance, but not all rays come back equal. 
Some are absorbed or scattered by trees, leaves, or dark surfaces, while others bounce back cleanly.</p><p>So, we now know another attribute of a point cloud:</p><ul><li>Point clouds can contain the XYZ information</li><li>Point clouds can also hold the intensity information!</li></ul><p>Any other?</p><h3 id="xyz-v-point-clouds">XYZ-V Point Clouds</h3><p>Now, let&apos;s take it one step further, and look at this video:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ezgif.com-resize--1-.webp" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="560" height="315"><figcaption><span style="white-space: pre-wrap;">XYZ-Velocity point clouds are usually produced by FMCW LiDARs</span></figcaption></figure><p><strong>Okay, can you explain what is happening here? </strong>Everything is grey except the vehicles. So, is that... Class? Labels? Or, wait a minute, why are the forward vehicles in red, the parked cars in grey, and the left approaching vehicles in blue? That&apos;s because this visualization shows not the class but the velocity information!</p><p>This video was made by <a href="https://www.aeva.com" rel="noreferrer">Aeva</a>, an <a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/" rel="noopener noreferrer">FMCW LiDAR</a> producer &#x2014;&#xA0;and inside, you can see that receding points are in red, and approaching points are in blue. We now know a third possibility!</p><ul><li>Point Clouds can contain XYZ</li><li>Or XYZ-Intensity</li><li>Or XYZ-Velocity</li></ul><h3 id="lets-see-9-types-of-point-clouds">Let&apos;s see 9 types of Point Clouds</h3><p>Are there any more than intensity or velocity? Yes, in fact - each point can contain a lot of information. 
Let&apos;s see:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/point-cloud-visualization.001.jpeg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/point-cloud-visualization.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/point-cloud-visualization.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/point-cloud-visualization.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/point-cloud-visualization.001.jpeg 1920w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">How many of these did you know?</span></figcaption></figure><ul><li><strong>Intensity</strong> - how strong the return signal of each point is</li><li><strong>Range</strong> - the distance of the point, computed from X, Y, and Z</li><li><strong>Color</strong> - the RGB color of the points (often for RGB-D cameras or 3D reconstruction)</li><li><strong>Class/Label </strong>- added after an object detector or segmentation tool has processed the cloud</li><li><strong>Infrared</strong> - the wavelength of the point cloud signal</li><li><strong>Ring/Channel</strong> - which laser channel of the sensor collected the point</li><li><strong>Velocity</strong> - the speed of each point (calculated by RADARs or FMCW LiDARs)</li><li><strong>Reflectivity</strong> - how reflective the surface of the point is</li><li><strong>Temperature</strong> - how hot a point is</li></ul><p>Okay, but concretely, how does it work? Is there a TXT file where we store the points? Kinda, let&apos;s take a look...</p><h3 id="point-cloud-formats-files">Point Cloud Formats &amp; Files</h3><p>There are usually two types of files: ASCII and Binary. One is easier to read, the other is more suited to real-time/embedded. Take a look at the beginning of both files:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1726" height="730" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 1600w, https://www.thinkautonomous.ai/blog/content/images/2023/03/f566c5c6-da39-4ff7-8587-6f271a8bd981.png 1726w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Two types of files: ASCII and Binary</span></figcaption></figure><p><strong>See? On the left, the PLY file contains X, Y, Z as floats, followed by a list of point coordinates.</strong> This is the point cloud! On the right, you can see a header describing the point cloud, in format XYZ-Intensity, and then the points themselves, which are not human-readable.</p>
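<p>If you want to poke at these files yourself, here is a minimal Python sketch. The file names and the XYZ-Intensity binary layout are assumptions made for the example; real formats (PLY, PCD, LAS, KITTI .bin) each define their own headers.</p><pre><code class="language-python"># Minimal sketch: loading an ASCII vs a binary point cloud with NumPy.
# "cloud.txt" and "cloud.bin" are hypothetical files; the binary one is assumed
# to be a flat list of float32 values laid out as x, y, z, intensity per point
# (a KITTI-style convention, not a universal rule).
import numpy as np

# ASCII: one point per line, e.g. "1.23 4.56 0.78 0.21"
ascii_points = np.loadtxt("cloud.txt", dtype=np.float32)   # (N, 4) if 4 columns

# Binary: read the raw floats, then reshape into rows of 4 values
binary_points = np.fromfile("cloud.bin", dtype=np.float32).reshape(-1, 4)

xyz = binary_points[:, :3]        # the 3D coordinates
intensity = binary_points[:, 3]   # the extra per-point attribute
print(ascii_points.shape, xyz.shape, intensity.min(), intensity.max())
</code></pre>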
<h2 id="2-main-ways-to-create-a-point-cloud">2 Main Ways to create a point cloud</h2><p><strong>There are basically 2 types of approaches, <u>active</u> and <u>passive</u></strong>. Active techniques actively emit signals like light or sound to measure distances and create point clouds, such as LiDAR and structured light systems. In contrast, passive techniques rely on capturing existing environmental data, like photogrammetry, which reconstructs 3D points from multiple camera images without emitting any signals.</p><h3 id="active-techniques-lidars-rgb-d-radars">Active Techniques: LiDARs, RGB-D &amp; RADARs</h3><p>In the first case, point clouds come from sensors built to create them. When a camera takes a picture, it aims to get pixels. Well, when a LiDAR makes a measurement, its aim is to create a point cloud. Let&apos;s see 3 ways to do it:</p><h4 id="1-how-to-get-point-clouds-using-structured-light-rgb-d-systems">1) How to get point clouds using Structured Light RGB-D systems</h4><p><strong>Ever played with the Microsoft Kinect? I can&apos;t say that I have. </strong>I was a Wii player all the way when they were competing. Yet, I&apos;ve always been impressed by how the Kinect produced point clouds using its RGB-D camera, working with the <strong><u>Structured Light Principle.</u></strong></p><p><strong>The Kinect shines a special pattern of light around,</strong> then uses an infrared camera to take a picture of how that light bounces back. By seeing how the pattern changes, the camera can figure out how far away things are. It combines this distance information with the colors it sees to create a 3D image: the final point cloud.</p><p>In robotics, you probably know the Intel Realsense 435i, or other equivalents. Their goal is to build a Depth Map, which is then turned into a point cloud.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1134" height="744" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-29-at-15.49.43--1-.jpg 1134w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">What an RGB-D camera produces</span></figcaption></figure><h4 id="2-how-lidar-point-clouds-are-produced">2) How LiDAR point clouds are produced</h4><p><strong>The most common and popular technique is to use a LiDAR (Light Detection And Ranging). </strong>There are many types of LiDARs around, but let&apos;s focus on the simple <u>Time-Of-Flight principle</u>. In this setup, a laser scanner sends a light beam and measures the time it takes to reflect and come back to the receiver. Similar to this image:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1536" height="1024" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/ChatGPT-Image-29-sept.-2025--17_14_09--1-.jpg 1536w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">LiDAR scanners send a wave and measure the time it takes to come back</span></figcaption></figure>
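<p>In numbers, the math behind a single Time-Of-Flight return is tiny: the pulse travels to the surface and back at the speed of light, so the distance is half of (speed of light &#xD7; round-trip time). A minimal sketch, with made-up round-trip times:</p><pre><code class="language-python"># Minimal Time-Of-Flight sketch: convert round-trip pulse times into distances.
# The times below are made-up values, not real sensor output.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_distance(round_trip_seconds):
    # The pulse travels to the target AND back, hence the division by 2
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# Round trips of roughly 67 ns, 334 ns and 2 us, i.e. about 10 m, 50 m, 300 m
for t in (66.7e-9, 333.6e-9, 2.0e-6):
    print(f"{t:.1e} s  ->  {tof_distance(t):7.2f} m")
</code></pre>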
<p><strong>LiDAR scanners produce raw data of the world up to 300-400 meters away in the automotive industry.</strong> Each scan can generate millions of points in three-dimensional space. I highly recommend checking out my article on the <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer">types of LiDARs</a> to learn more.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">WAIT</strong></b>! This blog post doesn&apos;t have to be the only thing you read from me. I post daily through<a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer"> my daily emails</a>, and I talk about LiDARs, Computer Vision, and more cutting-edge AI Applications.<a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer"> You can join my emails here.</a></div></div><h4 id="3-radar-point-clouds">3) RADAR Point Clouds</h4><p>The third technique is to use not a LiDAR but a <a href="https://www.thinkautonomous.ai/blog/how-radars-work/" rel="noopener noreferrer">RADAR</a> to create the point cloud data. This is not very straightforward to do. RADARs usually return signal information based on Doppler (velocity), Range (distance), and Azimuth (direction/angle). Using these, we can do some calculations to retrieve the point cloud data.</p><p>Here is an example on a very low-quality RADAR:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed.gif" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="586" height="247"><figcaption><span style="white-space: pre-wrap;">How RADAR heatmaps get converted to point clouds</span></figcaption></figure><p>Today, we can use Imaging RADARs to get 3D point clouds. I invite you to <a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noopener noreferrer">check out my Imaging RADAR article to learn more about it.</a></p>
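<p>For the curious, the geometry that turns a RADAR return into a 3D point is mostly a spherical-to-Cartesian conversion. Here is a minimal sketch; the detections are made-up numbers, and a real pipeline would first have to extract them from the raw heatmap:</p><pre><code class="language-python"># Minimal sketch: turning RADAR returns (range, azimuth, elevation, Doppler)
# into XYZ-Velocity points. The detections below are made-up numbers.
import numpy as np

# One row per detection: range (m), azimuth (rad), elevation (rad), doppler (m/s)
detections = np.array([
    [12.0,  0.10, 0.02, -3.5],
    [35.0, -0.25, 0.00,  0.0],
    [80.0,  0.05, 0.01, 12.1],
])

rng, azimuth, elevation, doppler = detections.T
x = rng * np.cos(elevation) * np.cos(azimuth)   # forward
y = rng * np.cos(elevation) * np.sin(azimuth)   # left / right
z = rng * np.sin(elevation)                     # up
points_xyzv = np.stack([x, y, z, doppler], axis=1)
print(points_xyzv.round(2))
</code></pre>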
<p>Now that we&apos;ve seen the Active ways, using dedicated sensors, I&apos;d like to take a minute to talk about the passive ways.</p><h3 id="passive-point-clouds-generation-photogrammetry-3d-reconstruction">Passive Point Clouds Generation: Photogrammetry &amp; 3D Reconstruction</h3><p><strong>The idea of passive techniques is that your sensors do not emit anything to measure depth. </strong>The main way to do this is by leveraging 3D Reconstruction. Ideas like Structure From Motion, Multi-View Stereo, NeRFs, Gaussian Splatting, or others are used.</p><p>The idea? To convert 2 or more images to a 3D point cloud using triangulation, geometry, and depth maps.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed.jpg" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="1600" height="756" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/unnamed.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/unnamed.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed.jpg 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Stereo Vision is a powerful technique to retrieve 3D models</span></figcaption></figure><p>If you&apos;re interested in this, I highly recommend reading my 3-article series on the <a href="https://pyimagesearch.com/2024/10/14/photogrammetry-explained-from-multi-view-stereo-to-structure-from-motion/" rel="noopener noreferrer">PyImageSearch blog</a>, or my article on Pseudo-LiDARs.</p><p>Alright, so you now know all about the point cloud types, and the ways to get them. One thing remains...</p><h2 id="how-to-process-point-cloud-data">How to Process Point Cloud Data?</h2><p>Do you remember in the point cloud types when I showed the &quot;label/class&quot; of each point? This is not something sensors can measure; it&apos;s built by algorithms. There are 3 things that really matter here:</p><ol><li>Understanding the main libraries/tools to work with</li><li>Understanding the core algorithms to use on raw point cloud data</li><li>Being able to use them in the applications</li></ol><h3 id="libraries-open3d-and-point-cloud-library-pcl">Libraries: Open3D and Point Cloud Library (PCL)</h3><p>There are many libraries used to process point clouds. These implement the algorithms. For example, the Point Cloud Library is one of the most popular to work with. Open3D is also a very common one; it contains fewer algorithms, but is easier to use thanks to its Python interface. I would recommend getting started with this one.</p>
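<p>To give you a feel for the Open3D side, here is a minimal sketch that loads a cloud, downsamples it, and segments the ground plane with RANSAC. The file name is hypothetical and the thresholds are just reasonable defaults, assuming you have installed the library with <code>pip install open3d</code>:</p><pre><code class="language-python"># Minimal Open3D sketch: load a cloud, thin it out, and split ground vs obstacles.
# "scene.pcd" is a hypothetical file; PLY, PCD, XYZ files all work the same way.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.pcd")
pcd = pcd.voxel_down_sample(voxel_size=0.1)   # keep roughly one point per 10 cm voxel

# RANSAC plane fit: returns the plane coefficients and the inlier indices
plane_model, inliers = pcd.segment_plane(distance_threshold=0.2,
                                         ransac_n=3,
                                         num_iterations=1000)
ground = pcd.select_by_index(inliers)
obstacles = pcd.select_by_index(inliers, invert=True)

print("plane:", plane_model, "| ground points:", len(ground.points))
o3d.visualization.draw_geometries([obstacles])   # everything but the ground
</code></pre>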
<p>On a similar topic, you may want to know at least one point cloud dataset. I would recommend you <a href="https://www.thinkautonomous.ai/blog/lidar-datasets/" rel="noopener noreferrer">check out this article</a>.</p><h3 id="which-algorithms-can-be-used-to-process-point-clouds">Which algorithms can be used to process point clouds?</h3><p><strong>In point cloud processing, you can either go with traditional algorithms or 3D Deep Learning.</strong> The split, I would say, depends on the application. When companies want to detect objects in 3D to get bounding boxes, they usually use <a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noopener noreferrer">3D Deep Learning algorithms</a> like PointPillars or VoxelNet. Let&apos;s see an example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ezgif.com-optimize--1-.gif" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="420" height="336"><figcaption><span style="white-space: pre-wrap;">LiDAR Object Detection - </span><a href="https://courses.thinkautonomous.ai/deep-point-clouds" rel="noreferrer"><span style="white-space: pre-wrap;">taken from my Deep Point Clouds course</span></a></figcaption></figure><p><strong>Outside of 3D Object Detection and 3D Segmentation, the entire world runs on traditional processing approaches</strong>. Since you have points, you can create tons of automated pipelines to process them. For example, you can do plane segmentation, clustering, outlier removal, normal estimation, point data cropping, surface reconstruction, filtering of unwanted data points, and so on, all with traditional approaches.</p><p>For example, you could calculate the surface normals and filter the points that do or don&apos;t belong to the street.</p><p><strong>Another technique can involve </strong><a href="https://www.thinkautonomous.ai/blog/point-cloud-registration/" rel="noopener noreferrer"><strong>point cloud registration and alignment</strong></a><strong>.</strong> When you have multiple point clouds, for example coming from 2 LiDARs, you can align them together into a single object. In the example below, from one of my LiDAR courses, notice how we start with 2 point clouds, a blue and a red, and we end up aligning them perfectly. The merged cloud is more useful than either raw scan alone.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed--1-.gif" class="kg-image" alt="Point Clouds in Self-Driving Cars: 3 Things Perception Engineers Need to Know" loading="lazy" width="729" height="355" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/unnamed--1-.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/unnamed--1-.gif 729w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">LiDAR Stitching - </span><a href="https://courses.thinkautonomous.ai/point-clouds" rel="noreferrer"><span style="white-space: pre-wrap;">taken from the DLC of my Point Clouds Conqueror course</span></a></figcaption></figure><p>In the algorithm category, there are countless applications. Everything related to SLAM or Odometry is also heavily used today.</p><h3 id="applications-which-jobs-can-you-target-with-point-cloud-skills">Applications: Which jobs can you target with Point Cloud skills?</h3><p>Regarding the applications, we could write an entire article. Yet, let me give you a few core jobs you can target with point cloud processing skills:</p><ul><li><strong>Perception Engineer, Autonomous Vehicles: </strong>Process LiDARs and RADARs to find objects in the 3D space. Use Sensor Fusion to mix the output with Computer Vision. 
Build autonomous vehicles, shuttles, delivery robots, and create the future.</li><li><strong>Nuclear SLAM Engineer, Robotics</strong>: Use point cloud processing techniques inside robots that explore caves or regions humans can&apos;t go to, such as nuclear sites, and build maps of the world.</li><li><strong>BIM Engineer, Architecture</strong>: Create digital models of buildings and structures by processing raw point cloud data captured from laser scanners or photogrammetry. These models help architects and engineers visualize object properties, plan renovations, and ensure precise construction. The role often involves using processing software to convert points into computer-aided design (CAD) models, performing manual correction to refine the data, and integrating the results into architectural workflows for improved design and quality inspection.</li><li><strong>Medical Imaging Engineer</strong>: Apply point cloud techniques to CT-Scans, MRIs, and other 3D data types to detect diseases and save lives. There is both a commercial and research use.</li><li><strong>Drone Engineer, Agriculture</strong>: Process Camera and RADAR/LiDAR information to navigate, and help drones speed up agricultural work and meet population needs.</li><li>and many, many more...</li></ul><p>Alright, now let&apos;s see a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>A point cloud is a series of 2D or 3D points.</strong> A point cloud is to the LiDAR what a pixel is to a camera.<br>(Re-read that one.)</li><li><strong>Each point of a cloud usually contains at least the XYZ information,</strong> but many sensors or techniques also provide Intensity, Reflectivity, Velocity, Ring/Channel, Color, Temperature, Infrared, and more...</li><li><strong>A point cloud file comes in one of 2 formats: ASCII or Binary.</strong> An ASCII file is more readable for humans; Binary is more readable for robots. Each file is a list of points and their information.</li><li><strong>There are 2 ways to build a point cloud: Active and Passive</strong>. Active techniques involve sensors like LiDARs, RADARs, or RGB-D cameras, while passive techniques use photogrammetry and 3D reconstruction to retrieve 3D models.</li><li><strong>Point Cloud Processing typically involves 3 stages</strong>: the tools/libraries, the algorithms, and the applications. Tools are libraries like Open3D or PCL, algorithms are either traditional or deep learning, and applications go from self-driving cars to robotics, drones, augmented reality, the architecture industry, agriculture, and beyond.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text">If you want to learn more about point clouds, I highly recommend you read my other posts, and <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">join my daily emails</a>, where I often talk about LiDARs, Computer Vision, and more cutting-edge AI Applications.<a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer"> You can read them here.</a></div></div>]]></content:encoded></item><item><title><![CDATA[Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?]]></title><description><![CDATA[Tesla vs Waymo: Is this worth making another comparison? 
Well, I think they are not really comparable, yes, one of them has a better map to Level 5, and if you'd like my expert opinion on who, I invite you to read!]]></description><link>https://www.thinkautonomous.ai/blog/tesla-vs-waymo-two-opposite-visions/</link><guid isPermaLink="false">62a25f550f1a5e26a580b870</guid><category><![CDATA[startups]]></category><category><![CDATA[self-driving cars]]></category><category><![CDATA[tesla]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 10 Sep 2025 15:57:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-1.jpg" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?"><p><strong>Where do you think humans come from?</strong> Growing up, I wrestled with two conflicting ideas about it. One teacher taught me Darwin&#x2019;s theory of evolution: a gradual process of adaptation, rooted in <u>science</u> but riddled with gaps and errors. The other taught me the Bible&#x2019;s Old Testament: the story of divine creation, in which even though nothing was ever proven false, this isn&apos;t built on any &quot;proof&quot;.</p><p><strong>Since then, I realized they were trying to answer the same question</strong>, but operated in entirely <u>different realms</u>, each with its own logic and purpose. One belonged to science, the other to faith. Making head-to-head comparison was almost meaningless.</p><p><strong>And I think this contradiction also exists in self-driving cars,</strong> especially when opposing 2 giants: Tesla and Waymo. Both seem to chase the same prize of &quot;Level 5&quot; autonomy, but when you look closer, their paths are so distinct they&#x2019;re barely comparable.</p><p><strong>In this article, we&apos;re going to try and understand who has what I call the best &quot;<em>Map to Level 5</em>&quot;</strong>, we&apos;ll take a side-by-side comparison, and I&apos;ll give my opinion on 3 aspects:</p><ul><li>The sensor suite</li><li>The algorithms</li><li>The &quot;map&quot;, meaning strategy, vision, and more...</li></ul><p>Let&apos;s begin with the sensors...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">People often compare Tesla and Waymo using irrelevant criteria such as LiDAR vs camera. I have come up with a comprehensive comparison video using research papers and more generally algorithms. Interested? Click <a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-tesla-vs-waymo-algorithmic-view" rel="noreferrer"><b><strong style="white-space: pre-wrap;">here</strong></b></a>! (In case you do not have an account yet, you can sign up for one or visit <a href="https://www.thinkautonomous.ai/sdc-app">https://www.thinkautonomous.ai/sdc-app</a>)</div></div><h2 id="tesla-vs-waymo-who-has-the-best-sensor-suite">Tesla vs Waymo: Who has the best Sensor Suite?</h2><p><strong>Back when I was studying driverless cars,</strong> it was around 2017, when I heard an interview with Sebastian Thrun, godfather of self-driving cars, talking about who was ahead in the race. 
I vividly remember his words: &quot;<em>Nissan is doing pretty good, but I think the company who is ahead of everyone is actually Tesla</em>&quot;.</p><p><strong>It was 2017, and I remember feeling surprised by this comment</strong>, because at the time, Tesla only had a light ADAS feature working with mobileye, and companies like Waymo, Mercedes, Nissan, and others seemed to be covered everywhere in the media, have &quot;real&quot; self-driving car abilities, and more potential.</p><p><strong>What about today? </strong>Who is ahead? Closer to remove human drivers? Who has the better vision? The better algorithms? The better sensors? Is it Waymo, or Tesla... or someone else?</p><p><strong>In this first part, I want to answer it from a sensor angle</strong>. And to do so, I&apos;m going to start by a screenshot of a very popular X (tweet?) from Elon Musk about RADARs, LiDARs, and cameras from August 2025.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1406" height="808" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-10.20.27-1.jpg 1406w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Elon Musk&apos;s comment on X (</span><a href="https://x.com/elonmusk/status/1959831831668228450" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>This comment raised an army of furious engineers and fusion experts</strong>, mentioning Kalman Filters, and Redundancy, and <a href="https://www.thinkautonomous.ai/blog/9-types-of-sensor-fusion-algorithms/" rel="noopener noreferrer">Sensor Fusion</a>. So before we dive into the exactness of this comment, I would like to describe what each company is doing...</p><h3 id="waymo-29-cameras-6-radars-5-lidars">Waymo: 29 Cameras, 6 RADARs, 5 LiDARs</h3><p><strong>If you look at a Waymo car, you&apos;re going to see exactly the opposite of Tesla: tons of sensors all over the place</strong>. There are RADAR sensors on the front, side and rear, there&apos;s 29 cameras, and 5 LiDARs. The question we can ask is... &quot;Is Waymo trying to kill a fly with a bazooka?&quot;.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-sensors.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1008" height="567" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/tesla-vs-waymo-sensors.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/tesla-vs-waymo-sensors.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/tesla-vs-waymo-sensors.jpg 1008w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s sensor stack</span></figcaption></figure><p><strong>Just in terms of calibration, it must be an absolute <u>nightmare</u> for engineers.</strong> Calibrating a camera with a LiDAR is already a long task, but 29 cameras with 5 LiDARs? Then there is the fusion of all of these sensors together, and this is ONLY the &quot;robotaxi&quot; version, because if you look at their Zeekr shuttles, they also have their own types of sensors, with different generation codes. Their stack is therefore always evolving, and depends on the vehicle they drive on.</p><h4 id="what-type-of-lidar-camera-and-radar-is-waymo-using">What type of LiDAR, Camera, and RADAR is Waymo using?</h4><p>Let&apos;s take a brief look at what each sensor sees:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ScreenRecording2025-09-10at14.29.22-ezgif.com-optimize.gif" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="640" height="226" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/ScreenRecording2025-09-10at14.29.22-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/ScreenRecording2025-09-10at14.29.22-ezgif.com-optimize.gif 640w"><figcaption><span style="white-space: pre-wrap;">What Waymo&apos;s sensors see (</span><a href="https://waymo.com/" rel="noreferrer"><span style="white-space: pre-wrap;">Waymo</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>Waymo uses different types of cameras. </strong>They work with high-res long-range cameras (color, telephoto lenses) for object detection far down the road, wide-angle cameras for close-range coverage (pedestrians, cyclists, intersections), and near-infrared cameras for night vision / low-light perception.</p><p><strong>Regarding LiDARs, they&apos;re using their own sensors called &quot;Laser Bear Honeycomb&quot;</strong>. There is one forward LiDAR, 2 side LiDARs, and one at the rear. But these are short range, solid-state LiDARs. They are excellent for blind spots and front facing vehicles, but complex to drive on highways because they don&apos;t see far. This is why there is the roof LiDAR, which is mechanical, and sees several hundred meters away. </p><p>On the animation below, you can see LiDARs both in point clouds format and in range-view &#x2014;&#xA0;and  <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noreferrer">you can learn about types of LiDARs here</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/waymoslidar-ezgif.com-optimize--1--1.gif" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="640" height="360" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/waymoslidar-ezgif.com-optimize--1--1.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/waymoslidar-ezgif.com-optimize--1--1.gif 640w"><figcaption><span style="white-space: pre-wrap;">How Waymo&apos;s sensor complement eachother (source: </span><a href="https://portal.thinkautonomous.ai/self-driving-cars" rel="noreferrer"><span style="white-space: pre-wrap;">THE SELF-DRIVING CAR ENGINEER SYSTEM</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>Regarding RADARs, Waymo uses their own line of </strong><a href="https://www.thinkautonomous.ai/blog/imaging-radar/" rel="noreferrer"><strong>Imaging RADARs</strong></a>. Imaging RADARs are what I call <a href="https://www.thinkautonomous.ai/blog/fmcw-lidars-vs-imaging-radars/" rel="noreferrer">4D RADARs</a>. Unlike normal RADARs who see in 2D and measure the velocity, these ones see in 3D and measure the velocity.</p><p>Do you want to see what all of this looks like together? Okay, here it is:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/ScreenRecording2025-09-10at14.39.44-ezgif.com-optimize.gif" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="480" height="272"><figcaption><span style="white-space: pre-wrap;">The orange represents the point clouds and detections &#x2014; the blue represents the Imaging RADAR signatures &#x2014;&#xA0;the cameras are at the bottom row</span></figcaption></figure><p>Waymo uses a powerful array of sensors allowing them to see every possible object. We&apos;ll come back to the utility of LiDARs and RADARs, but for now, let&apos;s look at cameras...</p><h3 id="2-teslas-sensor-design-8-cameras-thats-it">2. Tesla&apos;s Sensor Design: 8 Cameras, that&apos;s it </h3><p>Unlike Waymo vehicles, Tesla&apos;s approach relies only on cameras. Tesla&apos;s autopilot aims to solve autonomous driving using vision-only.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1614" height="750" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.07.29--1--1.jpg 1614w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla&apos;s sensor stack used 8 cameras (red) and 12 ultrasonics (orange)</span></figcaption></figure><p><strong>On this illustration, you can see a very simple design that almost never changed. </strong>In red, you can count 8 outside cameras, and in orange, you can see 12 ultrasonic sensors used to detect static objects when parking. Among the 8 cameras, there 2 on the windshield, used for stereo vision, one on the front bumper, one on the rear bumper, 2 on the doors, and 2 on the wheels. See the difference? 
I can&apos;t even begin to count Waymo&apos;s cameras, but I can easily show you Tesla&apos;s sensor stack.</p><p><strong>So let&apos;s visualize what the cameras see:</strong></p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1560" height="1110" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.15.53--1--1.jpg 1560w" sizes="(min-width: 720px) 720px"></figure><p>Interesting, or not. Now, let&apos;s try and understand, who has the best sensor suite?</p><h3 id="3-who-has-the-best-sensor-suite-tesla-or-waymo">3. Who has the best sensor suite... Tesla or Waymo?</h3><p><strong>I am going to show you an image</strong>, and I would like you to ONLY look at the left part. Ignore the right for now. Can you tell me what you see? You see a car, don&apos;t you?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="2000" height="639" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.41.15.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Seeing under fog or difficult conditions is the limiting point of a camera only architecture, isn&apos;t it?</span></figcaption></figure><p><strong>But did you notice the pedestrian?</strong> This is one of the limits of the vision-only approach. When you look at Tesla&apos;s miles driven without disengagement reports, Tesla FSD clearly shows a limit on bad weather. They don&apos;t drive well on cloudy foggy, rainy, or snowy scenes, and they don&apos;t drive at all during storm and sleet (when ice falls from the sky).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1048" height="620" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-14.57.04.jpg 1048w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla can&apos;t drive autonomously in regions like snow, storm, sleet, fog, and even heavy rains (</span><a href="https://teslafsdtracker.com/Main" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>Reports show that FSD13 significantly improved driving by night,</strong> but there is still a <u>physical</u> limitation to driving with cameras only that is not solvable with better algorithms. In a robotaxi situation in which you&apos;d sit in the passenger seat, you would be stuck. The way humans drive involves more than cameras, we hear street sound, we sense people, and we don&apos;t use wide-angle cameras to detect other cars.</p><p><strong>If we were to come back to Elon Musk&apos;s comment now, do you remember the &quot;If LiDARs/RADARs disagree with cameras, which one wins???&quot;</strong>. Waymo shows that redundancy is key to safety. If a camera misses something, your RADAR may not. And with Kalman Filters, you can certainly develop a powerful fusion module to account for disagreements.</p><p><strong>When we consider the physical ability of a LiDAR to generate point clouds, you can understand how powerful having them is</strong>. Other than seeing through night or other situations, they physically build <a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noreferrer">point clouds</a>. Recently, a YouTube video has shown a Tesla vs LiDAR-equipped car driving on a wall resembling a street. The Tesla FSD crashed on the wall, confusing it with the highway; but the LiDAR-equipped car stopped.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1872" height="802" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.19.51--1-.jpg 1872w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla vs LiDAR (</span><a href="https://www.youtube.com/watch?v=IQJL3htsDyQ" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>You may wonder... Okay, but we&apos;ll never see fake walls in real life, so what&apos;s the point? The point is that vision only has its limitations. We&apos;ve seen Tesla confuse the moon with objects, miss a red light, do phantom breaks, and even miss truck trailers. 
<strong>For all of these reasons, I would say that the Waymo approach including Camera+LiDAR+Imaging RADARs is a better choice. They get the #1 point.</strong></p><p>One caveat is that the algorithms, processing power, and energy required to operate this vehicle is insane. LiDAR sensors consume a lot of energy and record a lot of data. Tesla is much cleaner in that perspective.</p><p>Speaking of algorithms, let&apos;s now move to this second point.</p><h2 id="tesla-vs-waymo-who-has-the-best-algorithms">Tesla vs Waymo: Who has the best algorithms?</h2><p>If we now look at the algorithms, who is closer to build self-driving cars? Tesla and Waymo started off with very different architectures, but now seem to converge towards End-To-End Learning. So let&apos;s take a look...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Hey, I have to make a confession: I couldn&#x2019;t dive deeper into the algorithm side here, but I recorded a full comparison of Tesla and Waymo&#x2019;s architectures. You can get access <a href="https://www.thinkautonomous.ai/blog/de7911199f91436b9edad2fc7bdce9b9?pvs=25" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">here</strong></b></a>.</div></div><h3 id="1-tesla-fsd-algorithms-hydranets-occupancy-networks-and-end-to-end-learning">1. Tesla FSD Algorithms: HydraNets, Occupancy Networks, and End-To-End Learning</h3><p>In my <a href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/" rel="noreferrer">article breakdown on Tesla</a>, I&apos;m doing a full deep dive on the Tesla&apos;s algorithm, and how they work; so I won&apos;t do that here, but I will still show you the overview of how they built their <a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer">autonomous vehicle architecture</a>. Note that it&apos;s according to the Tesla data; which moved to private in 2023.</p><p>You can see 3 main blocks:</p><ul><li><strong>Lane &amp; Object HydraNet: </strong>The lane and object <a href="https://www.thinkautonomous.ai/blog/how-tesla-autopilot-works/" rel="noreferrer">Hydranet</a> is a multi-task learning network that takes in the 8 cameras, learns features from each using a CNN, fuses them spatially and temporally via a Vision Transformer, and then outputs several heads. Heads are trained to detect objects, lanes, positions, and so on... You can read more details here.</li><li><strong>Occupancy Network</strong>: The <a href="https://www.thinkautonomous.ai/blog/occupancy-networks/" rel="noreferrer">Occupancy Network</a> is also processing all 8 cameras spatially and temporally, except that it&apos;s trained to leverage spatial data. This is a 3D network that aims to build voxels and assign a free/occupied state to each. You can read more details here.</li><li><strong>Planning &amp; Control</strong>: The Planning &amp; Control node used to be (in the drawing) done via a Monte-Carlo Tree Search. This is traditional artificial intelligence. In 2024, they replaced this with a Neural Network planner. 
While we don&apos;t have details on how it works, the &quot;End-To-End&quot; comes from this node moving to Deep Learning, making the entire network differentiable.</li></ul><p>To push the explanation even further, let me show you the typical visualizers on a Tesla, and see how they both refer to an algorithm:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1528" height="806" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.45.47--1--2.jpg 1528w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla&apos;s algorithm visualizations</span></figcaption></figure><p>Now, let&apos;s see Waymo...</p><h3 id="2-waymos-algorithms-3d-deep-learning-diffusion-planners-and-more">2. Waymo&apos;s Algorithms: 3D Deep Learning, Diffusion Planners, and more...</h3><p>It&apos;s a bit harder to fully track the state of Waymo&apos;s algorithms, because they continuously release <a href="https://waymo.com/research/" rel="noopener noreferrer">multiple research papers</a>, and we don&apos;t know which ones are in production, and which are just pure research. Still, according to my research, there are 3 core pillars Waymo relies on to drive...</p><ul><li>LiDARs</li><li>Prediction/Tracking</li><li>Imitation/End-To-End</li></ul><h4 id="lidars">LiDARs</h4><p><strong>Early on, Waymo pioneered work on LiDARs with 3D Object Detection algorithms like SW-Former</strong>. These algorithms process LiDAR point clouds and output bounding boxes in 3D. This architecture has been updated a few times, but now serves as the &quot;core&quot; detection algorithm of Waymo. From there, it&apos;s encapsulated into other pipelines, like the Late-To-Early Fusion:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1724" height="784" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-03-at-11.58.31--1--1.jpg 1724w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s </span><a href="https://waymo.com/research/swformer-sparse-window-transformer-for-3d-object-detection-in-point-clouds/" rel="noreferrer"><span style="white-space: pre-wrap;">SW-Former</span></a><span style="white-space: pre-wrap;"> &amp; </span><a href="https://waymo.com/research/lef-late-to-early-temporal-fusion-for-lidar-3d-object-detection/" rel="noreferrer"><span style="white-space: pre-wrap;">Late-To-Early Fusion</span></a><span style="white-space: pre-wrap;"> algorithms</span></figcaption></figure><p><strong>See what&apos;s happening?</strong> We have a temporal fusion algorithm that processes LiDARs and boxes from t-1, t-2, and so on... and each of these go to SWFormer to output a final head. This is turning SWFormer into a temporal detector, and not just a frame-by-frame detector.</p><h4 id="prediction-tracking">Prediction &amp; Tracking</h4><p><strong>Going even further, we have Prediction &amp; Tracking. </strong>Waymo bets big on tracking, and one of the core algorithms I noticed there is an architecture recently released called Stateful Track Transformer which is doing exactly the job of tracking from SWFormer. Over the years, Waymo released TONS of prediction and tracking architectures.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1594" height="644" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/b94de5c6-8064-4b43-bd37-8ba9aa294877.jpeg 1594w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s Stateful Track Transformer (</span><a href="https://waymo.com/research/stt-stateful-tracking-with-transformers-for-autonomous-driving/" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><h4 id="end-to-endimitation">End-To-End/Imitation</h4><p>Back in the early 2020s, I remember vividly Waymo mentioning an algorithm called ChauffeurNet, who was behaving exactly like Tesla&apos;s HydraNet, but was outputting trajectories. Since then, the approach evolved, and the later published papers and public talks mention the End-To-End architecture named EMMA, as well as Vision Language Models.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="2000" height="731" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-15.30.42--1-.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s Encoder/Decoder Architecture (</span><a href="https://io.google/2025/explore/pa-keynote-22" rel="noreferrer"><span style="white-space: pre-wrap;">Google I/O 2025</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>As you can see, this turned in to an Encoder/Decoder architecture,</strong> where the encoder learns features from each sensors, then fuses spatially and temporally, to learn a compressed representation of the scenes. Then, the decoder is a generative part, built on VLMs to predict a trajectory.</p><p><strong>While many claimed Waymo has a &quot;modular&quot; approach,</strong> <strong>while Tesla has the advanced End-To-End approach; this is simply no longer the case.</strong> Although we don&apos;t know whether Waymo uses EMMA in production, or still relies on their traditional pipeline, we definitely know they&apos;re heading towards it.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Are you enjoying this part? I am doing a full coverage of all these algorithms </strong></b>&#x2014;&#xA0;via a 1h Tesla Masterclass &#x2014; and a detailed algorithmic comparison of Tesla &amp; Waymo in my platform.<br><br>It&apos;s reserved to my daily email readers, and if you&apos;d like to join us: <a href="https://www.thinkautonomous.ai/sdc-app" target="_blank" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">You can sign up here for free and get the deep dives</strong></b></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div><h3 id="3-who-has-the-better-algorithms-waymo-or-tesla">3. Who has the better algorithms? Waymo or Tesla?</h3><p>Let me write 4 or 5 bullet points explaining what I think:</p><ul><li><strong>Both Tesla and Waymo seem to be headed towards End-To-End Learning, because the Modular approach has the car behave in a &quot;robotic&quot; fashion. </strong>This is not smooth, feels robotic, and rule based. So both go towards End-To-End...</li><li><strong>But to make End-To-End work well, you need LOTS of data</strong>. Tesla has that from their fleet of millions of cars all across the world, but Waymo has a small fleet driving only in very specific regions. Scaling via End-To-End will be extremely painful for them.</li><li><strong>On top of that, Tesla only needs to process camera data</strong>, which makes the algorithms likely faster, using less power and less time consuming.</li><li><strong>End-To-End&apos;s biggest problem is edge cases</strong>. Tesla has experience driving several million miles in construction zones, parking lots, or in a new city, and so on... 
Waymo, on the other hand, is stuck with HD Maps and can have issues moving towards End-To-End...</li><li><strong>Tesla also has advanced techniques for Edge Case Detection and retraining,</strong> such as Trigger Classifiers (<a href="https://www.thinkautonomous.ai/blog/automotive-data-processing/" rel="noreferrer">see my detailed overview here</a>), as well as Dojo and Self-Supervised Learning. From my perspective, they seem far more prepared for End-To-End Learning than Waymo. It&apos;s as if Tesla had paved the way for this for a decade, while Waymo pivoted at the last minute.</li></ul><p>Given all this, and under the assumption that the prize here is End-To-End, I would give my points to Tesla.</p><h2 id="waymo-vs-tesla-who-has-the-better-map-to-level-5">Waymo vs Tesla: Who has the better Map to Level 5?</h2><p>In this last point, I would like to step away from sensors and techniques, and look at these companies primarily as businesses. Their goal is to sell self-driving cars, or autonomous rides, and thus... who&apos;s leading in that sense?</p><h3 id="1-strategy">1. Strategy</h3><p>First of all, something very important to understand:</p><ul><li>Tesla sells self-driving cars</li><li>Waymo rents autonomous transportation services</li></ul><p><strong>This is essential, because this means their software, sensor stack</strong>, philosophies, business models, and even strategies to reach Level 5 are totally opposed. This is very clear when you see the graph below, which shows Waymo starting with a very capable vehicle, but only in ONE geo-fenced area, with ONE car, while Tesla starts with millions of cars, but none of them are autonomous. They are not scaling the same thing:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1392" height="870" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-11.59.59-1.jpg 1392w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The strategies of Tesla and Waymo are not the same</span></figcaption></figure><p><strong>Tesla&apos;s goal is to make millions of cars at a 25,000$ price point</strong>. Imagine Tesla in 2016, being told to integrate LiDAR technology, which cost over 50k USD/unit. Wouldn&apos;t they be better off starting with a light camera + RADAR Level 2, and gradually improving it? Of course they would, if they believe it&apos;s possible.</p><p><strong>On the other hand, Waymo had a fleet of maybe 20 vehicles at the time.</strong> With 29 cameras, 6 RADARs, and 5 LiDARs, a Waymo car costs significantly more than a Tesla car. In fact, each car was estimated to cost around 250,000$ a few years ago. Since then, the LiDAR price dropped, Waymo grew its fleet to over 2,000 cars, and each is now estimated around 150,000$. Can you see how the cost is less and less of a problem for them over time?</p><p><strong>Waymo&apos;s LiDARs certainly cost a lot, but this cost gets absorbed as they do more paid rides</strong>, <strong>until it &apos;supposedly&apos; becomes profitable.</strong> Supposedly, because there is maintenance, replacement, and growth of the fleet, which can make the cost a never-ending fight. Even there, Waymo has more leverage to afford the LiDARs, by raising the ride cost for millions of riders from 7$ to maybe 13$ (made up numbers, terribly off &#x2014;&#xA0;just making a point).</p>
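<p>To make that intuition concrete, here is a tiny back-of-the-envelope sketch in Python. The numbers are as made up as the ones above; the only point is to show how a fixed sensor cost gets diluted over paid rides:</p><pre><code class="language-python"># Back-of-the-envelope sketch with made-up numbers (these are NOT real figures):
# how many paid rides does it take to absorb the extra sensor cost of one car?
extra_sensor_cost_per_car = 20_000   # hypothetical LiDAR + RADAR premium, in $
extra_margin_per_ride = 6            # hypothetical price increase per ride, in $ (7$ to 13$)

rides_to_break_even = extra_sensor_cost_per_car / extra_margin_per_ride
print(f"Roughly {rides_to_break_even:,.0f} rides to absorb the sensor cost of one car")
</code></pre><p>The exact values don&apos;t matter; what matters is that the more rides a car completes, the less the sensor stack weighs on the economics.</p>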
<p><strong>There are great statistics on </strong><a href="https://www.01core.com/p/driverless-car-costs-have-gotten" rel="noopener noreferrer"><strong>this blog post</strong></a><strong> from Ben Buchanan</strong> that show how Waymo&apos;s cars get more and more affordable over time. A LiDAR costs 500-1,000$ today, which completely challenges the vision-only philosophy. Unlike Tesla, time is on Waymo&apos;s side.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1124" height="676" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.45.58.jpg 1124w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo has time on their end. The more rides they make, the more they get their investment back</span></figcaption></figure><p><strong>You can therefore understand how each one wins</strong>:</p><ul><li><strong>Tesla starts with TONS of disengagements</strong> and tries to decrease that number to 0. <u>The more capable the algorithm, the more cars they will sell.</u></li><li><strong>Waymo starts with very few disengagements </strong>and tries to scale the number of rides and regions without increasing this number. <u>The more regions they cover, the more rides they will sell.</u></li></ul><h3 id="2-hd-maps"><strong>2. HD Maps</strong></h3><p><strong>Waymo is betting on a serious &quot;HD Map&quot; strategy that Tesla refused to adopt</strong>. According to Tesla, the car should be able to drive anywhere in the US, so they only use standard maps like Google Maps or OpenStreetMap. It does not mean they don&apos;t use HD Maps; they do (see screenshot below), but they don&apos;t <u>require</u> them to drive. If the car ends up in a parking lot with no map, it should still be able to drive.</p><p><strong>On the other hand, Waymo maps every square inch of every place they drive in. </strong>This means every traffic sign, every speed bump, every crossroad, every traffic light, every roadwork, lane lines, speed limit... <u>everything</u> is continuously mapped and updated. On the image below, you can see Waymo&apos;s HD Maps and a screenshot of Tesla&apos;s HD Maps on a car in debug mode.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" 
loading="lazy" width="1434" height="620" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-12.22.53--1-.jpg 1434w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla vs Waymo&apos;s HD Map Game</span></figcaption></figure><p>From what I read, Tesla performs poorly on regions where they don&apos;t know the maps. While this doesn&apos;t block them, this certainly causes disengagements. They are therefore equal in this sense.</p><h3 id="3-miles-driven-disengagements">3. Miles Driven &amp; Disengagements</h3><p><strong>We can&apos;t conclude this article without first looking at who slams the brakes the most</strong>. Yet, I feel you already know the answer, from the first two points. It&apos;s obviously Tesla, because they have many more cars, miles driven (5B for Tesla vs 100M for Waymo), and different situations. So how can we really vote who has the better map to Level 5?</p><p><strong>Let&apos;s first look at disengagements. </strong>What is a disengagement? Is overtaking a stuck vehicle a disengagement? Is accelerating? What is the definition, and do Tesla and Waymo use the same? Well, Waymo uses safety drivers, who obey specific instructions. Tesla drivers disengage for virtually any reason, even if they simply feel like it. This is why <a href="https://teslafsdtracker.com/Main" rel="noopener noreferrer">Tesla FSD Tracker</a> shows, as of September 25, <u>24% of FSD drives have a disengagement, and 3% have critical disengagements (US).</u></p><p><strong>Continuing with more stats:</strong> Tesla drives on average 213 miles before a disengagement on a highway, and most disengagements are caused by Lane Issues. Notice the jump in miles driven without a disengagement with FSD <strong>12.6</strong>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg" class="kg-image" alt="Waymo vs Tesla: Who is closer to Level 5 Autonomous Driving?" loading="lazy" width="1548" height="676" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/09/Screenshot-2025-09-10-at-13.00.53.jpg 1548w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Disengagement Reports of Tesla (</span><a href="https://teslafsdtracker.com/Main" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>For Waymo, it&apos;s a different story</strong>. Waymo has a driver permit in California, which had them recently release the <a href="https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/disengagement-reports/" rel="noopener noreferrer">2024 disengagement report</a> to California DMV. It clearly showed a number of 9,793 miles driven before disengagement. You may ask... How is Tesla at 200 but Waymo at 9,700 miles driven without disengagement? 
It&apos;s because, as we said, the definition, reason for disengagement, and cities, are extremely different.</p><p>If a company wants to drive on a straight line for 500 million miles and show 0 disengagement, they technically can. This is why I think, we can&apos;t really trust any of these numbers &#x2014;&#xA0;only their relative evolution on the same conditions as before &#x1F937;&#x1F3FB;&#x200D;&#x2642;&#xFE0F;</p><p><strong>So who has the better Map to Level 5?</strong> I will tell you, but only after we have made a quick summary of what we&apos;ve seen...</p><h2 id="summary-who-is-ahead-waymo-or-tesla">Summary: Who is ahead, Waymo or Tesla?</h2><ul><li><strong>Tesla and Waymo both have different sensor stacks</strong>. While Waymo relies on a stack of 29 cameras, 6 radars, and 5 LiDARs, Tesla takes a different route and relies on 8 cameras only.</li><li><strong>A sensor setup with LiDARs and RADAR redundancy is safer and gives more reliability than a vision-only setup. </strong>LiDARs can detect objects and brake the car even if no object is identified by the camera or an algorithm. Waymo also drives in more weather conditions than Tesla, who is physically limited by the lack of other sensors.</li><li><strong>Waymo uses an architecture based on LiDARs,</strong> with algorithms like SW-Former, Prediction/Tracking, and EMMA as an End-To-End system.</li><li><strong>Tesla uses an End-To-End approach</strong> involving HydraNets, Occupancy Networks, and Deep Planning. This approach is reinforced by powerful Trigger classifiers, Self-Supervised Learning Dojo, a powerful data fleet &#x2014;&#xA0;making them win on the algorithm aspect.</li><li><strong>Waymo depends heavily on detailed HD Maps and can only work on these mapped, geo-fenced areas</strong>. Tesla&#x2019;s system is designed to drive anywhere without necessarily relying on HD maps. In practice, they drive much better when they have maps.</li><li><strong>Waymo boasts a much better disengagement rate, </strong>with safety drivers rarely needing to take control compared to Tesla&#x2019;s more frequent interventions; but the conditions they drive in and disengage are 100% under <u>their</u> control.</li><li><strong>The two companies have very different business models:</strong> Waymo sells autonomous ride services, Tesla sells cars with self-driving features. As a result, their map to level 5 is not the same.</li></ul><p>And now, you&apos;re all caught up. So, the Map to Level 5?</p><p></p><h3 id="the-map-to-level-5">The Map to Level 5</h3><p>Here is what I think:</p><p><strong>To reach Level 5, Tesla will need to <u>reduce</u> the number of disengagements. </strong>To me, they will have to include LiDARs or RADARs at some point. At today&apos;s cost, that would probably be feasible. In fact, I&apos;m pretty sure Tesla is so competent FSD would be solved right now if they didn&apos;t chose to play the game on hard more. But now, after all this time, can they really afford to do this? There&apos;s ego, brand image, and the &quot;FSD capable&quot; computers they already sold. How can they? Tesla is in the camera game for good.<br></p><p><strong>To reach Level 5, Waymo has to <u>increase</u> the number of areas they drive <u>without increasing</u> the disengagement rate</strong>. This means adapting the &quot;HD Map&quot; strategy, which still relies on them, and making algorithms capable to adapt to a new region faster. To make autonomous driving a reality for the entire world, Waymo will need to go faster. 
Their autonomous vehicles surely are capable, but their current scaling strategy takes too long.</p><h2 id="next-steps">Next Steps</h2><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">I hope you enjoyed this article and it taught you a lot! </strong></b>If you want to go to the next steps, I have a platform for my daily email readers which contains 60 min+ of videos explaining Tesla&apos;s algorithms, and comparing them to Waymo&apos;s architecture. This is a more technical deep dive, and I&apos;m sure you&apos;ll love it. Interested? <a href="https://www.thinkautonomous.ai/sdc-app" target="_blank" rel="noopener noreferrer"><b><strong style="white-space: pre-wrap;">You can sign up here for free and get the deep dives</strong></b></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div>]]></content:encoded></item><item><title><![CDATA[How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)]]></title><description><![CDATA[Self-driving cars collect Tb of videos every day... but is that really needed? (spoiler: No) 

In this article, you'll discover how to collect data in the AV 2.0 age; from Tesla's Trigger Classifiers to Heex Event Management solutions, you'll learn the different ways to do automotive data processing.]]></description><link>https://www.thinkautonomous.ai/blog/automotive-data-processing/</link><guid isPermaLink="false">685277c9c8f3bf93bd18732c</guid><category><![CDATA[self-driving cars]]></category><category><![CDATA[deep learning]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 29 Jul 2025 08:16:16 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/07/automotive-data-processing.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/07/automotive-data-processing.jpg" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)"><p><strong>Have you ever heard the story of the iPod?</strong> It started in January 2001, right after Apple announced a loss of $195 million, and had missed the shift to digital music. The company was lost, and had one last chance of survival: building an MP3 player to catch up with the competitors.</p><p><strong>If you are old enough to remember the MP3 players back then,</strong> they were&#xA0;confusing to use, overloaded with buttons and menus, and made the experience painful for customers. Apple was looking for a solution for months, but had no clue how to make it better.<br><br><strong>Until one day, when Apple&apos;s Head of Marketing Phil Schiller suggested using a scroll wheel</strong>. Wheels already existed in computer mice and dial phones, but had never been used in music players. With this, he suggested that the menus should scroll faster the longer the wheel is turned,&#xA0;a stroke of genius&#xA0;that would distinguish the iPod from the agony of using competing players.<br><br>The rest is history: Apple developed the iPod in the greatest secrecy, launched it, and changed the world with &quot;1000 songs in your pocket&quot;.<br><br><strong>What made it so successful?</strong>&#xA0;It&apos;s not that it looked good, or had buttons, or could store more songs. No, the genius was in the <strong><u>smarter</u></strong>&#xA0;scroll-wheel experience.</p><p><strong>If you&apos;re in the autonomous vehicles market, you&apos;ve probably witnessed a similar pattern:</strong> companies have been collecting more and more data endlessly, building data centers, simulators, hiring people to analyze the data generated, and so on... 
Until some companies came up with smarter ways, not involving just &quot;collecting more data&quot;, but rethinking the experience to focus on events instead.</p><p>In this article, I would like to tell you about the way automotive data processing works nowadays, and how the AI revolution is going to reshape it.</p><p>We are going to learn about 3 ideas:</p><ol><li><strong>The first part is going to focus on the Manual Era</strong> (where we collect and process it all) and the <strong>Cloud</strong> <strong>Era</strong> (where we use DataLakes)</li><li><strong>The second part will be a case-study provided by an autonomous tech startup, </strong>revealing the 10 biggest problems of the Cloud Era.</li><li><strong>The last part will show you the Edge Intelligence Era &amp; the Autonomous Era</strong>, which, as you&apos;ll see, is a far more intelligent way to work.</li></ol><p>Let&apos;s begin with point #1.</p><h2 id="1-data-management-how-self-driving-car-companies-collect-and-process-data-in-the-cloud-era">1. Data Management: How self-driving car companies collect and process data in the Cloud Era</h2><p><strong>One of the things we heard the most this past decade was that Data is king. </strong>And for a long time, collecting as much data as you can in order to train heavy machine learning models has been the only way to operate. Let&apos;s talk about data collection, and then processing.</p><h3 id="how-do-autonomous-vehicles-collect-data">How do autonomous vehicles collect data?</h3><p>We know that when a self-driving car drives, all the data (sensors, images, messages, hardware status, algorithm decisions, ...) is being recorded. Here is how.</p><p><strong>The process is simple, and looks like this:</strong></p><ol><li>You <strong>plug</strong> <strong>your</strong> <strong>sensors</strong> into your system (for example, the Robot Operating System/ROS)</li><li>You<strong> </strong>press<strong> record</strong></li></ol><p><strong>I&apos;m sure somebody out there worked hard to find a more complex process, </strong>but if you&apos;re using a tool like ROS, recording data is as simple as running a single command.</p>
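<p>To give you an idea of how little code this takes, here is a minimal sketch using the ROS 1 Python API. The topic names and the output file are hypothetical, and in practice the built-in recording tool does all of this for you:</p><pre><code class="language-python"># Minimal recording sketch (ROS 1 Python API). Topic names and file name are
# hypothetical; the stock recording tool does the same job out of the box.
import rospy
import rosbag
from sensor_msgs.msg import Image, PointCloud2

bag = rosbag.Bag("drive_recording.bag", "w")

def save(topic):
    # Return a callback that appends every incoming message to the bag
    def callback(msg):
        bag.write(topic, msg)
    return callback

rospy.init_node("simple_recorder")
rospy.Subscriber("/camera/image_raw", Image, save("/camera/image_raw"))
rospy.Subscriber("/lidar/points", PointCloud2, save("/lidar/points"))

try:
    rospy.spin()   # keep recording until the node is shut down (Ctrl+C)
finally:
    bag.close()
</code></pre>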
<p>In the video below, you can see me recording LiDAR point clouds, camera images, GPS positions, algorithm outputs, and most of the messages passing through the self-driving car while we drive...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/road_data-ezgif.com-optimize.gif" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="480" height="260"><figcaption><span style="white-space: pre-wrap;">Visualizing the live sensor streams of a self-driving car</span></figcaption></figure><p>When I&apos;m done recording, the output is a file with the .<strong><em>bag</em></strong> extension (for ROS 1) that can vary from a few Gb to Terabytes of data. Let me show you an example below from the <a href="https://github.com/TIERS/tiers-lidars-dataset" rel="noreferrer">TIERS dataset</a>. Notice the durations and sizes of the recordings below &#x2014; the last one is just <strong>8 minutes long</strong>, and yet weighs <strong>200Gb</strong>. That is roughly 25Gb/minute!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1080" height="746" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-14.03.21_1.jpg 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The size of a single ROS Bag is huge.</span></figcaption></figure><p>The bag is the first element. Then comes what we do with it.</p><h3 id="the-manual-era-how-do-we-process-and-analyze-data">The Manual Era: How do we process and analyze data?</h3><p><strong>The first &quot;era&quot; I&apos;d like to tell you about is the 1.0 era</strong>. Back when I worked on autonomous shuttles, each of our fully autonomous vehicles was driving and collecting data to SSD drives. When the day was over, we had hundreds of Gb to process. So we started coming up with file naming conventions, involving the date, event, and so on...</p><p><strong>Then, back at the office, we could replay our algorithms on it,</strong> train our models on the data, and so on... Below is an example of a <a href="https://www.thinkautonomous.ai/blog/image-segmentation-use-cases/" rel="noopener noreferrer">drivable area segmentation</a> algorithm I&apos;ve been training on the data collected:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/ezgif.com-optimize--11-.gif" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="480" height="270"><figcaption><span style="white-space: pre-wrap;">Example of a &quot;Replay&quot; of a Drivable Area Segmentation Algorithm</span></figcaption></figure><p>This was fine for a small startup of 8 people, and it&apos;s probably still okay for small companies that don&apos;t need extensive processing, but most autonomous vehicle companies have turned to the cloud...</p><h3 id="the-cloud-era-how-advanced-driver-assistance-systems-adas-most-of-the-automotive-industry-is-using-data-lakes">The Cloud Era: How Advanced Driver Assistance Systems (ADAS) &amp; most of the Automotive Industry is using Data Lakes</h3><p><strong>If you record data every day, and each recording is hours long, you&apos;re never going to find the events you need</strong>. 
This is why I&apos;m showing you a more sophisticated, let&apos;s say &apos;1.5&apos; version, which makes data collection part of a pipeline.</p><p>It looks like this:</p><ol><li>You <strong>record</strong> the data</li><li>You <strong>upload</strong> it to AWS/Azure</li><li>The R&amp;D team then <strong>processes</strong> it weeks later, <strong>replaying</strong> all the events, <strong>searching</strong> for 10% possibly interesting scenarios, or events, and so on...</li></ol><p>If you&apos;d like to see real-world concepts, you can see AWS and Azure Data Lakes:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1870" height="628" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-18-at-12.14.07.jpg 1870w" sizes="(min-width: 1200px) 1200px"><figcaption><a href="https://www.thinkautonomous.ai/blog/medical-image-segmentation/" rel="noreferrer"><span style="white-space: pre-wrap;">AWS Data Lake</span></a><span style="white-space: pre-wrap;"> vs </span><a href="https://learn.microsoft.com/en-us/industry/mobility/architecture/avops-architecture-content" rel="noreferrer"><span style="white-space: pre-wrap;">Azure Data Lakes</span></a></figcaption></figure><p>A lot of companies in the self-driving car market use these &quot;data lakes&quot;. Let&apos;s look at the Azure Data Lake in a simplified view:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/data-lake.001.jpeg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1280" height="720" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/data-lake.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/data-lake.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/data-lake.001.jpeg 1280w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The 4 Horsemen of Data Processing: The Bag Recording is just a tiny step</span></figcaption></figure><p>ADAS &amp; fully autonomous cars use it. After recording, the 4 key blocks are:</p><ol><li><strong>DataOps</strong>: Where we analyze data, clean it, label it, augment it, tag it, and so on... Notice the interaction with external labellers; that idea is called &quot;human-in-the-loop&quot;.</li><li><strong>MLOps</strong>: The machine learning algorithms, training, testing, and so on...</li><li><strong>ValidationOps</strong>: The validation part, involving visualization, scenario, and simulation.</li><li><strong>MetaData</strong>: After the DataOps tagged the data, we can search for it.</li></ol><p>You can see how it&apos;s placing data at an element in the chain.</p><p>So what are the problems of this? 
<p>So what are the problems with this? Before telling you about the 2.0 Autonomous Era, let&apos;s look at a case study with real ADAS and artificial intelligence companies using it...</p><h2 id="2-case-study-adas-actors-reveals-their-10-biggest-problems-with-data-driven-approaches">2. [Case Study] ADAS Actors reveal their 10 biggest problems with Data-Driven Approaches</h2><p>In this section, before talking about the &apos;2.0&apos; approach, I would like to tell you about the core problems reported by companies that process large volumes of data.</p><p><strong>Before writing this article, I got the opportunity to talk to </strong><a href="https://www.heex.io/en-gb/smarter-data-faster-decisions" rel="noreferrer"><strong>Heex Technologies</strong></a>, a French startup specialized in Event Based Data Management... and I asked them &quot;Which problem do you solve?&quot;. To answer, they shared a 20 page PDF listing all the problems reported by their biggest Advanced Driver Assistance Systems (ADAS), autonomous driving, and robotics clients from the automotive industry.</p><p>In the PDF, I spotted a lot of interesting problems. Let me share the main ones with you:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/heex-data-processing.001.jpeg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/heex-data-processing.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/heex-data-processing.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/06/heex-data-processing.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/06/heex-data-processing.001.jpeg 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Case Study reveals tons of time and money issues in classical data processing</span></figcaption></figure><p><strong>If I were to list the 10 main problems, you&apos;d see:</strong> <u>Slow</u> access to critical events, <u>Manual</u> data processing, <u>Exploding</u> Cloud costs, <u>Fragmented</u> data, <u>Delayed</u> visualization (no real-time), <u>Manual</u> Extraction of scenarios, <u>Useless</u> streaming data, <u>Physical</u> SSD Extraction, <u>Blind</u> Debugging, and <u>Inefficient</u> ROS Bag Processing.</p><p>Notice all these terms I underlined? These are the problems of today&apos;s data management systems.</p><p>Let&apos;s take some examples...</p><ul><li><strong>If you collect the data on Day 1, and process it on Day 3, </strong>you have slow/delayed access to critical events, like a missed pedestrian. So you&apos;re driving, notice something wrong, but you have to wait 2 days to even look for the data, and start searching for that event you noticed...</li><li><strong>Similarly, can you see how the &apos;fragmented&apos; data processing is a problem? </strong>Especially when you work as a team. Engineer A grabs bag A, and makes decisions based on it... Engineer B grabs bag B and makes a different decision based on it... The entire decision cycle happens in <u>silos</u>.</li><li><strong>The Physical SSD extraction is a problem too.</strong> In May 2025, I was at the Stuttgart ADAS &amp; AV Expo, and I met a company that invented a &quot;swap&quot; disk system... 
All of this is great, but that&apos;s still the same problem of storing, copy/pasting data, etc... to a system.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/IMG_3688-ezgif.com-optimize.gif" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="560" height="315"><figcaption><span style="white-space: pre-wrap;">How to &quot;swap&quot; SSD hard drives in self-driving cars (B-PLUS Demo)</span></figcaption></figure><p><strong>In each of these problems, I noticed a <u>time</u> and <u>money</u> waste</strong>.</p><p>For example, the client reporting: &quot;<em>Engineers waited several days to weeks to access specific events due to the <strong><u>time-intensive</u></strong> process of uploading, filtering, and classifying raw data in the cloud.</em>&quot; is clearly facing a <strong><u>time</u></strong> problem, reviewing large amount of data... The other client who mentioned: &quot;<em>The full data pipeline we built &#x2014;from data capture to processing and storage&#x2014;incurred <strong><u>high cloud costs</u></strong> and consumed engineering resources</em>&quot; faces a <strong><u>money</u></strong> problem...</p><p>In this same report shared by Heex, all the companies reported improvement in their pipeline. Whether it was better decision making, more time freed, or money saved. This is why the next part is so important, so let&apos;s now focus on it: Event Driven Data Processing for autonomous cars.</p><h2 id="3-event-driven-data-management-for-autonomous-cars">3. Event Driven Data Management for autonomous cars</h2><p><strong>Back when I started learning autonomous driving algorithms</strong>, I listened to an interview from Sebastian Thrun, acknowledged as te godfather of self-driving cars, who at some point, said something that marked me: [paraphrased]: &quot;<em>With a team of 2/3, you can build a self-driving car that drives 90% of scenarios in a weekend. Then to get to 95%, it takes a few weeks, and to complete these last 5%, it takes years.</em>&quot;</p><p>This idea is called the &quot;long-tail&quot; problem.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1080" height="736" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-15.31.15--1-.jpg 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">If you drive 10 minutes, you&apos;ll see 90% of the events. In order to find edge cases, you must drive and record hours and hours of data</span></figcaption></figure><p><strong>When looking at traffic accidents involving autonomous vehicles</strong>, you often see rare events or edge cases at the root cause. 
The person wearing a stop sign t-shirt, the truck with a donkey on the trailer, the traffic sign burned during Parisian riots... all of these are unusual scenes, totally different from the empty highways cars are used to.</p><p>Some companies solve it with data generation, others with simulation, or with End-To-End Learning. Yet, the root of all evil here is data, and thus, this is what we have to change.</p><p><strong>A decade ago, the term &quot;data&quot; became king, and everybody became a Data Scientist</strong>, Data Engineer, Data Ops, Data Something. That was the case until recently, when the data revolution passed, and breakthrough innovations happened not thanks to more data, but thanks to smarter training systems (like self-supervised learning), or more powerful architectures (like transformers). &quot;More data&quot; was ultimately not the solution, and thus, we have to switch our thinking...</p><h3 id="the-edge-intelligence-era-from-data-management-to-event-management">The Edge Intelligence Era: From Data Management to Event Management</h3><p><strong>After companies have recorded a few laps of the neighborhood they drive in</strong>, recording more of this same scene doesn&apos;t make sense. Companies record more and more, just to spot the 1% of long-tail events. What if we worked on these events only, from the beginning?</p><p>It can be done, by setting up &quot;triggers&quot; in your system that act as a filter and only capture the scene when interesting events happen, such as:</p><ul><li><strong>Objects Missed</strong>: If one camera misses an object that another sensor sees</li><li><strong>Near Pedestrian Collision</strong>: If pedestrians are within 2 meters of our car, and we drive over 30km/h</li><li><strong>Human Intervention</strong>: If a human driver manually took over</li><li><strong>Shakes</strong>: If the camera physically moved due to a bump or small shock</li><li><strong>Ego Collision</strong>: If a collision with the ego vehicle happened</li><li>and so on...</li></ul><p>All of these are valid events we&apos;d like to record. The rest? When it&apos;s all smooth? Well, we already have millions of miles of it.</p>
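<p>To make the idea tangible, here is a minimal sketch of what such a trigger could look like in code. The thresholds and the frame fields are hypothetical, and this is not any specific vendor&apos;s API:</p><pre><code class="language-python"># Minimal sketch of rule-based event "triggers" (hypothetical thresholds and fields,
# not any specific vendor's API): record only when something interesting happens.
from dataclasses import dataclass

@dataclass
class Frame:
    speed_kmh: float             # ego vehicle speed
    nearest_pedestrian_m: float  # distance to the closest detected pedestrian
    driver_took_over: bool       # human intervention flag

def near_pedestrian(frame):
    # "Pedestrian within 2 meters while driving over 30 km/h"
    return frame.nearest_pedestrian_m &lt; 2.0 and frame.speed_kmh > 30.0

def human_intervention(frame):
    return frame.driver_took_over

TRIGGERS = [near_pedestrian, human_intervention]

def should_record(frame):
    # Keep the frame only if at least one trigger fires, instead of recording 100%
    return any(trigger(frame) for trigger in TRIGGERS)

print(should_record(Frame(speed_kmh=42.0, nearest_pedestrian_m=1.4, driver_took_over=False)))  # True
</code></pre>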
<p>Here is an example from Heex Technologies and their platform, which lets you set triggers:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mintlify.s3.us-west-1.amazonaws.com/heextechnologies/public/img/welcome-to-heex-smart-data-platform/triggers.png" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="3204" height="1808"><figcaption><span style="white-space: pre-wrap;">The Heex Technology platform, allowing you to set &quot;triggers&quot;, such as collision, hard brake, and so on...</span></figcaption></figure><p>If I were to show you the 2.0 process, it&apos;d look like this:</p><ul><li>You have the <strong>same</strong> <strong>car</strong> with LiDARs generating the same 10Gb/h of data</li><li>Rather than recording all the data available, you <strong>define</strong> <strong>triggers</strong>.</li><li>You <strong>intelligently</strong> record the events, like the near pedestrian collision, and not all the data</li><li>You get <strong>instant notifications</strong>, <strong>labels</strong>, and can do real-time decision making</li></ul><p>Seems smarter, doesn&apos;t it?</p><p>Now that you have this in mind, I&apos;d like to show you the last era...</p><h3 id="the-autonomous-era-ai-does-it-for-you">The Autonomous Era: AI does it for you</h3><p><strong>The next step is to create algorithms that do it automatically for us.</strong> For example, Tesla patented<a href="https://xilhylujaogys6v6dwfuqa5wrtivfkprrhmf6w7eh7zo46hdgvmq.arweave.net/uhZ8LokDjYl6vh2LSAO2jNFSqfGJ2F9b5D_y7njjNVk" rel="noopener noreferrer"><strong> a concept called trigger classifiers</strong></a><strong>. </strong>The idea is to train their<strong> </strong><a href="https://www.thinkautonomous.ai/blog/how-tesla-autopilot-works/" rel="noopener noreferrer"><strong>HydraNet</strong></a> backbone to classify whether the general scene it&apos;s looking at contains unusual events or not. If it does, let&apos;s say above a certain confidence score, then the model triggers a warning.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1698" height="1232" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Screenshot-2025-06-27-at-12.19.34.jpg 1698w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Tesla&apos;s Trigger Classifiers: The Backbone (which focuses on the general scene) outputs a classification of the nature of the scene it&apos;s looking at</span></figcaption></figure>
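<p>Here is a minimal sketch of that idea; this is my own illustration rather than Tesla&apos;s actual implementation. A small classification head sits on top of the backbone features, and a confidence threshold decides whether the clip gets flagged:</p><pre><code class="language-python"># Minimal sketch of a learned "trigger classifier" (my own illustration, not Tesla's
# actual implementation): a small head on top of backbone features flags unusual scenes.
import torch
import torch.nn as nn

class TriggerClassifier(nn.Module):
    def __init__(self, feature_dim=512, num_event_types=4):
        super().__init__()
        # num_event_types is hypothetical (e.g. construction zone, debris, cut-in, other)
        self.head = nn.Linear(feature_dim, num_event_types)

    def forward(self, backbone_features):
        # One confidence score per event type
        return torch.sigmoid(self.head(backbone_features))

classifier = TriggerClassifier()
features = torch.randn(1, 512)        # stand-in for features from the perception backbone
confidences = classifier(features)
if confidences.max() > 0.8:           # hypothetical confidence threshold
    print("Unusual scene: flag this clip for upload and labeling")
</code></pre>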
<p><strong>Whether you&apos;re working on spreadsheets or building autonomous vehicle technology, automating manual tasks like labelling or searching for data makes sense.</strong> In this case, just like in the Edge Intelligence Era, the events are captured live while driving, and not after.</p><h4 id="the-20-vision">The 2.0 Vision</h4><p><strong>This goes with the &quot;2.0&quot; vision of self-driving cars that companies now define</strong>. A vision driven by Deep Learning first, where data matters, but where more data isn&apos;t the solution. In the 2.0 vision, quality is better than quantity; contextual intelligence is needed, learning should be real-time, and training should be done on relevant data.</p><p>If the 1.0 vision involved heavy test vehicles and modular architectures, the 2.0 vision is about AI &amp; efficiency.</p><p>Now, let&apos;s see an example of a company specialized in this...</p><h2 id="example-how-heex-technologies-turns-data-into-event-management">Example: How Heex Technologies turns Data into Event Management</h2><figure class="kg-card kg-image-card"><img src="https://heex.cdn.prismic.io/heex/65cd1d149be9a5b998b5d409_heex-light.svg?rect=0%2C0%2C100%2C36&amp;w=256&amp;fit=max" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="100" height="36"></figure><p><strong>One of the companies that captured this vision the best is </strong><a href="https://www.heex.io/en-gb/smarter-data-faster-decisions" rel="noreferrer"><strong>Heex Technologies</strong></a>. They built a SaaS platform that implements exactly these ideas of &quot;triggers&quot; &#x2014;&#xA0;their motto is that rather than focusing on the data, they focus on events. I already showed you the triggers; here they are in action:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/analytics--1-.jpg" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1080" height="834" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/analytics--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/analytics--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/analytics--1-.jpg 1080w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Heex&apos;s Visualization Platform shows you the critical events happening, where they happened, and gives you full power to solve the long tail problem</span></figcaption></figure><p>Let&apos;s look at their pipeline, which you&apos;ll notice also works backwards &#x2014; once the bag is generated:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp" class="kg-image" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)" loading="lazy" width="1312" height="739" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/06/Z9LeRRsAHJWomftg_Screenshot2025-03-13at13.webp 1312w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">In this example, we take an existing &quot;dumb&quot; ROS Bag and turn it into a &quot;smart&quot; bag</span></figcaption></figure><p><strong>From a heavy bag, we get a smart bag</strong>. The data is definitely smarter when it is automatically annotated, categorized, and when relevant events are flagged. We can then re-inject this data into the training pipeline, without having to worry about the rest of the dataset.</p>
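<p>Conceptually, turning a &quot;dumb&quot; bag into a &quot;smart&quot; one can be as simple as keeping only the messages around flagged events. Here is a minimal sketch using the ROS 1 Python API; the event timestamps, window size, and file names are hypothetical, and this is my own illustration rather than Heex&apos;s actual pipeline:</p><pre><code class="language-python"># Minimal sketch: keep only the messages around flagged events. Event timestamps,
# window and file names are hypothetical; this is not Heex's actual pipeline.
import rosbag

EVENT_TIMES = [120.0, 842.5]   # seconds from the start of the recording
WINDOW = 10.0                  # keep +/- 10 s of context around each event

with rosbag.Bag("full_drive.bag") as src, rosbag.Bag("smart_drive.bag", "w") as dst:
    start = src.get_start_time()
    for topic, msg, t in src.read_messages():
        elapsed = t.to_sec() - start
        if any(abs(elapsed - event) &lt;= WINDOW for event in EVENT_TIMES):
            dst.write(topic, msg, t)
</code></pre>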
<p><strong>As an entrepreneur myself, I can only admire the focus on one specific and painful problem like this one</strong>. When you can anticipate customer needs, and enable automakers and automotive engineers to move away from a complex process to focus on their core job (<a href="https://courses.thinkautonomous.ai/self-driving-cars" rel="noopener noreferrer">self driving technology</a>)... you win!</p><p>Alright, let&apos;s do a summary and see what to do next:</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>Collecting data is essential to train AI models and develop self-driving vehicles</strong>. Yet, &quot;more data&quot; is not the solution to create breakthroughs and solve the &quot;long tail&quot; problem, which remains a significant challenge.</li><li><strong>The first Era of data processing is the Manual Era,</strong> in which we record and process everything manually. (1.0)</li><li><strong>The second era (1.5) is the cloud version</strong>, in which you work with data lakes and build a real &quot;chain&quot; that contains DataOps, MLOps, ValidationOps, and so on...</li><li><strong>The third era moves to the 2.0.</strong> It&apos;s where we stop obsessing over the data, and focus on events. We can use triggers and platforms like Heex Technologies to do it.</li><li><strong>The fourth era is the AI Era</strong>. (2+) This is where we have AI automatically find events, and train itself continuously on these.</li></ul><p>Which solution is right for you? In reality, they can all work. A small startup can work manually until they find their hard problem to solve and have a budget to invest in data lakes... Companies can work with data lakes, but for bigger fleets, it&apos;d make much more sense to think in terms of events instead.</p><h3 id="next-steps">Next Steps</h3><p><strong>&#xA0;If you realise you have these problems of recording everything</strong>, having your data stay a bit &#xAB;&#xA0;dumb&#xA0;&#xBB;, and would like to know exactly how<u> to stop recording everything</u> by this afternoon (without losing the important information)... </p><p><strong>... Then I&#x2019;d recommend checking out Heex&#x2019;s free discovery quiz</strong>, which will tell you exactly what you&#x2019;re doing wrong today, and (based on your answers) show you what to do this afternoon to save hours of recording, data processing, etc...</p><p>It&#x2019;s free, and you can get access below:</p><div class="kg-card kg-product-card">
            <div class="kg-product-card-container">
                <img src="https://www.thinkautonomous.ai/blog/content/images/2025/07/Screenshot-2025-07-29-at-09.59.36.jpg" width="2720" height="632" class="kg-product-card-image" loading="lazy" alt="How to stop recording 100% of what self-driving cars sees (Introduction to Event Driven Automotive Data Processing)">
                <div class="kg-product-card-title-container">
                    <h4 class="kg-product-card-title"><span style="white-space: pre-wrap;">Heex Free Discovery Quiz</span></h4>
                </div>
                

                <div class="kg-product-card-description"><p><span style="white-space: pre-wrap;">Is your data strategy a silent obstacle?</span></p></div>
                
                    <a href="https://forms.gle/LimFvrHtb5zjqJUv7" class="kg-product-card-button kg-product-card-btn-accent" target="_blank" rel="noopener noreferrer"><span>Take the Quiz</span></a>
                
            </div>
        </div><p>You can also take a look at Heex&apos;s product here: <a href="https://www.heex.io/en-gb/smarter-data-faster-decisions">https://www.heex.io/en-gb/smarter-data-faster-decisions</a></p>]]></content:encoded></item><item><title><![CDATA[Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones]]></title><description><![CDATA[Discover an exclusive excerpt from my interview with Shield AI, a US-based company in the autonomous defense industry. You'll learn about infiltration drones, visual SLAM, ViDARs, and V-BAT VTOL systems.]]></description><link>https://www.thinkautonomous.ai/blog/shield-ai/</link><guid isPermaLink="false">690b2bf9bad329532556f25e</guid><category><![CDATA[field interviews]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 23 Jul 2025 22:00:00 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/11/shield-ai.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/shield-ai.jpg" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"><p><strong>How much impact do you believe your job has?</strong> Is your job saving someone time? Or money? Or... their life? Well, what if you worked on projects that saved people&apos;s lives? Such companies exist, in self-driving cars, in healthcare, and in the case of this article... in <strong>Autonomous Defense!</strong></p><p>This summer, I interviewed Vibhav Ganesh from&#xA0;<a href="https://www.shield.ai/" rel="noreferrer"><strong>Shield</strong> <strong>AI</strong></a>, a U.S.-based defense technology company that develops autonomous systems for military and government use. </p><p>I would write a big paragraph here, but let me instead show you a quick sample from the interview...</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x2712;&#xFE0F;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Vibhav Ganesh is the&#xA0;Director of Engineering</strong></b>, past Chief of Staff to the CTO, and Employee #20 of Shield AI.<br><br><b><strong style="white-space: pre-wrap;">Vibhav has played a pivotal role in the company&apos;s growth and innovation. </strong></b>With a background in visual inertial odometry and SLAM, he has been at the forefront of developing autonomous systems like the Nova 2 quadcopter.</div></div><p>Let&apos;s read his intro to Shield AI and to their core products: the V-BAT and the ViDAR.</p>
<!--kg-card-begin: html-->
<iframe src="https://www.linkedin.com/embed/feed/update/urn:li:ugcPost:7353401032464293888?collapsed=1" height="550" width="504" frameborder="0" allowfullscreen title="Embedded post"></iframe>
<!--kg-card-end: html-->
<p>There are a lot of things to note about their products: <strong>the V-BAT can last 10 hours,</strong> which is a technological achievement itself, thanks to V-TOL (vertical takeoff and landing)... <strong>ViDAR</strong> is also a very interesting product, which stands for Visual Detection And Ranging... and HiveMind (not shown here) is their AI, or as they call it, &quot;The World&apos;s Best AI Pilot&quot;.</p><p>Let me take you to the v-BAT first, as it&apos;s the core product, by showing you this LinkedIn post we did together, where Vibhav Ganesh introduces us to Shield AI.</p><p><strong>Together, we recorded an exclusive Fragment of&#xA0;</strong><a href="https://www.thinkautonomous.ai/the-edgeneers-land" rel="noreferrer"><strong>The Edgeneer&apos;s Land</strong></a><strong>, </strong>my community membership experience, in which he takes us through Shield AI.&#xA0;What is autonomous defense? What are the main technologies involved? What is the range of products?</p><p><strong>In this post, I&apos;d like to give you a small sample of that interview</strong>, highlighting a very interesting moment where Vibhav talked about infiltration drones.</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Before we begin, do you like field interviews?</strong></b> I am bringing new guests to my membership every single month, and when you join my daily emails, you can not only be aware of when these interviews get released, you can also get the opportunity to access the complete training we build for them inside our membership.<br><br>If you&apos;d like to get started, <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">you can receive the emails here</a>.</div></div><hr><h2 id="inside-shield-ais-tactical-infiltration-drones">Inside Shield AI&apos;s Tactical Infiltration Drones</h2>
<!--kg-card-begin: html-->
<iframe src="https://player.vimeo.com/video/1133789552?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="1920" height="1080" frameborder="0" allow="autoplay; fullscreen; picture-in-picture; clipboard-write; encrypted-media; web-share" referrerpolicy="strict-origin-when-cross-origin" title="Shield AI Tactical Infiltration Drones"></iframe>
<!--kg-card-end: html-->
<div class="kg-card kg-toggle-card" data-kg-toggle-state="close">
            <div class="kg-toggle-heading">
                <h4 class="kg-toggle-heading-text"><span style="white-space: pre-wrap;">Read the transcript</span></h4>
                <button class="kg-toggle-card-icon" aria-label="Expand toggle to read content">
                    <svg id="Regular" xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24">
                        <path class="cls-1" d="M23.25,7.311,12.53,18.03a.749.749,0,0,1-1.06,0L.75,7.311"/>
                    </svg>
                </button>
            </div>
            <div class="kg-toggle-content"><p><b><strong style="white-space: pre-wrap;">JEREMY</strong></b><span style="white-space: pre-wrap;">: Okay Vibhav. I&apos;d like to start with the quadcopter. What you call Nova 2. Can you give us an overview of how it works?</span><br><br><b><strong style="white-space: pre-wrap;">VIBHAV</strong></b><span style="white-space: pre-wrap;">:&#xA0;Yeah, I&apos;d love to. So just to kind of understand where we&apos;re coming from, I&apos;ll give a little backstory of Shield, and talk a little bit about how I evolved in it, and then how Shield has evolved over that that time as well.</span><br><br><b><strong style="white-space: pre-wrap;">So the entire existence of Shield,&#xA0;our mission has been to protect serve, members and civilians using intelligent systems. </strong></b><span style="white-space: pre-wrap;">And we do that by providing&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">platforms</span></u><span style="white-space: pre-wrap;">&#xA0;that are capable of operating&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">at the edge</span></u><span style="white-space: pre-wrap;">, providing&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">software</span></u><span style="white-space: pre-wrap;">&#xA0;that allows different kinds of platforms to be resilient to comms and GPS denial and operate in a really, really sticky and dangerous environments.</span><br><br><b><strong style="white-space: pre-wrap;">And we believe the greatest victory requires no war, </strong></b><span style="white-space: pre-wrap;">and we achieve this by equipping the US and its allies with the ability to see and act anywhere at any time.</span><br><br><span style="white-space: pre-wrap;">That started back in 2016/2015 in very niche ConOps, specifically indoor ConOps. There, we&apos;re focused on kind of building&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">clearance.</span></u><br><br><b><strong style="white-space: pre-wrap;">What our founder, Brandon came back from his deployments and saw as kind of lack of technology really servicing the members that were protecting us</strong></b><span style="white-space: pre-wrap;">, and&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">particularly in areas where they were going in kind of blind to buildings</span></u><span style="white-space: pre-wrap;">, if you can imagine you had, you know, in the Middle East conflicts, there were just these buildings that were there. You had no idea what&apos;s happening inside them.</span><br><br><b><strong style="white-space: pre-wrap;">And in order to kind of under make sure the city was safe</strong></b><span style="white-space: pre-wrap;">, you have to go inside and verify that there was no explosives or militants in there. 
And what they used to do was send people through this because there was no robots or technology capable to do that.</span><br><br><span style="white-space: pre-wrap;">And if you can imagine yourself doing that, it&apos;s extremely scary going in blind, not knowing what&apos;s going to happen, what&apos;s going to be on the other side of that door.</span><br><br><span style="white-space: pre-wrap;">And so what he wanted to create is a system that could do that for the operator, instead of having the person go do that.</span></p><p><b><strong style="white-space: pre-wrap;">And so the quadcopter Nova 1 was born out of that idea of, how do you provide information before you send a person through?</strong></b><br><br><span style="white-space: pre-wrap;">And the goal there wasn&apos;t necessarily to build a quadcopter, but it was just the first apple. Just the first application of autonomy in the defense space that was very, very tangible and very easy for us to apply ourselves to. And so we just designed and built a state of the art indoor autonomous surveillance device.</span><br><br><b><strong style="white-space: pre-wrap;">Nova one was ahead of its league in many different areas.</strong></b><span style="white-space: pre-wrap;"> One of the things that really stuck out to me from coming from academia, before I was doing a master&apos;s in robotics at CMU, and we saw a lot of really cool applications of autonomy there too, but the hardware systems were not that capable.&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">Like the flight time was 3 to 5 minutes.</span></u><span style="white-space: pre-wrap;">&#xA0;The processing was very slow.&#xA0;You could operate very slowly.&#xA0;</span><u><span class="underline" style="white-space: pre-wrap;">Most of these videos you were seeing back then were sped up by 8x or 7x just to make sure they look compelling</span></u><span style="white-space: pre-wrap;">.</span><br><br><span style="white-space: pre-wrap;">But what Shield had accomplished was real-time exploration at staggering speeds.&#xA0;At one point, we did a comparison of how fast can a quadcopter clear an environment compared to six Navy SEALs, and the quadcopter actually finished in a third of the time compared to those.</span><br><br><b><strong style="white-space: pre-wrap;">JEREMY</strong></b><span style="white-space: pre-wrap;">: Wow! Okay, I see!</span><br><br><b><strong style="white-space: pre-wrap;">VIBHAV</strong></b><span style="white-space: pre-wrap;">: Isn&apos;t that crazy, just how fast this thing was operating. And back then we were, you know, at a limited, limited sensor suite. So we had a&#xA0;2D scan LiDAR, we had a&#xA0;camera, we had some&#xA0;sonars&#xA0;and an&#xA0;Intel Neural Compute Stick. So it was very limited hardware back then, because it&apos;s 2017 but was able to actually accomplish this mission.</span><br><br><b><strong style="white-space: pre-wrap;">So as long as there was a window or door for to fly in</strong></b><span style="white-space: pre-wrap;">, a human operator which would enter, it would enter the vicinity and kind of say, this is the building I want to enter. And from then on, it would be fully autonomous, no comps required. It would find an entrance.</span></p></div>
        </div><p><strong>Impressive, isn&apos;t it? </strong>What I really love about it is that it&apos;s <u>down to earth.</u><strong> </strong>I could see myself assembling a drone kit,&#xA0;adding a camera and a 2D LiDAR, and starting to experiment with Visual SLAM projects to map a room. This is basically what Shield AI did when they got started. Except that their drone was (1) targeted at a specific client and (2) better than all the competition.</p><p>There are a few insights I&apos;d like to share with you, from Vibhav:</p><h3 id="1-self-driving-car-autonomous-transfer-doesnt-work-as-wed-think">1) Self-Driving Car &gt; Drone transfer doesn&apos;t work as we&apos;d think</h3><p>Now, here is something important to note:</p><blockquote class="kg-blockquote-alt"><strong>A lot of what you learn in autonomous robots CANNOT simply be transferred to drones.</strong></blockquote><p><strong>I did think that it was a matter of copy and paste</strong>. But I understood I got it wrong while making this episode, especially when Vibhav Ganesh told me that their&#xA0;drones don&apos;t have LiDARs, and fly over seas, deserts, no man&apos;s land, between mountains, across open countryside, with 3D constraints, and no map!</p><blockquote>I thought about it for a minute, and I realized...&#xA0;<strong>&quot;Wait, it&apos;s absolutely NOT like autonomous cars!&quot;</strong></blockquote><p><strong>And in fact, when you start looking into autonomous drone architectures</strong>, they absolutely don&apos;t look like self-driving car architectures! For example, at Shield AI, they have a Control Station, an RTOS, a ViDAR, but also a Flight Controller powered by frameworks like PX4 and Maven. This is an entire set of libraries to learn.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/unnamed.jpg" class="kg-image" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones" loading="lazy" width="720" height="405" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/11/unnamed.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/11/unnamed.jpg 720w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The external architecture of Shield AI&apos;s components</span></figcaption></figure><p>From there, here is a second insight:</p><h3 id="2-visual-slam-is-mostly-used">2) Visual SLAM is mostly used</h3><p>Coming back to the idea that drones are NOT like self-driving cars... the other main difference is that they use no map. So without a map, and with just a camera, they have no choice but to use...<strong>Visual SLAM!</strong> </p><p>And Vibhav explains really well what kind of SLAM they&apos;re using, how they implement the mapping even though there is no starting point, and so on. Here is a sample of a vSLAM project I&apos;ve tested with drones:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/68747470733a2f2f692e696d6775722e636f6d2f554b4c7444374c2e676966-ezgif.com-optimize.gif" class="kg-image" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones" loading="lazy" width="600" height="337" srcset="https://www.thinkautonomous.ai/blog/content/images/2025/11/68747470733a2f2f692e696d6775722e636f6d2f554b4c7444374c2e676966-ezgif.com-optimize.gif 600w"></figure><p>This is the nitty-gritty of Shield AI&apos;s work. 
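<p>To make that concrete: in a drone stack, the map coming out of vSLAM feeds a motion planner, and the planner&apos;s waypoints are sent to the flight controller. Here is a toy Python sketch of that loop, purely illustrative and nothing to do with Shield AI&apos;s actual code; the grid, the planner, and the &quot;commands&quot; are all made up for the example:</p><pre><code class="language-python"># Toy sketch: a 2D occupancy grid standing in for the SLAM map,
# a breadth-first planner, and a fake "flight controller" that prints waypoints.
from collections import deque

import numpy as np

def plan_path(occupancy, start, goal):
    """Return a list of grid cells from start to goal (4-connected BFS)."""
    rows, cols = occupancy.shape
    parents = {start: None}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            break
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nr in range(rows) and nc in range(cols) \
                    and occupancy[nr, nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    path, cell = [], goal
    while cell is not None:          # walk back from goal to start
        path.append(cell)
        cell = parents[cell]
    return path[::-1]

# "SLAM map": 0 = free space, 1 = obstacle (in reality this comes from vSLAM)
grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1                     # a wall in the middle of the room

for waypoint in plan_path(grid, start=(0, 0), goal=(4, 4)):
    print("fly to", waypoint)        # a real stack would send this to PX4
</code></pre>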
Once you build a SLAM MAP, you can then feed that map to the Motion Planner, which sends a flight order to the drone. If you&apos;d like more insights on this technology, I highly recommend my <a href="https://www.thinkautonomous.ai/blog/visual-slam/" rel="noreferrer">vSLAM</a> article.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.thinkautonomous.ai/blog/visual-slam"><div class="kg-bookmark-content"><div class="kg-bookmark-title">The 6 Components of a Visual SLAM Algorithm</div><div class="kg-bookmark-description">How does Visual SLAM work? How is it different from normal SLAM? What are the 6 main steps of a Visual SLAM system? Let&#x2019;s find out!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.thinkautonomous.ai/blog/content/images/size/w256h256/2023/01/favicon.png" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"><span class="kg-bookmark-author">Read from the most advanced autonomous tech blog</span><span class="kg-bookmark-publisher">Jeremy Cohen</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.thinkautonomous.ai/blog/content/images/2024/03/visual-slam.jpg" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div></a></figure><p>Okay, would you like to see some samples?</p><h2 id="shield-ai-in-action">Shield AI in Action</h2><p>Let&apos;s take a look at 3 samples here:</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/ScreenRecording2025-07-24at00.09.54-ezgif.com-optimize.gif" width="496" height="294" loading="lazy" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div><div class="kg-gallery-image"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/ScreenRecording2025-07-22at12.46.33-ezgif.com-optimize-1.gif" width="400" height="225" loading="lazy" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div><div class="kg-gallery-image"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/11/f10teaser-ezgif.com-optimize.gif" width="480" height="270" loading="lazy" alt="Shield AI: ViDAR, V-BAT, and Tactical Infiltration Drones"></div></div></div><figcaption><p><span style="white-space: pre-wrap;">courtesy of </span><a href="https://www.shield.ai/" rel="noreferrer"><b><strong style="white-space: pre-wrap;">Shield</strong></b> <b><strong style="white-space: pre-wrap;">AI</strong></b></a></p></figcaption></figure><ul><li><strong>On the left, you can see drones being launched</strong>. These drones cannot do the &quot;vertical takeoff and landing&quot;. They are projected to the air by launchers and then fly like a plane. </li><li><strong>In the middle, you can see the tactical quadcopters we discussed</strong>. Notice how they use vSLAM at the end of the shot.</li><li><strong>On the right, you can see a mission of the v-BAT </strong>searching for a vessel in a sea canal.</li></ul><p>This is really cutting-edge, and totally applied what we are building in the autonomous tech space. Alright, time to wrap up!</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>The defense industry is extremely active</strong>. 
Hundreds of companies work on new generations of autonomous drones, anti-missile detectors, infiltration equipment, RADARs, and more...</li><li><strong>Shield AI is an active player in the defense space,</strong> with a range of products, such as the v-BAT, the ViDAR, and HiveMind.</li><li><strong>Shield AI started with NOVA 1,</strong> a quadcopter that could infiltrate buildings, build maps, and survey them, without the need to send humans inside. This helped prevent human losses due to buildings collapsing or people being trapped.</li><li><strong>Besides being safer, infiltration drones are also more efficient</strong>. Shield AI tested their drone against 6 Navy SEALs clearing a building, and it finished in a third of the time.</li><li><strong>The transfer of autonomous car/robot technology to autonomous drones isn&apos;t as simple as we&apos;d think</strong>. Architectures are different, products are different, regions/environments are different, and even the underlying technologies and algorithms change.</li><li> On the other hand, some technologies really do apply well to autonomous drones, such as <a href="https://www.thinkautonomous.ai/blog/visual-slam/" rel="noreferrer">Visual SLAM.</a></li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E8;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interested in these interviews?</strong></b> I am bringing new guests to my membership every single month, and when you join my daily emails, you will not only know when these interviews get released, you will also get the opportunity to access the complete training we build for them inside our membership.<br><br>If you&apos;d like to get started, <a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">you can receive the emails here</a>.</div></div>]]></content:encoded></item><item><title><![CDATA[The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)]]></title><description><![CDATA[Medical Image Segmentation is one of the most important applications of Deep Learning in healthcare. Yet, most people only know 2D chest X-ray segmentation. What about the 3D Scans? What about Foundation Models?

In this article, we're going to dive into it!]]></description><link>https://www.thinkautonomous.ai/blog/medical-image-segmentation/</link><guid isPermaLink="false">67d05c36c8f3bf93bd1872ad</guid><category><![CDATA[deep learning]]></category><category><![CDATA[computer vision]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Wed, 12 Mar 2025 11:58:06 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/03/medical-image-segmentation-1.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/medical-image-segmentation-1.webp" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)"><p><strong>On September 23, 1999, NASA&#x2019;s Mars Climate Orbiter&#x2014;a $125 million spacecraft</strong>&#x2014;was set to enter Mars&apos; orbit to study its climate and atmosphere. But just as it approached the planet, something went terribly wrong. Instead of entering a stable orbit, the spacecraft plunged into Mars&#x2019; atmosphere and was destroyed.</p><p><strong>After analysis, NASA found a unit mismatch: their Jet Propulser used metric units (newtons), while the spacecraft they got from Lockheed Martin used imperial units (pound-force). </strong>This caused navigation errors, making the spacecraft descend far too low into the Martian atmosphere; and causing a 125m$ loss.</p><p><strong>Human errors happen every day in all sorts of domains</strong>. In 2016, an alarming report from Johns Hopkins estimated that medical errors (including misdiagnoses) cause over 250,000 deaths annually in the U.S., making them the third leading cause of death.<strong> </strong>Many are due to errors in analysis of medical images, such as MRIs, X-Rays, CT Stans, and more.</p><p>In this article, I would like to show you how Medical <a href="https://www.thinkautonomous.ai/blog/image-segmentation-use-cases/" rel="noopener noreferrer">Image Segmentation</a> can be used to counter this problem, and I&apos;ll do it in 3 points:</p><ol><li>2D Medical Image Segmentation</li><li>3D Medical Image Segmentation</li><li>Examples/Demo</li></ol><p>Let&apos;s get started...</p><h2 id="intro-to-2d-medical-image-segmentation">Intro to 2D Medical Image Segmentation</h2><p><strong>In 2019, I hosted the biggest AI Healthcare hackathon ever held</strong>,<strong> happening simultaneously over 20 cities!</strong> The goal at the time was to mix companies, healthcare groups, and engineers to build healthcare solutions using Deep Learning. After the 48 hours of coding, the winning team would win <strong>10,000 USD</strong>, the second <strong>4,000 USD</strong>, and then team 3, 4, 5, and 6 would win <strong>2,500 USD each</strong>!</p><p><strong>Great computer vision projects happened, </strong>and in fact, Paris (my city) finished the competition #2 via <a href="https://www.spotimplant.com/en/" rel="noopener noreferrer"><strong>Spot Implant</strong></a><strong>, </strong>a Shazam for Tooth Implants project that then became a startup. At the time, everybody was working on 2D Images. We had projects like Skin Melanoma detection, X-Ray segmentation, Brain Segmentation, and more...</p><p>Let me show you a few <u>tasks</u> in Medical Image Segmentation, and then we&apos;ll look at <u>algorithms</u>.</p><h3 id="2d-medical-image-segmentation-tasks">2D Medical Image Segmentation Tasks</h3><h4 id="x-ray-the-most-common">X-Ray (the most common)</h4><p><strong>First, we have X-Rays. 
X-Rays are the 2D representation of a body. </strong>We often see bones and organs there, and it&apos;s the most common image you&apos;ll find in Deep Learning x Healthcare. Using medical image segmentation, we can assist doctors in finding <u>bone fractures,</u> <u>lung diseases</u>, and other abnormalities. It can also help in screening large volumes of X-rays for <u>tuberculosis</u>, which is particularly useful in low-income countries with limited access to radiologists.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1182" height="384" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/74554810-8960-4331-a72a-44b6265653dc--1-.jpg 1182w" sizes="(min-width: 720px) 720px"></figure><p>This really is the most known among Deep Learning Engineers. I would like to show you other applications of segmentation...</p><h4 id="dermoscopy-segmentation-skin-lesion-segmentation">Dermoscopy Segmentation (skin lesion segmentation)</h4><p><strong>Dermoscopy segmentation was the health hackathon&apos;s top pick</strong>. It&apos;s all about using medical image segmentation to spot and separate skin lesions in dermoscopic images. By applying deep learning on medical images, we can quickly and accurately detect skin conditions like melanoma. This helps dermatologists diagnose and treat patients faster and manage large amounts of data more efficiently.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1182" height="384" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/23add975-8106-4eba-88b0-d36dc40790ea.jpg 1182w" sizes="(min-width: 720px) 720px"></figure><p>Let&apos;s see one or two more...</p><h4 id="mammography-segmentation">Mammography Segmentation</h4><p><strong>Mammograms are specialized X-ray images designed to reveal the inner structure of breast tissue. </strong>These images typically come in a flat, 2D format, capturing the breast from multiple angles to ensure a comprehensive view. The details in mammograms can show everything from dense tissue patterns to potential abnormalities like lumps or calcifications.</p><p><strong>Look at the image below: see how the role of a doctor/radiologist is to find these highlighted areas</strong>. 
The role of image segmentation is to assist the doctor, so he&apos;s not alone doing that high stake task of spotting problems (of course, it goes without saying that doctors also do much more than spotting, from understanding how bad a calcification can be, to finding the treatment, and so on...).</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/6726c6b3-6a6b-44b4-9566-de42cdf1c1f6.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="909" height="427" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/6726c6b3-6a6b-44b4-9566-de42cdf1c1f6.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/6726c6b3-6a6b-44b4-9566-de42cdf1c1f6.jpg 909w" sizes="(min-width: 720px) 720px"></figure><h4 id="other-types-ultrasound-%F0%9F%91%B6%F0%9F%8F%BD-endoscopy-%F0%9F%A4%A2-and-more">Other Types: Ultrasound &#x1F476;&#x1F3FD;, Endoscopy &#x1F922;, and more...</h4><p>We just saw 3 types: X-Rays, Dermoscopy, and Mammography. There are other types, such as ultrasound images (baby for examples), which can be 2D or 3D; or endoscopy, and more... The image below shows many 2D segmentation applications:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee1-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="435" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee1-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee1-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>So how do you build the segmentation results? What do you use? Let&apos;s take a look...</p><h3 id="2d-medical-image-segmentation-models">2D Medical Image Segmentation Models</h3><p><strong>Ever heard of UNet? </strong>You know, that 2015 model subtitled &quot;Convolutional Networks for Biomedical Image Segmentation&quot;. Well, it may be from 2015, but it&apos;s a great way to start! 
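<p>If you have never implemented that &quot;U&quot;, the core idea is small enough to sketch: an encoder that downsamples, a decoder that upsamples, and a skip connection between the two. Here is a deliberately tiny PyTorch sketch, just one level deep (a real UNet stacks several levels and many more channels), to show the mechanics:</p><pre><code class="language-python"># Minimal one-level "U": encoder, bottleneck, decoder with a skip connection.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc = double_conv(in_ch, 32)            # encoder level
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = double_conv(64, 32)               # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        skip = self.enc(x)                           # (B, 32, H, W)
        x = self.bottleneck(self.pool(skip))         # (B, 64, H/2, W/2)
        x = self.up(x)                               # back to (B, 32, H, W)
        x = self.dec(torch.cat([x, skip], dim=1))    # the skip connection
        return self.head(x)                          # per-pixel class logits

logits = TinyUNet()(torch.randn(1, 1, 128, 128))
print(logits.shape)                                  # torch.Size([1, 2, 128, 128])
</code></pre>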
<p>In fact, there have been lots of improvements on <a href="https://arxiv.org/pdf/1505.04597" rel="noopener noreferrer"><strong>UNet</strong></a>, from <a href="https://arxiv.org/pdf/1807.10165v1" rel="noopener noreferrer">UNet++,</a> to <a href="https://arxiv.org/pdf/2102.04306" rel="noopener noreferrer">Trans-UNet</a> and <a href="https://arxiv.org/pdf/2105.05537" rel="noopener noreferrer">Swin-UNet,</a> all keeping that &quot;U&quot; shape, but using different pattern recognition techniques like Swin Transformers, CNNs, etc...</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/UNEt-Family.001.jpeg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/UNEt-Family.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/UNEt-Family.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/UNEt-Family.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/UNEt-Family.001.jpeg 1920w" sizes="(min-width: 720px) 720px"></figure><p>This is one family of semantic image segmentation algorithms, and here is what the results look like:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/image--1---1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="966" height="770" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/image--1---1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/image--1---1-.jpg 966w" sizes="(min-width: 720px) 720px"></figure><p>To get true numbers, the Dice Similarity Coefficient (DSC) and the average Hausdorff Distance (HD) are used as evaluation metrics for these algorithms.</p>
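<p>The Dice score in particular is worth knowing by heart: it is just twice the overlap between the predicted mask and the ground-truth mask, divided by the sum of their sizes. A minimal NumPy sketch, assuming two binary masks of the same shape:</p><pre><code class="language-python"># Dice Similarity Coefficient between two binary masks (NumPy sketch).
import numpy as np

def dice_score(pred, target, eps=1e-7):
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((256, 256), dtype=np.uint8)
target = np.zeros((256, 256), dtype=np.uint8)
pred[50:150, 50:150] = 1     # predicted organ mask
target[60:160, 60:160] = 1   # ground-truth organ mask, shifted a bit
print(round(float(dice_score(pred, target)), 2))  # 0.81 for this overlap
</code></pre>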
<p><strong>These are great, but what happens when you don&apos;t have millions of labeled data?</strong> In healthcare, getting access to labeled, free-to-use data isn&apos;t easy; especially for certain types of diseases that are specific to certain hospitals, and so on... In these cases, you can use more &quot;foundational&quot; semantic segmentation models such as <strong>SAM (Segment Anything) or SAM2</strong>. These have been trained using Self-Supervised Learning on &quot;the entire internet&quot;, and are thus supposed to generalize better.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1324" height="846" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1324w" sizes="(min-width: 720px) 720px"></figure><p><strong>For example, </strong><a href="https://github.com/bowang-lab/MedSAM" rel="noopener noreferrer"><strong>MedSAM</strong></a><strong>, a medical version of SAM (Segment Anything), is what I used for the images above</strong>. It&apos;s the regular SAM, but tweaked for medical image segmentation, to boost segmentation performance. The model&apos;s performance is quite high, and we get a top-notch <a href="https://www.thinkautonomous.ai/blog/computer-vision-applications-in-self-driving-cars/" rel="noopener noreferrer">Computer Vision</a> project using image segmentation... It can even take a prompt, such as a region-of-interest bounding box, and return the segmented masks:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/298777510-a8d94b4d-0221-4d09-a43a-1251842487ee-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="259" height="244"></figure><p>So this is for the first part on 2D images... Now what are 3D images?</p><h2 id="3d-medical-image-segmentation-ct-scans-mris">3D Medical Image Segmentation: CT Scans &amp; MRIs</h2><p>Now come 3D images! For this part, I&apos;ll talk about the two use cases (CT Scans &amp; MRIs) and discuss the algorithms together.</p><h3 id="ct-scans-use-cases-algorithms">CT Scans: Use Cases &amp; Algorithms</h3><h4 id="use-cases-for-ct-scans-3d-representation">Use Cases for CT Scans &amp; 3D Representation</h4><p>In the <a href="https://flare22.grand-challenge.org/Dataset/" rel="noopener noreferrer"><strong>FLARE 2022 dataset</strong></a> (Fast and Low-resource semi-supervised Abdominal oRgan sEgmentation), we get access to a few hundred labeled and unlabeled cases with liver, kidney, spleen, or pancreas diseases, as well as examples of uterine corpus endometrial, urothelial bladder, stomach, sarcomas, or ovarian diseases.</p><p>Hey, relax. I&apos;m just scaring you. I didn&apos;t have a clue what that meant either. Except that:</p><p><strong>These are <u>CT SCANS </u>(Computed Tomography Scans)</strong>. A CT scan uses X-rays to create detailed, cross-sectional (slice-by-slice) images of the inside of the body. They&apos;re more detailed than traditional X-rays because they produce 3D images by taking multiple X-ray images from different angles and combining them using a computer.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/20220309-FLARE22-Pictures-2.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1254" height="780" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/20220309-FLARE22-Pictures-2.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/20220309-FLARE22-Pictures-2.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/20220309-FLARE22-Pictures-2.jpg 1254w" sizes="(min-width: 720px) 720px"></figure><p><strong>So what&apos;s the &quot;3D&quot; output like? </strong><a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noopener noreferrer"><strong>Voxels</strong></a><strong>? </strong><a href="https://www.thinkautonomous.ai/blog/point-clouds/" rel="noopener noreferrer"><strong>Point Clouds</strong></a><strong>? </strong>Not exactly. As I said, these are images made of multiple &quot;layers&quot; (slices). So your input image dimension isn&apos;t (512, 512, 3) but (512, 512, 129) or something like this. 
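<p>In practice, these volumes usually ship as NIfTI files (.nii or .nii.gz). Here is a quick sketch with the nibabel library showing that shape and how you would walk through the slices; the file name is just a placeholder, and the segmentation model call is left as a comment:</p><pre><code class="language-python"># Load a CT volume and iterate over its 2D slices (nibabel sketch).
import nibabel as nib
import numpy as np

volume = nib.load("abdomen_ct.nii.gz").get_fdata()    # placeholder file name
print(volume.shape)                                   # e.g. (512, 512, 129)

masks = []
for z in range(volume.shape[2]):
    ct_slice = volume[:, :, z]                        # one 2D slice
    windowed = np.clip(ct_slice, -1000, 1000)         # crude intensity windowing
    # masks.append(my_2d_model(windowed))             # run your 2D segmenter here
masks = np.stack(masks, axis=2) if masks else None    # stack back into a 3D mask
</code></pre>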
You have a multi-dimensional image on which you can apply image segmentation to each of the 2D slices:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/5a9fac4a-e2c4-43dd-82b4-87b760384634.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="836" height="418" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/5a9fac4a-e2c4-43dd-82b4-87b760384634.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/5a9fac4a-e2c4-43dd-82b4-87b760384634.jpg 836w" sizes="(min-width: 720px) 720px"></figure><p><strong>In this example, I used MedSAM to process individual 2D images.</strong> If you do it on the entire 3D CT Scan, you get something like this:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.48.13-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="396" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ScreenRecording2025-03-11at16.48.13-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.48.13-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>If you get it, you understand that from these images, we can put that into a software that is going to reconstruct the scan to 3D:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1638" height="1088" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.22.02--1-.jpg 1638w" sizes="(min-width: 720px) 720px"></figure><p>From there, people go absolutely nuts and even try to make it into a point cloud (I&apos;m not sure why, but this is cool, shoutout to <a href="https://www.youtube.com/watch?v=3apDWJWe_jg" rel="noopener noreferrer">Beau Seymour&apos;s video</a>).</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at19.17.27-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="560" height="401"></figure><h3 id="mri-scans-advanced-medical-image-computing">MRI Scans: Advanced Medical Image Computing</h3><p><strong>Magnetic Resonance Imaging (MRI) Scans are another powerful tool in medical imaging.</strong> Unlike CT scans, MRIs use powerful magnets and radio waves to create detailed images of organs and tissues within the body. This technique is particularly great for soft tissue contrast, making it ideal for brain, spinal cord, and joint imaging. 
By leveraging medical image segmentation, MRI scans can aid in the precise identification of tumors, neurological disorders, and musculoskeletal issues.</p><p><strong>Here is an example of an MRI scan and its segmentation task:</strong></p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.42.07-ezgif.com-optimize.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="591" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ScreenRecording2025-03-11at16.42.07-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ScreenRecording2025-03-11at16.42.07-ezgif.com-optimize.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>So now, let&apos;s see how to process that...</p><h3 id="algorithms-in-the-3d-medical-image-segmentation-domain">Algorithms in the 3D Medical Image Segmentation Domain</h3><p>We already discussed SAM (Segment Anything) and how it can work on individual slices. The reality is, medical image segmentation involves a lot of complex domain knowledge, and it would probably be better to use a specialized artificial intelligence model for optimal performance. Today, in AI, we have two types of models:</p><ul><li>Foundation Models, which are very general and know a bit of everything</li><li>Specific &amp; Labeled Models, which can only process the kind of images they&apos;ve been trained on</li></ul><p>I would like to show you two models, one of each type: TotalSegmentator &amp; VISTA-3D.</p><h4 id="total-segmentator-a-specific-model-for-2d-and-3d-segmentation">Total Segmentator: A specific model for 2D and 3D Segmentation</h4><p>Perhaps one of the most used and well-known &quot;frameworks&quot; for image segmentation of both 2D and 3D data is<strong> </strong><a href="https://arxiv.org/pdf/2208.05868" rel="noopener noreferrer"><strong>TotalSegmentator</strong></a>. 
Rather than being a simple machine learning model, it&apos;s a complete framework that does the automatic labelling.</p><p>The number of classes for CT and MRI data it can segment is gigantic:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/overview_classes_v2--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="2000" height="1127" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/overview_classes_v2--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/overview_classes_v2--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/overview_classes_v2--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/overview_classes_v2--1-.jpg 2388w" sizes="(min-width: 720px) 720px"></figure><p>And the model is based on the <a href="https://arxiv.org/pdf/1809.10486" rel="noopener noreferrer"><strong>nn-UNet architecture</strong></a><strong>,</strong> which is similar to UNet, but can also take in different medical imaging modalities.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/nnU-Net_overview--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1392" height="1065" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/nnU-Net_overview--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/nnU-Net_overview--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/nnU-Net_overview--1-.jpg 1392w" sizes="(min-width: 720px) 720px"></figure><h4 id="vista-3d-foundation-model-for-3d-medical-image-segmentation">VISTA-3D: Foundation Model for 3D Medical Image Segmentation</h4><p><strong>VISTA-3D</strong> is a 2024 &quot;Foundation model&quot; from Nvidia that works on the 3D patch directly. While being named &quot;foundation&quot; model, it&apos;s incredibly specific to the medical image segmentation tasks. Here, we are PURELY in <a href="https://www.thinkautonomous.ai/blog/voxel-vs-points/" rel="noopener noreferrer">3D Deep Learning</a>.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1234" height="622" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-11-at-19.33.31--1-.jpg 1234w" sizes="(min-width: 720px) 720px"></figure><p>So we&apos;ve seen a lot:</p><ul><li>2D Segmentation can be done with models like UNet, UNet++, etc... 
(specific), or SAM (foundation)</li><li>3D Segmentation can be done with models like nnUNet/TotalSegmentator (specific), or Vista-3D &amp; SAM (foundation)</li></ul><p>Let&apos;s see examples now...</p><h2 id="example-1-ct-scan-segmentation-with-vista-3d">Example 1: CT Scan Segmentation with Vista-3D</h2><p>In <a href="https://build.nvidia.com/nvidia/vista-3d" rel="noopener noreferrer"><strong>this platform</strong></a><strong> from Nvidia</strong>, I am able to select a CT Scan and call Vista-3D to process it.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ezgif.com-optimize--1-.gif" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="800" height="453" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ezgif.com-optimize--1-.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ezgif.com-optimize--1-.gif 800w" sizes="(min-width: 720px) 720px"></figure><p>Notice how we can select an Abdomen, and then pick all the organs we want to segment. Finally, we can get the view from 3 different &quot;angles&quot; and process that too!</p><h2 id="example-2-ct-scan-segmentation-with-totalsegmentator">Example 2: CT Scan Segmentation with TotalSegmentator</h2><p>On <a href="https://totalsegmentator.com/" rel="noopener noreferrer"><strong>totalsegmentator.com,</strong></a> we can upload images and ask for a complete segmentation. Here, I am going to upload a scan from the FLARE 2022 dataset I mentioned above. The platform returns hundreds of organ masks, all in the weird &apos;nii.gz&apos; format:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="2000" height="1854" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/Screenshot-2025-03-12-at-12.29.54--1-.jpg 2274w" sizes="(min-width: 720px) 720px"></figure><p>I can visualize some of these, and see what the output is like:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg" class="kg-image" alt="The Ultimate Guide to Medical Image Segmentation with Deep Learning (2D and 3D)" loading="lazy" width="1800" height="400" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/03/ab321f88-54eb-445b-96a0-9da8254a2ed1.jpeg 1800w" sizes="(min-width: 720px) 720px"></figure><p>Alright! So this is our second example, and both have playable demos! 
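<p>And if you&apos;d rather not upload medical data to a website, TotalSegmentator can also run locally (pip install totalsegmentator). The sketch below is from memory, so double-check the project&apos;s README for the exact API; the file names are placeholders:</p><pre><code class="language-python"># Rough sketch of running TotalSegmentator locally on a CT scan.
# Check the official repo for the exact function signature; this is indicative only.
from totalsegmentator.python_api import totalsegmentator

import nibabel as nib

totalsegmentator("abdomen_ct.nii.gz", "segmentations")   # input scan, output folder

# Each structure typically comes back as its own mask, e.g. segmentations/liver.nii.gz
liver_mask = nib.load("segmentations/liver.nii.gz").get_fdata()
print(liver_mask.shape, liver_mask.max())                # same grid as the input scan
</code></pre>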
Let&apos;s now do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>Medical image segmentation helps reduce human errors</strong> by processing 2D and 3D medical images like MRIs and X-rays.</li><li><strong>2D medical image segmentation tasks </strong>include X-Rays, dermoscopy (skin lesion analysis), endoscopy, mammography segmentation (breast), and more...</li><li><strong>UNet and its variants are popular models for 2D medical image analysis,</strong> utilizing CNNs or Transformer approaches. Foundation models like SAM (Segment Anything Model) can also be fine-tuned on medical images, as with MedSAM.</li><li><strong>3D medical image segmentation involves CT (computed tomography) and MRI (magnetic resonance imaging) scans</strong>. They&apos;re called 3D images because they&apos;re made of multiple 2D slices of the same scan, stacked along a third dimension.</li><li><strong>MedSAM can process 2D slices of 3D scans</strong>, allowing individual segmentation of each slice. We can then feed those masks into software that reconstructs a complete 3D image.</li><li><strong>For 3D processing, TotalSegmentator and Vista-3D are solid solutions,</strong> being either specific or foundation based.</li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Next Step?</strong></b><br>Receive my Daily Emails, and get continuous training on Computer Vision &amp; Autonomous Tech. Each day, you&apos;ll receive one new email, sharing some information from the field, whether it&apos;s technical content, a story from the inside, or tips to break into this world; we got you.<br><br><a href="https://www.thinkautonomous.ai/lplb-cuttingedgeengineer" rel="noreferrer">You can receive the emails here</a>.</div></div>]]></content:encoded></item><item><title><![CDATA[Video Segmentation: Why the shift from image to video processing is essential in Computer Vision]]></title><description><![CDATA[<p><strong>In 1897, French police faced a difficult problem:</strong> a serial killer named Joseph Vacher was attacking and murdering shepherds, and remained impossible to catch. Every time he was arrested, he gave a different name, changed his appearance, used fake mustaches, wigs, and different clothing styles... and got to disappear without</p>]]></description><link>https://www.thinkautonomous.ai/blog/video-segmentation/</link><guid isPermaLink="false">67b46fe9eaa12c28321be825</guid><category><![CDATA[computer vision]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 18 Feb 2025 15:51:15 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/02/video-segmentation.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/video-segmentation.jpeg" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision"><p><strong>In 1897, French police faced a difficult problem:</strong> a serial killer named Joseph Vacher was attacking and murdering shepherds, and remained impossible to catch. Every time he was arrested, he gave a different name, changed his appearance, used fake mustaches, wigs, and different clothing styles... 
and got to disappear without the police realizing they just controlled France&apos;s most wanted man.</p><p><strong>At the time, France had no national ID system</strong>, <strong>and no way to prove that the man they caught today was the same man they arrested months ago</strong>. That was until an officer named Alphonse<strong> </strong>Bertillon introduced a revolutionary method: <u>anthropometry</u>. It&apos;s a system that labeled criminals based on of 12 unchangeable physical measurements like ear shapes, skull sizes, and limb lengths, that could not be faked.</p><p><strong>One day, Vacher was caught for attacking a woman, and this time, the police used Bertillon&apos;s system to compare his measurements to what they had in their records</strong>: they discovered they just caught France&apos;s most wanted criminal. This time, he could not escape with a warning, and got sent to... yeah &#x2014; the guillotine &#x1F937;&#x1F3FB;&#x200D;&#x2642;&#xFE0F;&#x1F1EB;&#x1F1F7;</p><p>What got Vacher executed wasn&#x2019;t just this one-time capture, but <strong>the ability to analyze a series of events and not just a one-time event.</strong> And this is exactly what this article is about: the shift from frame-by-frame to sequence processing, here in Computer Vision with videos. And this is done via something called <strong>video segmentation.</strong></p><p>So let&apos;s get started:</p><h2 id="what-is-video-segmentation">What is Video Segmentation?</h2><p><strong>Most Computer Vision Engineers spend time learning about image processing,</strong> <strong>but never consider what happens when you use a video.</strong> Yet, tons of architectures today, whether in surveillance, retail, sports analysis, healthcare, or even robotics and self-driving cars &#x2014;&#xA0;now process videos instead of images. The sequence brings something individual images don&apos;t, just like the Vacher story, where he was able to get judged through all the murders he committed.</p><p><strong>So let&apos;s take a less deadly scene &#x2014;&#xA0;shoplifting detection in retail</strong>. 
There is a startup I once interviewed for named <a href="https://www.veesion.io" rel="noopener noreferrer"><strong>Veesion</strong></a> &#x2014;&#xA0;that has this amazing video on their homepage:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="322" height="322"><figcaption><span style="white-space: pre-wrap;">Shoplifting Demo by Veesion</span></figcaption></figure><p><strong>Can you see everything happening here?</strong></p><ul><li>We have the <a href="https://www.thinkautonomous.ai/blog/object-tracking/" rel="noopener noreferrer"><strong>object tracking</strong></a><strong> </strong>(the second man is moving from aisle 1 to aisle 2)</li><li>The <strong>event</strong> <strong>detection</strong> (at 00:03, a man puts an item in a pocket)</li><li>The <strong>action</strong> <strong>classification</strong> (of putting something in a pocket)</li><li>The <strong>video</strong> <strong>decomposition</strong> (shoplifting from 00:02 to 00:03 &#x2014;&#xA0;standing from 00:03 to 00:06)</li><li>The <strong>people</strong> <strong>counting</strong> (2 people in the video, one is obstructing the other)</li><li>And more...</li></ul><p>Among these, there is the idea of &quot;<u>segmenting</u>&quot; the scene to track the shoplifters through the video. You can see the hands being in red, consistently from frame to frame. So this is the idea of Video Processing, and Video Segmentation is a sub-branch of it focus on the task of segmenting a scene.</p><p><strong>There are two types of Video Segmentation tasks:</strong></p><ul><li>Video <strong><u>Object</u></strong> Segmentation (VOS)</li><li>Video <strong><u>Semantic</u></strong> Segmentation (VSS)</li></ul><h3 id="video-object-segmentation">Video Object Segmentation</h3><p><strong>In Video Object Segmentation, we are doing exactly what I did in this video</strong>. I define an object to track, send the video to the model, which tracks the object consistently across frames. It&apos;s purely &quot;object&quot; based, and is NOT used in a supervised way. For example, you can use semi-supervised video object segmentation, where you define an object on Frame 1, and let the model track it across the next frames... Or you can use totally unsupervised video object segmentation, where you won&apos;t even mention the objects to track.</p><p>Let me show you an example where I am shoplifting (muahahah):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879848204-ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/sam2_masked_video_1739879848204-ezgif.com-optiwebp.webp 600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879848204-ezgif.com-optiwebp.webp 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Video Object Segmentation tracks objects consistently over time</span></figcaption></figure><p>See? We are able to track my head &amp; hands in blue, and the phone in yellow! That is the idea we&apos;re interested in... 
And even more when we can do this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879741562-ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="800" height="450" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/sam2_masked_video_1739879741562-ezgif.com-optiwebp.webp 600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/sam2_masked_video_1739879741562-ezgif.com-optiwebp.webp 800w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Floating head demo for no other purpose than getting you out of your exhausting boredom at work</span></figcaption></figure><p>Now, to be fair, the floating head experiment may NOT be the most useful thing in this example, but the stolen phone is. Now think of everything we can do: keep track of cells in health-related videos, keep track of a player when analysing a football match, and a lot more...</p><h3 id="video-semantic-segmentation">Video Semantic Segmentation</h3><p><strong>In Video Semantic Segmentation, we really go down to the pixel level, and rather than focusing on segmenting objects, we focus on the scene</strong>. The output is going to look extremely similar to a normal image segmentation task.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/ScreenRecording2025-02-18at13.47.29-ezgif.com-optimize.gif" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="560" height="315"><figcaption><span style="white-space: pre-wrap;">Video Semantic Segmentation</span></figcaption></figure><p>Just like image segmentation, you can also use video <a href="https://www.thinkautonomous.ai/blog/instance-segmentation/" rel="noopener noreferrer"><strong>instance segmentation</strong></a>, video panoptic segmentation, video semantic segmentation, and so on... And of course, there is the benefit of doing background extraction, to then process only what&apos;s been segmented, for example in a case like <a href="https://www.thinkautonomous.ai/blog/lane-detection/" rel="noopener noreferrer"><strong>lane detection</strong></a><strong> </strong>in self-driving cars:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/ScreenRecording2025-02-18at14.04.01-ezgif.com-optimize.gif" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="632" height="302" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/ScreenRecording2025-02-18at14.04.01-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/ScreenRecording2025-02-18at14.04.01-ezgif.com-optimize.gif 632w"><figcaption><span style="white-space: pre-wrap;">What could we do with this lane line information? Or these cars?</span></figcaption></figure><p>But by now, you may have a question:</p><h2 id="how-is-video-segmentation-different-than-image-segmentation">How is Video Segmentation different from Image Segmentation?</h2><p><strong>I mean, is it really different?</strong> It kinda looks similar to image segmentation, right? 
And yes, while it may be the case for some examples, like the one I just gave with video semantic segmentation, most of the tasks will be different and give different outputs.</p><p><strong>To put it simply: Video Segmentation is about processing videos.</strong> You don&apos;t process image per image, you process video frames immediately. And this has several advantages:</p><ul><li><strong>The model can track multiple objects</strong> even though they&apos;re occluded (similar to what object tracking would do, but using video sequences)</li><li><strong>The model can segment specific scenes</strong> you&apos;re looking for (a blood cell changing sizes, a car entering a scene, a man stealing something)</li><li><strong>It ensures temporal consistency</strong>, meaning an object that appears in one frame keeps the same identity/color across the entire video, enabling tracking at the same time.</li><li><strong>It understands object motion</strong>, meaning it can predict where an object will be in the next frame instead of treating every frame as an isolated image (thanks mainly to video instance segmentation)</li><li><strong>For some models, it can be more efficient</strong>, since instead of running image segmentation on each frame separately, the model processes a video sequence, leveraging temporal information to process frames together, reducing redundant computations.</li></ul><p><strong>So, how does that work?</strong> What type of model does this? I do NOT have a specific &quot;do this do that&quot; template to share with you, but by studying examples, we could probably understand what&apos;s required to make a Video Segmentation algorithm work...</p><h2 id="example-1-vistr-video-instance-segmentation-transformer">Example 1: <strong>VisTR (Video Instance Segmentation Transformer)</strong></h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1114" height="504" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.26.17.jpg 1114w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Video Instance Segmentation with Transformers (</em></i><a href="https://openaccess.thecvf.com//content/CVPR2021/papers/Wang_End-to-End_Video_Instance_Segmentation_With_Transformers_CVPR_2021_paper.pdf" target="_blank" rel="noopener noreferrer"><i><b><strong class="italic" style="white-space: pre-wrap;">source</strong></b></i></a><i><em class="italic" style="white-space: pre-wrap;">)</em></i></figcaption></figure><p>The first paper looks terribly simple. Let&apos;s try to understand the different blocks:</p><ul><li><strong>Input</strong>: First, we process raw video data, it&apos;s purely a sequence of images sent to the CNN</li><li><strong>Backbone</strong>: Then a normal 2D CNN processes each frame independently before concatenating the feature maps</li><li><strong>Video Processing:</strong> This is fed to a Transformer, known to process sequences quite well. 
However, we modify this transformer a bit to not just receive a positional encoding, but also a <em>temporal encoding</em>.</li><li><strong>Output:</strong> Finally, the output of the decoder predicts instances for each pixel, with a sequence matching strategy</li></ul><p>The training is done after obtaining labeled data from the <a href="https://youtube-vos.org/dataset/vis/" rel="noopener noreferrer"><strong>YoutubeVIS dataset</strong></a>, and the backbone is initialized with the weights of<strong> </strong><a href="https://arxiv.org/abs/2005.12872" rel="noopener noreferrer"><strong>DETR</strong></a>.</p><p>The detailed version looks like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1756" height="614" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.36.50.jpg 1756w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">VisTR detailed</span></figcaption></figure><p>As you can see, we have a video processing pipeline, where the transformer is actually aware of the frames. The segmentation process ends by matching pixels with instances. This is done using Bipartite Matching (<a href="https://www.thinkautonomous.ai/blog/hungarian-algorithm/" rel="noopener noreferrer"><strong>the Hungarian Algorithm</strong></a>). More subtle blocks exist, and I invite you to read the paper for more...</p><h2 id="example-2-sam-2-segment-anything-2">Example 2: SAM 2 (Segment Anything 2)</h2><p>If you didn&apos;t live in a cave around 2023, you probably heard of Segment Anything&#xA0;&#x2014; the segmentation model that could find <strong><em>any</em></strong> object in an image. Recently, it got an upgraded version called <a href="https://scontent.fcdg3-1.fna.fbcdn.net/v/t39.2365-6/464917098_581932941165933_4465312900778079623_n.pdf?_nc_cat=105&amp;ccb=1-7&amp;_nc_sid=3c67a6&amp;_nc_ohc=Mn0M6N9O9K4Q7kNvgHsDXZ8&amp;_nc_oc=AdiskhA1_LoHfyJs-eCrqi0Ff4_AhWlmF71ArIj0MOtfkVFvl0S3CBlghheMqNnFj7A&amp;_nc_zt=14&amp;_nc_ht=scontent.fcdg3-1.fna&amp;_nc_gid=AowO5fmUshA8NSDSOp9SkAs&amp;oh=00_AYDPLXLOi0edVnOB48aBIjiWzvYPIFrIwWkimA0rxel2Dg&amp;oe=67BA6932" rel="noopener noreferrer"><strong>SAM2</strong></a>, which is designed to process videos. 
Let&apos;s take a look:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1324" height="846" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.50.11.jpg 1324w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Image vs Video Segment Anything</span></figcaption></figure><p><strong>As you can see, SAM 2 differs from SAM by the addition of a <u>memory block</u>,</strong> made of a memory attention module, a memory encoder, and a memory bank that stores the past frames, and helps with temporal consistency.</p><p>If you play with <a href="https://sam2.metademolab.com/demo" rel="noopener noreferrer"><strong>the online demo</strong></a>, you will find that the model starts by asking you to click on an object, so it can keep tracking it. So at frame 0, you click the object you want to track, and then the model tracks it across the entire sequence...</p><p><strong>This is called &quot;Promptable&quot; Visual Segmentation.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="1616" height="614" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.44.04.jpg 1616w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">On frame 1, we click on the dog&apos;s tongue. On the next frame, the tongue is tracked consistently. When the model fails, we manually click on it to restart the tracking</span></figcaption></figure><p>This is no different from the original SAM model, and in fact, it&apos;s using the same &quot;prompt encoder&quot;.
So let&apos;s see the details of the model:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="2000" height="701" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-18-at-15.53.42.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Detailed SAM2 graph</span></figcaption></figure><ul><li><strong>Prompt Encoder</strong>: As expected, we begin by clicking objects, which generates a &quot;prompt&quot;, and send it to the same encoder as Segment Anything to track the object across each image</li><li><strong>Image Encoder</strong>: We then send the entire video to the image encoder, which is a masked autoencoder</li><li><strong>Memory Attention</strong>: Uses vanilla attention to condition the current frame features on the past frames&apos; features and predictions as well as on any new prompts</li><li><strong>Memory Bank: </strong>It retains information about past predictions for the target object in the video by maintaining a FIFO (first in first out) queue of memories of up to N recent frames.</li><li><strong>Mask Decoder (prediction)</strong>: Similar to SAM, but accounting for previous memory information</li></ul><p>So, you saw a second way to build a video segmentation algorithm. The first way was fully transformer based; this second way adds the somewhat robotic &quot;memory bank&quot;, because this model is a &quot;hybrid&quot; between 100% video processing and frame-by-frame processing.</p>
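<p><em>To make this concrete, here is a minimal, hypothetical Python sketch of that hybrid loop: a click prompt on frame 0, then every new frame conditioned on a FIFO memory bank of recent predictions. The function names, shapes and memory size are placeholders for illustration, not SAM 2&apos;s actual API:</em></p><pre><code class="language-python">
# Toy sketch of the SAM 2-style loop (not Meta's actual API): one click prompt
# on frame 0, then each new frame is conditioned on a FIFO memory of past frames.
from collections import deque
import numpy as np

N_MEMORY = 7  # hypothetical memory size (number of past frames kept)

def encode_image(frame):            # stand-in for the image encoder
    return frame.mean(axis=-1)

def encode_prompt(click_xy):        # stand-in for the prompt encoder
    return np.array(click_xy, dtype=np.float32)

def memory_attention(feat, memory): # stand-in: condition features on memory
    return feat if not memory else feat + sum(memory) / len(memory)

def decode_mask(feat, prompt):      # stand-in for the mask decoder
    return feat &gt; feat.mean()

def segment_video(frames, click_xy):
    prompt = encode_prompt(click_xy)          # click given on frame 0 only
    memory = deque(maxlen=N_MEMORY)           # FIFO memory bank
    masks = []
    for frame in frames:
        feat = encode_image(frame)
        feat = memory_attention(feat, memory) # use past frames + predictions
        mask = decode_mask(feat, prompt)
        memory.append(feat * mask)            # push a "memory" of this prediction
        masks.append(mask)
    return masks

video = [np.random.rand(64, 64, 3) for _ in range(30)]
masks = segment_video(video, click_xy=(32, 32))
print(len(masks), masks[0].shape)             # 30 (64, 64)
</code></pre>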
<h2 id="image-vs-video-segmentation-worth-the-trouble">Image vs Video Segmentation: Worth the trouble?</h2><p>I would say yes, especially considering all the use cases that can benefit from video segmentation. For example, <strong><em>surveillance with massive occlusions </em></strong>(in a crowd, with walls, trees, ...) where standard object tracking would be limited, <strong><em>video editing</em></strong>, where for example, we want to remove an object not from one frame, but from an entire scene, <strong><em>sports analytics</em></strong>, entirely based on motion, <strong><em>cell tracking</em></strong> (for example, division of cells, which can only be seen via videos), <em><strong>shoplifting</strong> <strong>detection</strong></em> (which can&apos;t really be seen in an image), <strong><em>fire spreading</em></strong>, and more...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/top-ezgif.com-optiwebp.webp" class="kg-image" alt="Video Segmentation: Why the shift from image to video processing is essential in Computer Vision" loading="lazy" width="294" height="224"><figcaption><span style="white-space: pre-wrap;">Examples of Video Segmentation when image isn&apos;t sufficient</span></figcaption></figure><p>You can see this article for the normal <a href="https://www.thinkautonomous.ai/blog/image-segmentation-use-cases/" rel="noopener noreferrer"><strong>image segmentation use cases</strong></a>, and I highly recommend you augment it in your mind with these video examples I provided. So as a rule:</p><ul><li>For most cases, don&apos;t replace all your image segmentation pipelines with video pipelines</li><li>But for the cases where segmentation fails because you need to understand video, do it!</li></ul><p>Alright, we&apos;ve seen a lot, let&apos;s do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><p>Congratulations on getting this far! Let&apos;s summarize what we learned:</p><ul><li><strong>In many cases, analyzing a single image fails</strong>. When video is essential, you have to use Video Computer Vision models.</li><li><strong>Video segmentation is segmentation applied to video processing</strong>, it&apos;s used in various fields like surveillance, retail, sports analysis, shoplifting detection (or detecting suspicious behavior of any kind) and healthcare.</li><li><strong>Video Segmentation splits into two categories</strong>: Video Object Segmentation and Video Semantic Segmentation.</li><li><strong>Video Object Segmentation (VOS) focuses on tracking defined objects across video frames.</strong> Many applications like SAM2 are semi-supervised, because you give the model a prompt and an initial object to track.</li><li><strong>Video Semantic Segmentation focuses on pixel-level scene segmentation</strong>, it can also be instance or panoptic based, and the output may resemble that of standard image segmentation.</li><li><strong>Some models like VisTR can be 100% video processing based</strong>. This model uses transformers for video instance segmentation.</li><li><strong>Other models can process frames one by one</strong>, but rely on a memory bank.
In the case of SAM2, frames are processed both as a video and one by one (to keep track of the same object)</li></ul><h3 id="next-steps">Next Steps</h3><p>A few articles you can read:</p><ul><li><a href="https://www.thinkautonomous.ai/blog/computer-vision-from-image-to-video-analysis/" rel="noreferrer"><strong>Introduction to Video Processing</strong></a> &#x2014;&#xA0;an old post (you can see my writing style is much different) but a good overview of Video Processing.</li><li><a href="https://www.thinkautonomous.ai/blog/object-tracking/" rel="noreferrer"><strong>A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars</strong></a><strong> -</strong> very related to video object processing, but without the segmentation part (bounding boxes).</li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about video computer vision</strong></b>... I have an App full of Computer Vision models and videos. Inside, I&apos;m showing you how to do lane detection, how Waymo&apos;s algorithms work (self-driving cars), and a lot more!<br><br><a href="https://www.thinkautonomous.ai/sdc-app/" rel="noreferrer">It&apos;s all in my App, along with 5+ hours of advanced Computer Vision content &#x2014; available when you join my daily emails. Here is where you can learn more.</a></div></div>]]></content:encoded></item><item><title><![CDATA[Functional Safety Engineer: The Job that 'certifies' self-driving cars]]></title><description><![CDATA[What is functional safety in self-driving cars? What does a functional safety engineer do? In this post, we'll try to understand how to certify self-driving car code, and make it safe to drive in the streets]]></description><link>https://www.thinkautonomous.ai/blog/functional-safety/</link><guid isPermaLink="false">67a0a0a55b2944097abedb32</guid><category><![CDATA[self-driving cars]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 04 Feb 2025 19:46:04 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/functional-safety.webp" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars"><p><strong>In 2019, I was an Autonomous Shuttle Engineer, working for a company that got a thrilling opportunity: </strong>to equip Paris&apos; transportation system with our autonomous shuttles. This was a golden opportunity many don&apos;t have, but the client was known to be a ruthless selector. Many others perished while trying to be &quot;approved&quot;.</p><p><strong>With high hopes, our team prepared for the demo day for months.</strong> We meticulously reviewed the client&apos;s 100+-point checklist, ensuring our shuttle met all requirements from real-time operations to autonomy measures. One day, a team of 5 was called to begin the process in a secret underground site. It was going to begin.</p><p><strong>The experimentation lasted days, in which each of the items was reviewed. Came the final test: Cyber-Security.</strong> The client made a phone call, and within 30 seconds, an engineer with a thinkpad came and entered the shuttle. &quot;Oh great! We can charge our phones!&quot; he said, amused. &quot;What a mistake!&quot;
My colleagues were <u>sweating</u>, horrified at the thought of what this young man could do... and they were right: In just five minutes, using only a USB stick, he had taken control of the vehicle and made it drive all across the room. The room went silent, as everyone realized our chance had slipped away.</p><p>Checkmate.</p><p><strong>Many engineers join the self-driving car world for the same reasons I did</strong>: it&apos;s exciting, it&apos;s interesting, it&apos;s a passion, it&apos;s impactful, it&apos;s just... wow. Yet, nearly all the engineers who are in the &quot;learning&quot; group and have never joined a real self-driving car company have absolutely zero visibility into what it takes to certify a vehicle. Cyber security is one part of it, but there is also the automotive level, the software level, and more...</p><p>So in this post, I will try to introduce you to the concept of safety, from an autonomous tech engineer&apos;s point of view. This means&#xA0;&#x2014; this post won&apos;t be for expert functional safety engineers, but for those who want an introduction.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text">Speaking of safety, one of the most vital elements in a safety system is <b><strong style="white-space: pre-wrap;">redundancy</strong></b>. This article does not focus on it, but I built a full video breaking down <b><strong style="white-space: pre-wrap;">Mobileye&apos;s redundancy system to achieve functional safety</strong></b>. It&apos;s only for those subscribed to my private emails. <b><strong style="white-space: pre-wrap;">Go </strong></b><a href="https://edgeneers.thinkautonomous.ai/posts/content-library-updates-mobileyes-true-redundancy-system" target="_blank" rel="noopener noreferrer"><u><b><strong class="underline" style="white-space: pre-wrap;">here</strong></b></u></a><b><strong style="white-space: pre-wrap;"> to get access!</strong></b></div></div><p>Let&apos;s begin with the fundamentals:</p><h2 id="what-is-functional-safety">What is Functional Safety?</h2><p><strong>Functional Safety is about making sure machines and systems stay safe, even if something goes wrong</strong>. For example, in self-driving cars, it means making sure the car can still drive safely if a part stops working. It can mean verifying that an algorithm works under all conditions, but also that it&apos;s never going to crash, and that if it does, the system has a backup.</p><p>To make it work, we use functional safety standards that determine what is safe to include in a self-driving car by evaluating the potential risks associated with each function and scenario.
You can therefore understand the entire point of functional safety:</p><p><strong><u>To reduce risk to an acceptable level</u>.</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1706" height="780" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-14.26.16--1-.jpg 1706w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The goal of functional safety is to make sure autonomous cars are at an acceptable level of risk</span></figcaption></figure><p>Okay, but this shouldn&apos;t be your job, right? It&apos;s someone else&apos;s problem! So you may wonder...</p><h2 id="why-should-i-bother-learning-about-functional-safety">Why should I bother learning about Functional Safety?</h2><p><strong>Let&apos;s say you decide to build an autonomous tech startup and run your algorithms</strong>. Some are open source, some are designed by you. You decide that these are good algorithms, the accuracy is near perfect, and you&apos;re a brutal C++ coder. There is no way you missed anything. Let&apos;s even pretend you really ARE a super-hero and really, the system is perfect...</p><p><strong>You convinced me... but can you convince recruiters? Or your management? Or the suits giving your startup a self-driving permit? </strong>Hey &#x2014; you can&apos;t test without the permit. No matter how good your system looks, you will need to convince the state to deliver you a permit. It can be the State of California, or the Ministry of Transport, or whoever delivers authorizations.</p><p><strong>The problem?</strong> They are NOT experts in safety or self-driving cars. So they will ask you to go via independent organizations, who run functional safety certification programs. Organisms like <em>T&#xDC;V Rheinland</em> and<em> T&#xDC;V SUD</em> (Germany) are the ones &apos;certifying&apos; you. They&apos;re verifying your safety functions, even the safety critical functions (emergency braking), and doing all kinds of silly tests before issuing you the certification.</p><p>Their job is to verify you are compliant with the industry norms.</p><p>But which norms are we talking about?</p><h2 id="what-are-the-different-functional-safety-norms-used">What are the different functional safety norms used?</h2><p><strong>When we say we want to &quot;reduce risk to an acceptable level&quot;... What is an acceptable level?</strong> Are you the one defining it? If an object detector works at 95%... is this okay? No? Yes? Who defines it? If your blinkers fail once every 300,000 miles... is this fine? Or is it every 3 millions miles?</p><p><strong>You can&apos;t be the deciding entity, this is what norms and industry standards are for. </strong>For example, ISO 26262 is a norm. It&apos;s focusing on <u>electronics</u> (buttons, A/C, windows, sensors, computers, ...), and defines a complete process to develop &amp; test your cars. 
It also tells you how to test scenarios, how to grade the risk of any event, and how to reduce that risk.<br><br>Let me share some norms we use in the industry:</p><ul><li>&#x2705; <strong>ISO-26262 is the norm that focuses on <u>failures</u> in electronic and software systems.</strong>&#xA0;It&apos;s going to deal with the question &quot;What happens if the object detector crashes mid-drive? Is there any backup?&quot;&#x200B;&#x200B;&#x200B;&#x200B;&#x200B;&#x200B; Based on how your system is implemented, you will comply more or less with the norm.</li><li>&#x2705; <strong>ISO-21448 <u>verifies</u> the&#xA0;Safety of the Intended Function (SOTIF)</strong>. It ensures perception systems like <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer"><strong>LiDAR</strong></a>, cameras, and <a href="https://www.thinkautonomous.ai/blog/faster-rcnn/" rel="noopener noreferrer"><strong>object detection</strong></a> perform safely in all conditions<strong>.</strong>&#xA0;&quot;Is your object detector working on all pedestrians? Really? Even in the dark?&quot;</li><li>&#x2705; <strong>ISO-21434</strong> <strong>is the norm focused on <u>cyber-security</u> of the system</strong>. It solves my USB-stick story. And it tells you everything you need to do to ensure your model is free from cyber attacks.</li><li>&#x2705; <strong>A-SPICE is focused on how your project is <u>coded</u>, tested, and maintained.</strong>&#xA0;This means the requirements, the modular and maintainable code, the coding standards &amp; reviews, the software testing, software versions and revisions, bug fixing, lifecycle of the product, etc...</li><li>&#x2705; <strong>UNECE WP.29 Regulations is the <u>compliance</u> with EU autonomous driving laws</strong>. You need at least this one to be allowed to drive autonomously.</li><li>and more... depending on what you want to certify.</li></ul><p>While these are not mandatory, the more of these norms you check, the safer you&apos;ll look. </p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4F1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about self-driving cars in production</strong></b>... I am doing a full breakdown of Mobileye&apos;s True Redundancy System. Inside, I&apos;m showing you all the different algorithms they test, how their safety guardian fallback works, and discuss their End-To-End algorithm.<br><br><a href="https://www.thinkautonomous.ai/sdc-app/" rel="noreferrer">It&apos;s all in my App, along with 5+ hours of self-driving car content &#x2014; available when you join my daily emails. Here is where you can learn more.</a></div></div><p>So comes a question:</p><h2 id="how-to-know-if-your-robot-complies-with-functional-safety-norms">How to know if your robot complies with Functional Safety norms?</h2><p>There are TONS of ways to do this, and it&apos;s really a profession, but let me share with you 2 important functional safety concepts:</p><ol><li>The V-Model</li><li>The Functional Safety Process to &quot;certify&quot; a function</li></ol><h3 id="the-v-model">The V-Model</h3><p><strong>The V-Model is a widely used framework in functional safety management and software development</strong>. You will find it when trying to comply with ISO26262, but also A-SPICE for example. 
It is structured like a &quot;V,&quot; where the left side represents the concepts/requirements/design phase, the bottom part is the coding phase, and the right side corresponds to the validation/integration/testing phase.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/image-3--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1912" height="1286" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/image-3--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/image-3--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/image-3--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/image-3--1-.jpg 1912w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The V-Model is heavily used across all industries in software</span></figcaption></figure><p><strong>You can see it as a continuous process,</strong> where you continuously verify that your system behaves as intended in the concept phase. If not, you rework it. It&apos;s evolving, it&apos;s alive, promoting a systematic approach to achieving functional safety in safety related systems.</p><p>In most companies that seriously want to comply with the ISO norms and get the functional safety accreditation, using the V-Model is the best starting point.</p><p>Next:</p><h3 id="the-functional-safety-process-to-certify-a-function">The Functional Safety Process to &quot;certify&quot; a function</h3><p>As we said, we have ISO26262 focusing on electronics, SOTIF focusing on algorithms, and A-SPICE focusing on code/software. Each of these is using the V-Model. Then, to comply with these norms, you&apos;ll need a &quot;process&quot;. This means defining clearly what each of these phases are.</p><p>Here is a 7-Step process:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1878" height="812" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-15.04.07--1-.jpg 1878w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The 7 Steps to make a system compliant to ISO norms</span></figcaption></figure><p><strong>The job of a functional safety engineer is to implement this.</strong> This is the &quot;bridge&quot; between systems and production I was telling you about earlier.</p><p>Let me briefly define each element: (credit to a client of Think Autonomous named <a href="https://www.linkedin.com/in/mayur-waghchoure-a5aba5ab/" rel="noopener noreferrer"><strong>Mayur Wagchoure</strong></a> for helping me write this one)</p><h4 id="1-define-the-system"><strong>1. Define the System</strong></h4><p>First, we want to define the system we&apos;re testing. 
For example, <a href="https://www.thinkautonomous.ai/blog/lane-detection/" rel="noopener noreferrer"><strong>lane detection</strong></a>. We want to define the purpose, the scope, the dependencies, and even the normal and edge cases.</p><h4 id="2-hara-hazard-analysis-and-risk-assessment"><strong>2. HARA: Hazard Analysis and Risk Assessment</strong></h4><p>The second point is HARA, in which we want to do:</p><ul><li><strong>HA &#x2014;&#xA0;H</strong>azard <strong>A</strong>nalysis (what could go wrong?)</li><li><strong>RA &#xA0;&#x2014; R</strong>isk <strong>A</strong>ssessment<strong> </strong>(how bad would that be if it went wrong?)</li></ul><p><em>Hazard Analysis</em></p><p>If you want to comply with functional safety standards, the first thing you&apos;ll need to do is account for the different scenarios. I see them into 4 main sections: <strong><em>Car Status</em></strong>, <strong><em>Scenario</em></strong>, <strong><em>Environment</em></strong>,<strong><em> Driving Status.</em></strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/image--2-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="2000" height="976" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/image--2-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/image--2-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/image--2-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/image--2-.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Example of all possible environments your car may be in (this may vary based on your testing site)</span></figcaption></figure><p>Your car could be turned on, driving in a country road, with rainy conditions, and driving at low speed. Or you could drive at high speed, and accelerate. Or suddenly brake. Or drive in dry roads. Or wet roads. Putting categories into each of these is a way to avoid the summer/winter rookie mistake.</p><p><em>Risk Assessment</em></p><p>To &quot;grade&quot; each function, you then use the formula defined by ISO26262: <strong>Risk = Severity * Exposure * Controllability.</strong></p><p>For example:</p><ul><li>I am testing the emergency braking function, and the risk that it doesn&apos;t activate (Severity = S3)</li><li>I&apos;m driving in urban environment, at 30-60 km/h, which happens all the time (Exposure = E4)</li><li>Urban areas have many pedestrians, it&apos;s very hard to control (Controllability = C3)</li></ul><p>Then what?</p><p><strong>The ISO26262 provides what&apos;s called the ASIL (Automotive Safety Integrity Level) Table</strong>:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.aptiv.com/images/default-source/feature-stories/asil-diagram-v01.png?sfvrsn=d47cbf3e_4" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="2442" height="1537"><figcaption><span style="white-space: pre-wrap;">The ASIL Table &#x2014;&#xA0;This attributes a grading based on your Severity, Exposure, and Controllability. 
If you have C1, E1, S1, it means you don&apos;t need to go through millions of tests.</span></figcaption></figure><p>I am NOT going to describe how we do it in this article, but the &quot;RA&quot; phase is about assigning, for every single function and every single scenario, what&apos;s called an <em>ASIL</em> level. These can be A (safe), B (safe), C (risky), or D (risky). We&apos;re trying to see, for each function, is it risky or safe?</p><p>For example:</p><p>If you&apos;re testing an emergency braking system, in a highway scenario, with wet road, snow, and fog... you can imagine it&apos;s an ASIL-D score. Now if you&apos;re on the same scenario, but testing the radio, it&apos;s probably A or B.</p><h4 id="3-set-safety-goals"><strong>3. Set Safety Goals</strong></h4><p><strong>From every potential hazard and risk we have, we want to turn this into a safety goal.</strong> Basically, turn the failure into an opportunity to design a better system. If I have just one LiDAR, and it&apos;s working badly under snow, could I have a better <a href="https://www.thinkautonomous.ai/blog/lidar-and-camera-sensor-fusion-in-self-driving-cars/" rel="noopener noreferrer">LiDAR and a camera</a> instead?</p><p><strong>Here, we will create a list of requirements for the new system</strong>. It&apos;s still the &quot;concept&quot; phase, where we identify the breaking points, and turn this into a better solution. This is the <u>work</u> where you try to think about reducing risk to an acceptable level.</p><h4 id="4-functional-safety-analysis"><strong>4. Functional Safety Analysis</strong></h4><p>Then, we implement things like <strong>FMEA (Failure Mode and Effects Analysis)</strong>&#xA0;to assess potential failure causes, effects, and mitigation strategies. We can also run <strong>FTA (Fault Tree Analysis)</strong>&#xA0;to explore how faults propagate and lead to hazards. We want to identify all causes of errors.</p><h4 id="5-design-safety-mechanisms"><strong>5. Design Safety Mechanisms</strong></h4><p>Then, we&apos;re introducing mechanisms to detect, isolate, or prevent failures (e.g., redundancy, diagnostics, fail-safe systems). This can be watchdog timers, dual-channel systems, degraded operational modes, ...</p><p><strong>For example, one of the Functional Safety Methods is to implement redundancy.</strong> If you have an ASIL-D component (unsafe), you could turn it into 2 ASIL-B ones (somewhat safe).
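</p><p><em>To make the grading and this decomposition idea concrete, here is a toy Python sketch. It uses the common &quot;sum of S + E + C&quot; shortcut to approximate the ASIL table, plus a simplified decomposition rule; it is an illustration only, not the normative ISO 26262 tables:</em></p><pre><code class="language-python">
# Toy illustration of ASIL grading (S + E + C shortcut) and ASIL decomposition.
# Simplified for this article; the normative mapping lives in ISO 26262 itself.

def asil(severity: int, exposure: int, controllability: int) -&gt; str:
    """severity: 1-3 (S), exposure: 1-4 (E), controllability: 1-3 (C)."""
    total = severity + exposure + controllability
    table = {10: "ASIL-D", 9: "ASIL-C", 8: "ASIL-B", 7: "ASIL-A"}
    return table.get(total, "QM")  # below ASIL-A: quality management only

# Emergency braking example from above: S3, E4, C3
print(asil(3, 4, 3))   # ASIL-D

# The S1, E1, C1 case from the ASIL table figure: no heavy testing needed
print(asil(1, 1, 1))   # QM

# ASIL decomposition (simplified view): one ASIL-D requirement can be split
# over two sufficiently independent channels, e.g. D becomes B + B.
DECOMPOSITION = {"ASIL-D": ("ASIL-B", "ASIL-B"), "ASIL-C": ("ASIL-B", "ASIL-A")}
print(DECOMPOSITION["ASIL-D"])   # ('ASIL-B', 'ASIL-B')
</code></pre><p>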
This way, your overall ASIL score is better, and you become compliant.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1610" height="704" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.01.48.jpg 1610w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A Functional Safety task called ASIL Decomposition used to decrease risk</span></figcaption></figure><p>In this example, we could imagine that the second LiDAR is different, or that the algorithms behind it are more &quot;deterministic&quot;, don&apos;t use AI, and therefore are safer. The goal of functional safety is to try and reduce as many components to ASIL-A and ASIL-B as possible. &gt;&gt;&gt; This is the acceptable level.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F2;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">But how do the companies </strong></b><i><b><strong class="italic" style="white-space: pre-wrap;">that actually deploy vehicles</strong></b></i><b><strong style="white-space: pre-wrap;"> solve this?</strong></b> I interviewed LOXO, a Swiss startup deploying fully autonomous delivery robots powered by End-to-End Learning. Interested? <b><strong style="white-space: pre-wrap;">It&apos;s in this </strong></b><a href="https://www.linkedin.com/posts/jeremycohen2626_selfdrivingcars-robotics-deeplearning-activity-7295048405435727872-ULjK/?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAA1gjMgB2UeumB1uFo-it1cN7J4OxYZJIDI" target="_blank" rel="noopener noreferrer"><u><b><strong class="underline" style="white-space: pre-wrap;">post</strong></b></u></a><b><strong style="white-space: pre-wrap;">.</strong></b></div></div><h4 id="6-validation-and-verification"><strong>6. Validation and Verification</strong></h4><p>How do we test? This can be field tests, but also simulations, hardware-in-the-loop (HIL), and fault injection testing. You can also here test the Safety of Intended Functionality (SOtIF) &#x2014; how performant is your algorithm? Is it really THAT good?</p><p>Finally:</p><h4 id="7-iterate-validate-and-document"><strong>7. Iterate, Validate, and Document</strong></h4><p>You want to iterate, improve, and document your safety analysis results. 
In the end, it&apos;s a very technical job, but one that involves a lot of paperwork, documentation, diagrams, schematics, and grading, because these are the papers giving you authorizations.</p><p>We have now seen:</p><ul><li>What is functional safety?</li><li>What are the different norms we should comply with?</li><li>How do we comply with these norms (overview)</li></ul><p>Let&apos;s see an example:</p><h2 id="example-mobileyes-primary-guardian-fallback-true-redundancy-system">Example: Mobileye&apos;s Primary Guardian Fallback / &quot;True Redundancy&quot; System</h2><p><a href="https://www.thinkautonomous.ai/blog/mobileye-end-to-end/" rel="noopener noreferrer"><strong>Mobileye</strong></a><strong>, Intel&apos;s self-driving car company, has a very strong functional safety focus</strong>. Their algorithm has 3 distinct channels that are completely different:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg" class="kg-image" alt="Functional Safety Engineer: The Job that &apos;certifies&apos; self-driving cars" loading="lazy" width="1892" height="752" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/02/Screenshot-2025-02-04-at-16.06.59--1-.jpg 1892w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Mobileye&apos;s True Redundancy System (</span><a href="https://www.thinkautonomous.ai/sdc-app" rel="noreferrer"><span style="white-space: pre-wrap;">you can learn more by watching the full video in my app &#x2014; available when you join my daily emails</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>The lane detection is the <u>main</u> channel used to find lane lines</strong>. This can work for example with <a href="https://www.thinkautonomous.ai/blog/lane-detection/" rel="noopener noreferrer">modular deep lane detection</a>. It is <strong><u>verified</u></strong> with <a href="https://www.thinkautonomous.ai/blog/robot-mapping/" rel="noopener noreferrer"><strong>HD Map</strong></a> Extraction &amp; Localization. If they agree, then we&apos;re good, but if they don&apos;t, they&apos;ll extract the lanes from a parallel <a href="https://www.thinkautonomous.ai/blog/tesla-end-to-end-deep-learning/" rel="noopener noreferrer"><strong>end-to-end deep learning</strong></a> algorithm that will act as the &quot;judge&quot; or guardian.</p><p><strong>Do you realize how many algorithms are running in parallel? </strong>They implemented these automatic protection functions in case of failure. They also implemented these safety requirements across the entire system, meaning the electronic systems, the software components, and so on...</p><p><strong>When doing something like this, it&apos;s very important that each function is run using a separate method</strong>, possibly with a separate computer, separate sensors, etc... so that there cannot be a single point of failure (for example, if everything uses the same camera, and this one fails, it&apos;s not functionally safe).</p>
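<p><em>As a rough illustration of this &quot;independent channels that cross-check each other&quot; pattern (a toy sketch with placeholder functions, not Mobileye&apos;s actual system), you could picture it like this:</em></p><pre><code class="language-python">
# Toy sketch of a primary / verification / guardian pattern (illustrative only).
# Each channel should rely on different sensors and algorithms so that there is
# no single point of failure; here they are simple placeholder functions.

def primary_lane_detection(camera_frame):
    return "lane_estimate_A"          # e.g. a modular deep lane detector

def map_based_verification(hd_map, localization):
    return "lane_estimate_A"          # e.g. lanes extracted from an HD map

def guardian_channel(camera_frame):
    return "lane_estimate_B"          # e.g. an end-to-end model acting as judge

def fused_lanes(camera_frame, hd_map, localization):
    primary = primary_lane_detection(camera_frame)
    check = map_based_verification(hd_map, localization)
    if primary == check:              # the two independent channels agree
        return primary
    return guardian_channel(camera_frame)   # disagreement: ask the guardian

print(fused_lanes("frame", "map", "pose"))   # lane_estimate_A
</code></pre>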
<h2 id="wait-does-everybody-really-do-all-of-this">Wait... Does everybody really do all of this?</h2><p>No.</p><p><strong>In fact, many startups don&apos;t have a functional safety team</strong>, <strong>or even have a safety system in place</strong>. In this case, they try to apply it to the safety critical systems, while waiting for the certification process. Some are also in a more favorable state/country that gives permits more easily (to encourage innovation and let startups work on the technology).</p><p><strong>It&apos;s important to understand that complying with ISO norms is NOT mandatory</strong>. In the European Union, you need to comply with the UNECE WP.29 Regulations (traffic laws), but I don&apos;t think the ISO norms are mandatory.</p><p><strong>In fact, Tesla doesn&apos;t comply with the norms, and they are approved to drive in the streets</strong>. They sell cars, and they even sell autonomous cars all across the world. But you&apos;ll note that some of their functions, like FSD (Full Self-Driving), are currently (early 2025) NOT authorized everywhere, like in Europe, because they don&apos;t comply with all the norms.</p><p>Okay, okay, I think we have ENOUGH! Let&apos;s do a summary...</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><ul><li><strong>Functional safety makes sure robots and algorithms operate safely</strong>, even when something goes wrong, by reducing risks to an acceptable level.</li><li><strong>Every engineer working in the field should be introduced to safety. </strong>This defines how you code, but also whether a startup gets authorizations to drive or not.</li><li><strong>Key functional safety norms include ISO 26262</strong> for electronics, <strong>ISO 21448</strong> for algorithms, <strong>ISO</strong> <strong>21434</strong> for cybersecurity, and <strong>UNECE WP.29 </strong>for EU compliance.</li><li><strong>The V-Model is a structured approach in functional safety management,</strong> covering concept, coding, and validation phases to achieve compliance. It has a V shape: Conception - Coding - Testing.</li><li><strong>Functional Safety follows a 7-step process that includes defining systems</strong>, hazard analysis, setting safety goals, and implementing safety mechanisms to ensure compliance.</li><li><strong>The ISO26262 norm defines risks as <em>Exposure</em> <em>* Severity </em>* <em>Controllability</em><em>.</em></strong> An ASIL table then defines, for each function, which grade it has.</li><li><strong>When something is risky (ASIL-C, ASIL-D),</strong> we introduce redundancy, diagnostics, and fail-safe systems to detect, isolate, or prevent failures, enhancing the overall safety integrity level.</li><li><strong>We want to test through simulations</strong>, field tests, and fault injection to ensure safety functions perform under all conditions, meeting the required safety standards.</li></ul><p>Alright, I think we are good!</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4F1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about self-driving cars in production</strong></b>... I am doing a full breakdown of Mobileye&apos;s True Redundancy System. Inside, I&apos;m showing you all the different algorithms they test, how their safety guardian fallback works, and discuss their End-To-End algorithm.<br><br><a href="https://www.thinkautonomous.ai/sdc-app/" rel="noreferrer">It&apos;s all in my App, along with 5+ hours of self-driving car content &#x2014; available when you join my daily emails.
Here is where you can learn more.</a></div></div>]]></content:encoded></item><item><title><![CDATA[Faster RCNN in 2025: How it works and why it's still the benchmark for Object Detection]]></title><description><![CDATA[A decade after its release. Faster RCNN is still the ruling king, used in every single paper as the benchmark for object detection. So how does it work? What is behind the Faster RCNN algorithm? Let's find out...]]></description><link>https://www.thinkautonomous.ai/blog/faster-rcnn/</link><guid isPermaLink="false">679757485b2944097abedac1</guid><category><![CDATA[deep learning]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Mon, 27 Jan 2025 12:00:02 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/01/faster-rcnn.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/faster-rcnn.jpg" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection"><p><strong>A day in 2018, I was buying my first car, an Audi A1</strong>, when the sales representative pitched me about the &quot;sportback&quot; version;&#xA0;which was supposedly more powerful. &quot;Why?&quot; I asked. And it turns out, it had more <strong><em>horsepower</em></strong> than the classic version. Horsepower? The idea intrigued me, especially since it&apos;s been a century since people replaced horses with cars, and yet, we still use horsepower as the key metric to describe a car.</p><p><strong>This metric, while seeming outdated, is still used as the gold standard in automotive... and it reminds me very much of the Faster-RCNN algorithm in object detection.</strong></p><p>The Faster RCNN algorithm got introduced to the AI community in 2015, and even though it&apos;s been 10+ years now, you still see it listed as the benchmark in most new object detection papers. Somehow, Faster RCNN is still the reference researchers use when they create a new algorithm.</p><p>Take, for example, the paper CO-DETR, which is doing Object Detection with Hybrid Transformers, something super-advanced, released late 2023 (almost 10 years after Faster RCNN), and notice the papers it&apos;s being compared to: <u>Faster RCNN is part of the list</u>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-4.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1156" height="826" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-4.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-4.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-4.jpg 1156w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Faster RCNN algorithm is still the benchmark in most object detection papers, even 10 years after its released</span></figcaption></figure><p>Have you noticed? 
And it&apos;s the case for almost every paper!</p><p><strong>Why?</strong> Back when the algorithm was still fairly new, somewhere around 2018, I was looking for an object detection model to integrate in my autonomous shuttle, and it seemed that the entire market came to 3 conclusions:</p><ul><li>SSD (Single Shot Detector) is the fastest object detection network</li><li>Faster R-CNN is the best model for accuracy, especially with small objects</li><li>YOLO (You Only Look Once) is the best tradeoff between accuracy and speed</li></ul><p>Yet, the original YOLOv3 got replaced several times, and the Faster R-CNN model still continued to live. In this article, I&apos;d like to describe the model to you, explain its key components, and help you understand whether you should spend time on it or not.</p><p>Before we dive into this algorithm, a quick aside:</p><p><strong>When I first learned about object detection</strong>, it was through the Udacity Self-Driving Car Nanodegree where we learned a Machine Learning technique called &apos;HOG+SVM&apos;, which worked like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-7.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="2000" height="531" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-7.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-7.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-7.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-7.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Traditional &quot;HOG+SVM&quot; object detection (more info on how it works </span><a href="https://www.thinkautonomous.ai/blog/computer-vision-self-driving-cars-introduction/" rel="noreferrer"><span style="white-space: pre-wrap;">here</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>The image was sent to an algorithm that ran a sliding window</strong>, and for each window, extracted Histogram of Oriented Gradient features that it classified using a Support Vector Machine (SVM). It was old, fully &quot;traditional&quot;, but it somehow worked. It wouldn&apos;t win an object detection Oscar, but it did work. The idea was what we call a two-stage object detector:</p><ol><li>We propose regions or bounding boxes (in this case, we defined window dimensions)</li><li>We classify each region</li></ol><p>And here was the output:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Udacity-VehicleDetectionandTracking-ezgif.com-optimize.gif" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="512" height="288"><figcaption><span style="white-space: pre-wrap;">Output of an ML based classifier</span></figcaption></figure><p>The Faster-RCNN algorithm is built exactly on the same &quot;Two Stage&quot; principle, except that it replaces every single one of these techniques with Neural Networks.</p><p>Let&apos;s see how:</p><h2 id="r-cnn-selective-search">R-CNN: Selective Search</h2><p>The first idea was to replace Feature Extraction, which was done using Histogram of Oriented Gradients, with a CNN (Convolutional Neural Network).
Here is how it worked:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-5.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1822" height="632" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-5.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-5.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-5.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-5.jpg 1822w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Added in R-CNN: CNNs instead of HOG features! (taken from my course </span><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noreferrer"><span style="white-space: pre-wrap;">MASTER OBSTACLE TRACKING</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>The algorithm had a few steps:</p><ol><li><strong>Propose 2,000+ Regions</strong> using <a href="https://learnopencv.com/selective-search-for-object-detection-cpp-python/#:~:text=Selective%20Search%20is%20a%20region,texture%2C%20size%20and%20shape%20compatibility." rel="noopener noreferrer"><strong>Selective Search Algorithm</strong></a></li><li><strong>For each region, extract features</strong> with a CNN (Convolutional Neural Network)</li><li><strong>For each region, classify the features </strong>using SVM (Support Vector Machine).</li></ol><p>The idea was very similar to my project, except that the region proposal step was done using Selective Search, an old Computer Vision algorithm, and the feature extraction was done using CNNs.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-6--1-.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1988" height="1332" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-6--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-6--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-6--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-6--1-.jpg 1988w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The old school Selective Search algorithm</span></figcaption></figure><p><strong>The algorithm had several problems</strong>: too many useless regions, too much extraction to do, and every region had to be resized/rewarped manually to match the CNN input layer.</p><p>This wasn&apos;t ideal...</p>
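<p><em>To see why this is so slow, here is a rough Python sketch of that per-region pipeline. The helper names are hypothetical placeholders for illustration, not the original R-CNN code:</em></p><pre><code class="language-python">
# Rough sketch of the R-CNN recipe: propose regions, then run a CNN + SVM on
# each region separately. Helper functions are placeholders for illustration.
import numpy as np

def selective_search(image):                 # stand-in: ~2,000 region proposals
    return [(0, 0, 64, 64), (10, 20, 80, 120)]

def crop_and_warp(image, box, size=(224, 224)):
    return np.zeros((*size, 3))              # every region must be warped to
                                             # the fixed CNN input size

def cnn_features(patch):                     # stand-in for the CNN backbone
    return patch.mean(axis=(0, 1))

def svm_classify(features):                  # stand-in for the per-class SVMs
    return "car", 0.87

def rcnn_detect(image):
    detections = []
    for box in selective_search(image):      # the expensive part: one full CNN
        patch = crop_and_warp(image, box)    # forward pass PER region
        label, score = svm_classify(cnn_features(patch))
        detections.append((box, label, score))
    return detections

print(rcnn_detect(np.zeros((480, 640, 3))))
</code></pre>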
<h2 id="spp-net-adding-a-spatial-pyramid-pooling-spp-block">SPP-Net: Adding a Spatial Pyramid Pooling (SPP) block</h2><p>SPP-Net is an evolution of this paper using a clever technique called Spatial Pyramid Pooling. The idea was as follows:</p><ol><li>Extract the Features using a CNN <u>first</u></li><li>Propose 2,000+ feature map Regions using Selective Search.</li><li><strong>Use Spatial Pyramid Pooling to avoid cropping/warping regions</strong></li><li>Send each feature map to FC layers and classify using SVM</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><a href="https://courses.thinkautonomous.ai/obstacle-tracking"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image--1-.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="2000" height="532" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image--1-.jpg 2000w" sizes="(min-width: 720px) 720px"></a><figcaption><span style="white-space: pre-wrap;">Adding Spatial Pyramids to see at multiple scales (taken from my course </span><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noreferrer"><span style="white-space: pre-wrap;">MASTER OBSTACLE TRACKING</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>And the idea worked! Working on the features instead of the regions helped remove some noise, and introducing Spatial Pyramid Pooling helped look at the image at multiple scales using multiple &apos;Max Pooling&apos; operations.</p><p><strong>Let me briefly take you back to the Max Pooling idea</strong>: it takes a window (say 2x2) and computes the maximum to reduce the size of the input, so that a 500x500 image becomes 250x250.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image--2-.png" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1570" height="348" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image--2-.png 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image--2-.png 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image--2-.png 1570w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">How Max Pooling works</span></figcaption></figure><p>Spatial Pyramid Pooling is doing a similar thing, but at multiple different scales, like (1x1), (2x2), (3x3), etc:</p><figure class="kg-card kg-image-card kg-card-hascaption"><a href="https://courses.thinkautonomous.ai/obstacle-tracking"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image--3-.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1804" height="530" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image--3-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image--3-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image--3-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image--3-.jpg 1804w" sizes="(min-width: 720px) 720px"></a><figcaption><span style="white-space: pre-wrap;">(taken from my course </span><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noreferrer"><span style="white-space: pre-wrap;">MASTER OBSTACLE TRACKING</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>The idea: collect information from different scales.</p>
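<p><em>In PyTorch terms, you can get the flavor of this with adaptive max pooling at several output sizes. This is a simplified sketch of the SPP idea (the grid sizes and channel count are arbitrary), not the paper&apos;s exact implementation:</em></p><pre><code class="language-python">
# Simplified SPP-style pooling: pool the same feature map to several fixed
# grid sizes, then flatten and concatenate into one fixed-length vector.
import torch
import torch.nn.functional as F

feature_map = torch.randn(1, 256, 32, 48)     # (batch, channels, H, W), any H/W

levels = [(1, 1), (2, 2), (4, 4)]             # the "pyramid" of pooling grids
pooled = [F.adaptive_max_pool2d(feature_map, size).flatten(1) for size in levels]
spp_vector = torch.cat(pooled, dim=1)

# 256*(1 + 4 + 16) = 5376 values, regardless of the input H and W:
print(spp_vector.shape)                       # torch.Size([1, 5376])
</code></pre>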
<p>So far, we replaced the Feature Extraction with a CNN, and we added an SPP. What now?</p><h2 id="fast-r-cnn-adding-neural-network-classification-roi-pooling">Fast R-CNN: Adding Neural Network Classification &amp; ROI Pooling</h2><p>Enter the Fast R-CNN detector! And the idea is almost the same, except that it replaces the Spatial Pyramid Pooling with ROI Pooling, and the final SVM with a Multi-Layer Perceptron classifier:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2022-03-16-at-13.03.59.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="2000" height="572" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2022-03-16-at-13.03.59.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2022-03-16-at-13.03.59.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2022-03-16-at-13.03.59.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2022-03-16-at-13.03.59.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Adding FC Classifiers &amp; ROI Pooling</span></figcaption></figure><p>Notice the key steps:</p><ol><li>Extract the Features using a CNN</li><li>Propose 2,000+ Regions using Selective Search.</li><li><strong>Use ROI Pooling to avoid cropping/warping regions</strong></li><li><strong>Send this to FC layers and classify using a neural network</strong></li></ol><p>There are two key ideas in the Fast R-CNN architecture: ROI Pooling &amp; FC Classification.</p><h3 id="1-roi-pooling-spp-in-better">1. ROI Pooling: SPP in better</h3><p>ROI Pooling is a special case of Spatial Pyramid Pooling, with the added idea of focusing on a specific region.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image--4-.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1684" height="402" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image--4-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image--4-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image--4-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image--4-.jpg 1684w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">(taken from my course </span><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noreferrer"><span style="white-space: pre-wrap;">MASTER OBSTACLE TRACKING</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>This is particularly useful in two-stage object detection algorithms</strong> that first propose regions using algorithms such as Selective Search or, as in Faster RCNN, Region Proposal Networks.
<p>ROI Pooling is also much faster than SPP, which computes the pooling several times at different scales; here, the pooling is computed directly for all the regions.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image--5-.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1396" height="724" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image--5-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image--5-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image--5-.jpg 1396w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">(taken from my course </span><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noreferrer"><span style="white-space: pre-wrap;">MASTER OBSTACLE TRACKING</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>One thing you can notice about this technique is that, by working on different scales, it allows the network to be more accurate, especially with objects of different sizes</strong>. Many object detection models use anchor box techniques to find bounding boxes. The problem is, you must manually define the anchor boxes every time. In this process, we find the regions first (rather than some boxes), and then extract information about these regions.</p><p>We can see this in the Faster R-CNN paper (which uses this same technique) as well:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-5-1.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="2000" height="439" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-5-1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-5-1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-5-1.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-5-1.jpg 2000w" sizes="(min-width: 720px) 720px"></figure><h3 id="2-fully-connected-classifier-svm-in-better">2. Fully Connected Classifier: SVM, but better</h3><p>The second addition is to replace the traditional Machine Learning SVM (Support Vector Machine) with a Softmax layer. There isn&apos;t much to comment on here &#x2014; we&apos;re using softmax on the k region proposals to do object classification for each bounding box.</p>
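<p>To make this concrete, here is a minimal sketch of the idea, with made-up scores standing in for the FC layer outputs of a few region proposals:</p><pre><code class="language-python">import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))  # numerically stable softmax
    return exp / exp.sum(axis=1, keepdims=True)

# Pretend the FC layers produced raw class scores for 3 region proposals
# over 4 classes (background, car, pedestrian, cyclist) -- made-up numbers.
raw_scores = np.array([[0.2, 4.1, 0.3, 0.1],
                       [2.5, 0.4, 0.3, 0.2],
                       [0.1, 0.2, 0.3, 3.0]])
probs = softmax(raw_scores)
print(probs.argmax(axis=1))   # predicted class index for each region
</code></pre>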
<p>And this is an idea used in Fast R-CNN, but also in Faster R-CNN.</p><p>Speaking of which, there is one last evolution to go from Fast R-CNN to Faster R-CNN:</p><h2 id="faster-r-cnn-replacing-selective-search-with-a-region-proposal-network-rpn">Faster R-CNN: Replacing Selective Search with a Region Proposal Network (RPN)</h2><p>When this family of algorithms started in 2013, almost everything was done by traditional techniques:</p><ul><li>We <strong>proposed</strong> <strong>regions</strong> using Selective Search (old school computer vision/segmentation)</li><li>We <strong>did</strong> <strong>feature</strong> <strong>extraction</strong> with CNNs (this was Deep Learning)</li><li>We <strong>classified</strong> the features using SVM (traditional machine learning classification)</li></ul><p>And progressively, we replaced SVM with a Fully-Connected Layer, and we improved the CNN extraction with pyramids. What is left to replace with Deep Learning? The Selective Search algorithm! And yes, it was really an old and slow Computer Vision technique:</p><p><strong>Faster R-CNN replaces Selective Search with a Deep Learning based Region Proposal Network (RPN)</strong>. This way, everything is trained end to end using a unified network. It&apos;s all a single network!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2022-03-16-at-13.04.10.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="2000" height="548" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2022-03-16-at-13.04.10.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2022-03-16-at-13.04.10.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2022-03-16-at-13.04.10.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2022-03-16-at-13.04.10.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Faster RCNN in 4 steps (taken from my course </span><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noreferrer"><span style="white-space: pre-wrap;">MASTER OBSTACLE TRACKING</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>The full process goes like this:</p><ol><li>Extract the Features using a CNN</li><li><strong>Generate Region Proposals using a Region Proposal Network.</strong></li><li>Use ROI Pooling to avoid cropping/warping regions</li><li>Send this to FC layers and classify using a neural network</li></ol><p>And here we are!</p><p><strong>Now what is this RPN doing?</strong> It serves as the &quot;attention&quot; of the network. It&apos;s designed to generate high-quality region proposals and highlight where there might be <u>objects</u>. Taking an image of any size as input, it uses a fully convolutional network to output a set of rectangular bounding boxes, each with an objectness score. When you think about it, the full name of the Faster R-CNN paper is &quot;Towards Real-Time Object Detection with Region Proposal Networks&quot;.</p><p><strong>How does a Region Proposal Network make the model real-time?</strong> By removing the Selective Search algorithm and working on the feature maps directly, it kills the need for heavy computations and makes the region proposals nearly cost-free.</p>
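<p>As a rough illustration (not the exact code from the paper), a PyTorch-style sketch of such a proposal head could look like this: a small convolution slides over the shared feature map and, at each location, outputs an objectness score and 4 box deltas for each of k anchors. The channel counts and k=9 are assumptions for the example.</p><pre><code class="language-python">import torch
import torch.nn as nn

class TinyRPNHead(nn.Module):
    """Sliding 3x3 conv over the shared feature map, then two 1x1 heads:
    one objectness score and 4 box deltas for each of k anchors per location."""
    def __init__(self, in_channels=512, k=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(512, k, kernel_size=1)     # object vs. not, per anchor
        self.deltas = nn.Conv2d(512, 4 * k, kernel_size=1)     # box refinements, per anchor

    def forward(self, feature_map):
        x = torch.relu(self.conv(feature_map))
        return self.objectness(x), self.deltas(x)

rpn = TinyRPNHead()
features = torch.randn(1, 512, 38, 50)          # toy backbone feature map
scores, deltas = rpn(features)
print(scores.shape, deltas.shape)               # (1, 9, 38, 50) and (1, 36, 38, 50)
</code></pre>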
<p>The RPN operates on the <strong>same convolutional feature maps</strong> produced by the backbone CNN (e.g., ResNet or VGG) that are already computed for object detection.</p><p><strong>It then uses the concept of &apos;</strong><a href="https://www.thinkautonomous.ai/blog/anchor-boxes/" rel="noopener noreferrer"><strong>anchor boxes</strong></a><strong>&apos; to do the region proposal generation</strong>. If you&apos;re not familiar with the concept of anchor boxes, the idea is to define boxes of multiple aspect ratios and sizes, and try to have objects fit these boxes. For example, a small vertical anchor box could be a pedestrian seen from far away, while a big vertical anchor box could be a pedestrian close to the camera.</p><p>I highly recommend my article &quot;<a href="https://www.thinkautonomous.ai/blog/anchor-boxes/" rel="noopener noreferrer"><strong><em>Finally Understand Anchor Boxes in Object Detection</em></strong></a>&quot; to grasp the idea.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-8--1-.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="2000" height="816" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-8--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-8--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-8--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-8--1-.jpg 2234w" sizes="(min-width: 720px) 720px"></figure><p>Combining the Fast R-CNN head with this new RPN, the algorithm can simultaneously predict object bounds and class probabilities.</p><p>And we have it! The paper <strong><em>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks </em></strong>performs really well, because it combines all of these ideas. Being a 2-stage algorithm, it&apos;s powerful and can find small objects as well as big objects. It&apos;s a bit slower than one-stage detectors, but the Region Proposal Network is a massive improvement over Selective Search.</p><p>Let&apos;s now see an example:</p><h2 id="is-faster-rcnn-still-used-where-would-it-fit-best">Is Faster-RCNN still used? Where would it fit best?</h2><p>I don&apos;t think people today would use Faster R-CNN as the main choice for their algorithm. There are many powerful, better, and faster object detectors. Yet, if there was one place where I&apos;d use it, it&apos;d be on the task of Traffic Light Detection &amp; Classification.</p><p><strong>You see this all the time; traffic lights have a separate network designed just for them</strong>.</p>
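<p>If you wanted to prototype such a module, one possible starting point is an off-the-shelf pretrained Faster R-CNN, filtered to the traffic-light class. This is only a sketch: it assumes the pretrained COCO model from torchvision, that the traffic-light label index is 10 in its category list, and a made-up image path.</p><pre><code class="language-python">import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

TRAFFIC_LIGHT = 10   # assumed COCO label index for "traffic light" -- verify for your version

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("front_camera.jpg"))     # hypothetical image path
with torch.no_grad():
    output = model([image])[0]                        # dict with boxes, labels, scores

keep = [i for i, (lbl, s) in enumerate(zip(output["labels"], output["scores"]))
        if lbl.item() == TRAFFIC_LIGHT and s.item() > 0.5]
print(output["boxes"][keep])                          # boxes of detected traffic lights
</code></pre>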
<p>These networks are usually focused on finding smaller objects, detecting their states, and running on a long-range camera focused on the lights.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-27-at-12.24.11.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="1090" height="642" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-27-at-12.24.11.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-27-at-12.24.11.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-27-at-12.24.11.jpg 1090w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The Autoware </span><a href="https://www.thinkautonomous.ai/blog/autonomous-vehicle-architecture/" rel="noreferrer"><b><strong style="white-space: pre-wrap;">Autonomous Vehicle Architecture</strong></b></a><span style="white-space: pre-wrap;"> &#x2014; Here, Faster-RCNN would fit on the Traffic Light Detection &amp; Classification module</span></figcaption></figure><p><strong>This is where you might still find Faster-RCNN used</strong>. Probably more than anywhere else, because this algorithm is accurate and works well with small objects.</p><h2 id="should-i-try-to-recode-it-on-my-own">Should I try to recode it on my own?</h2><p>If you have never implemented object detection, I would recommend starting with the traditional HOG+SVM techniques. Then, you could try running Faster R-CNN, and maybe implement some of its building blocks. Today, YOLO is more dominant, and understanding that algorithm would make a lot of sense too.</p><p>What you need to understand is that the real goal behind object detection is not necessarily to find objects, but rather to extend it to things like <a href="https://www.thinkautonomous.ai/blog/lidar-and-camera-sensor-fusion-in-self-driving-cars/" rel="noopener noreferrer"><strong>LiDAR/Camera Fusion</strong></a> or <a href="https://www.thinkautonomous.ai/blog/object-tracking/" rel="noopener noreferrer"><strong>object tracking</strong></a>. This is where the real value of object detectors is.</p><h2 id="summary-next-steps">Summary &amp; Next Steps</h2><p>You&apos;ve made it through the article! Congratulations! Let&apos;s do a quick summary of everything we learned. First, you probably now understand this image:</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-27-at-12.51.37.jpg" class="kg-image" alt="Faster RCNN in 2025: How it works and why it&apos;s still the benchmark for Object Detection" loading="lazy" width="878" height="620" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-27-at-12.51.37.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-27-at-12.51.37.jpg 878w" sizes="(min-width: 720px) 720px"></figure><p>Here is what it was about:</p><ul><li><strong>Faster R-CNN remains a benchmark for object detection, even a decade after its introduction</strong>. 
The algorithm is known for its accuracy, especially with small objects, and is often compared to newer models like YOLO and SSD.</li><li><strong>Faster R-CNN evolved from traditional techniques</strong>, replacing each of them with neural networks; from region proposals, to feature extraction and classification.</li><li><strong>HOG feature extraction was replaced with CNNs </strong>in Fast-RCN, allowing to build feature maps.</li><li><strong>Spatial Pyramid Pooling, and later ROI Pooling </strong>and anchor boxes got added for better extraction, proposal, and understanding.</li><li><strong>Region Proposal Networks (RPN) replaced Selective Search</strong>, allowing the model to generate high-quality region proposals in real-time.</li><li><strong>A fully-connected </strong>layer<strong> </strong>replaced SVM for classification.</li><li><strong>Despite being slower than some modern detectors,</strong> Faster R-CNN excels in tasks requiring high accuracy, such as traffic light detection.</li><li><strong>The model&apos;s two-stage process involves proposing regions and classifying them</strong>, making it powerful for detecting objects of various sizes.</li></ul><h3 id="next-steps">Next Steps</h3><p>Here are a few articles I&apos;d recommend you learn next to continue your journey:</p><ul><li><a href="https://www.thinkautonomous.ai/blog/anchor-boxes/" rel="noreferrer"><strong>Finally Understand Anchor Boxes in Object Detection (2D and 3D)</strong></a></li><li><a href="https://www.thinkautonomous.ai/blog/computer-vision-for-tracking/" rel="noreferrer"><strong>Computer Vision for Multi-Object Tracking: Live Example</strong></a></li><li><a href="https://www.thinkautonomous.ai/blog/instance-segmentation/" rel="noreferrer"><strong>Instance Segmentation: How adding Masks improves Object Detection</strong></a></li></ul><p>And of course, the most important recommendation of all:</p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E5;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Signup for my daily emails</strong></b>, and get access to daily content like this one &#x2014; along with my App &quot;<i><em class="italic" style="white-space: pre-wrap;">Think Autonomous</em></i>&quot;, containing 5+ hours of content on Computer Vision &amp; Self-Driving cars. We cover what startups REALLY do, and help you become engineers in the autonomous tech industry.<br><a href="https://www.thinkautonomous.ai/private-emails?ref=thinkautonomous.ai">Subscribe here and join 10,000+ Engineers!</a></div></div>]]></content:encoded></item><item><title><![CDATA[A complete overview of Object Tracking Algorithms in Computer Vision & Self-Driving Cars]]></title><description><![CDATA[How does Object Tracking work? 
In this article, we'll go from intermediate to advanced, and dive into the different object tracking algorithms you have at your disposal and how they work for self-driving cars]]></description><link>https://www.thinkautonomous.ai/blog/object-tracking/</link><guid isPermaLink="false">678ebb7c5b2944097abeda29</guid><category><![CDATA[computer vision]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Tue, 21 Jan 2025 14:47:10 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/01/object-tracking.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/object-tracking.webp" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars"><p><strong>Have you seen the movie &apos;Limitless&apos;?</strong> I really loved this movie when growing up. It&apos;s the story of Eddie Morra, a broke writer who suddenly gets access to the NZT drug, unlocking immense cognitive capacities. Using it, he&apos;s able to learn quicker, benefit from enhanced memory, focus better, build charisma, and instantly gets an amazing life.</p><p><strong>I love this movie, and in particular the idea of being &quot;limitless&quot;</strong>, which is similar to the movie Lucy, where you start using your brain at 100% capacity, and thus fully exploit it. Tons of Computer Vision students today have the NZT pill in front of them, but refuse to take it. And by this, I mean they have great skills in image processing, and can detect objects and find 2D boxes, but are unable to fully exploit this skill to 100%.</p><p><strong>In order to fully exploit your Computer Vision skills, you&apos;d need object tracking</strong>, and this is exactly what we&apos;ll talk about in this article in 3 points:</p><ul><li><strong>Object Detection vs Object Tracking:</strong> Why you shouldn&apos;t be an Object Detection Engineer, and why Object Tracking Engineers bring more benefits</li><li><strong>Object Tracking Elements: </strong>How to track objects from frame to frame</li><li><strong>Advanced Object Tracking</strong>: 3D Tracking, Deep Tracking, and more...</li></ul><h2 id="object-detection-vs-object-tracking-why-you-shouldnt-be-an-object-detection-engineer-and-why-object-tracking-engineers-are-better"><strong>Object Detection vs Object Tracking: </strong>Why you shouldn&apos;t be an Object Detection Engineer, and why Object Tracking Engineers are better</h2><p>Back when I started in Computer Vision, I often stumbled across courses where the curriculum went like this:</p><ol><li><strong>CV Fundamentals</strong>: Image Processing, Filtering, Resizing, Colorization, OpenCV, computer vision algorithms, etc...</li><li><strong>Neural Network &amp; CNNs</strong>: Neural Nets, MLP Classification, Backpropagation, CNNs Image Classification</li><li><strong>Advanced Computer Vision: </strong>2D Object Detection, CV Projects (car counting, ...), ...</li></ol><p>Somehow, no matter the course I picked &#x2014; and even at university &#x2014; <u>object detection was presented as the utmost cutting-edge project you could work on</u>. And somehow, almost 8 years later, it&apos;s still the case.</p><p><strong>When I worked in the self-driving car field, I realized object detection was merely a handy tool, and NEVER the end goal</strong>. 
No, knowing that a detected object is in a bounding box centered at pixel (200,300) isn&apos;t directly useful, and this because we&apos;re in the pixel space, but also because we have no information about these objects.</p><p><strong>This is something I explain in the presentation page of my </strong><a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noopener noreferrer"><strong>obstacle tracking course</strong></a>, but also in general every time I talk about object detection: <u>a 2D bounding box is NOT the end goal</u>. However, if you have information about that bounding box, it could be useful... For example:</p><ul><li>If you know how long each specific object has been in the frame</li><li>Or how fast they&apos;re going</li><li>Or where they&apos;re heading (even what action they&apos;ll take)</li></ul><p>Then it becomes useful, and can be the origin of many real-world AI applications in retail, self-driving cars, robots, traffic monitoring, sports analytics, visual object tracking, and more..</p><blockquote class="kg-blockquote-alt"><strong>Computer Vision might be represented by the YOLO algorithm and object detection, but the reality is, skills like Object Tracking are far more <u>useful</u> to companies, and often directly follow Object Detection</strong>.</blockquote><figure class="kg-card kg-image-card kg-card-hascaption"><a href="https://courses.thinkautonomous.ai/obstacle-tracking"><img src="https://coachtestprep.s3.amazonaws.com/direct-uploads/user-30623/e3505045-564e-4234-ac8d-0aee93502c88/Screenshot%202022-04-04%20at%2015.56.56.png" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1994" height="1070"></a><figcaption><span style="white-space: pre-wrap;">Object Detection vs Object Tracking (source)</span></figcaption></figure><p>To put things very clearly:</p><ul><li><strong>The output of an object detector</strong> is a list of 2D or 3D Bounding Box coordinates</li><li><strong>The output of an object tracking algorithm</strong> is the bounding box PLUS the ID of each, and yes, we can add the speed, class, action, age in the frame, whether it&apos;s occluded or not, and so on...</li></ul><p>But there are many types of object tracking algorithms, and many different algorithms to consider, so let&apos;s immediately jump to the second point:</p><h2 id="the-main-elements-of-object-tracking-how-to-track-objects-from-frame-to-frame">The Main Elements of Object Tracking: How to track objects from frame to frame</h2><p>The initial goal of object tracking is, taking two frames, to associate their bounding boxes. 
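</p><p>In practice, you can picture the per-object output of a tracker as a small data structure; here is a minimal sketch (the fields are illustrative, not a standard format):</p><pre><code class="language-python">from dataclasses import dataclass, field

@dataclass
class Track:
    """One tracked object: a detection enriched with identity and history."""
    track_id: int
    box: tuple                       # (x1, y1, x2, y2) in pixels
    label: str                       # e.g. "car", "pedestrian"
    age: int = 0                     # number of frames this track has existed
    velocity: tuple = (0.0, 0.0)     # pixels/frame, estimated by the filter
    history: list = field(default_factory=list)   # past boxes, for motion cues

track = Track(track_id=1, box=(120, 80, 220, 160), label="car")
track.age += 1
print(track)
</code></pre>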
<p>To explain the idea clearly, let me take the example of this scene:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/seq_17-ezgif.com-optimize.gif" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="560" height="375"><figcaption><span style="white-space: pre-wrap;">The scene we&apos;ll run tracking on</span></figcaption></figure><p><strong>Say this is the view from our self-driving car, and we want to do vehicle tracking.</strong> First, let&apos;s remove the background and focus only on the moving objects:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/decomposition-ezgif.com-optimize.gif" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="568" height="429"><figcaption><span style="white-space: pre-wrap;">The same scene with moving objects only</span></figcaption></figure><p><strong>Ohhh this is so cool</strong>. What I just did is called <strong>Gaussian Decomposition</strong>, and it was done solely to impress you with 3D Gaussian Splatting; but whatever, let&apos;s imagine we&apos;re the self-driving car, stopped and looking at these 3 objects.</p><p><strong>How would we assign the IDs?</strong> First, we would need to extract video frames from this scene, so let&apos;s put on our video object tracking lenses and look at it this way:</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.35.46.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1830" height="344" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-22.35.46.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-22.35.46.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2025-01-20-at-22.35.46.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.35.46.jpg 1830w" sizes="(min-width: 1200px) 1200px"><figcaption><span style="white-space: pre-wrap;">When working on tracking applications, a video is often considered</span></figcaption></figure><p><strong>Now, let&apos;s try to do the tracking for Frame 3 and Frame 4.</strong></p><p>There are 3 main steps you need to understand:</p><ol><li><strong>Object Detection</strong>: YOLO &#x2014; because you only live once</li><li><strong>Data Association</strong>: Bipartite Matching, the Hungarian Algorithm</li><li><strong>Object Tracking</strong>: Kalman Filters</li></ol><h3 id="1-object-detection">1) Object Detection</h3><p>The first step is simple: we detect objects &#x2014;&#xA0;this is where everybody rushes and stops.</p>
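<p>If you want to reproduce this detection step yourself, a few lines are enough. The sketch below assumes the ultralytics package, a small pretrained model, and a made-up frame path:</p><pre><code class="language-python">from ultralytics import YOLO   # assumption: the ultralytics package is installed

model = YOLO("yolov8n.pt")            # small pretrained model (placeholder weights file)
results = model("frame_03.jpg")       # hypothetical path to one video frame

boxes = results[0].boxes
for xyxy, cls, conf in zip(boxes.xyxy, boxes.cls, boxes.conf):
    print(xyxy.tolist(), int(cls), float(conf))   # box corners, class id, confidence
</code></pre><p>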
Since we said we&apos;d focus on frame 3 and 4 only, this would be the output of YOLO:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.15.16.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1360" height="510" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-23.15.16.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-23.15.16.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.15.16.jpg 1360w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">YOLO is cool, but kinda useless when you think about it</span></figcaption></figure><p>The problem here is, if every car is red, there is no tracking, <u>we need IDs</u>, and we need good tracking. Comes step 2.</p><h3 id="2-data-association-bipartite-matching-the-hungarian-algorithm">2) Data Association: Bipartite Matching &amp; The Hungarian Algorithm</h3><p>The second step is to attribute IDs to every box, and this for the entire scene. In this article, I&apos;m going to consider that we&apos;re tracking multiple objects, and thus doing multi object tracking (MOT). If we keep our same example, it should look like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.18.09.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1366" height="514" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-23.18.09.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-23.18.09.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.18.09.jpg 1366w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">What we want: IDs following eachother</span></figcaption></figure><p>And to be clear, it should <u>NOT</u> look like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.19.20.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1364" height="508" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-23.19.20.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-23.19.20.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.19.20.jpg 1364w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">What we don&apos;t want: ID Switch</span></figcaption></figure><p><strong>So how do we do that? 
</strong>You could <u>assign</u> an ID to every single object in frame 3, but these objects have to correspond.</p><p>So here is how we are going to make a good matching algorithm:</p><ol><li>We attribute an ID to each box for the first frame (t-1).</li><li>We match the box from (t-1) to (t)</li><li>We assign the ID to each box for the second frame (t)</li></ol><p>So:</p><ol><li>We attribute an ID to each box for the first frame (t-1).</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.38.54.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="708" height="538" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-22.38.54.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.38.54.jpg 708w"><figcaption><span style="white-space: pre-wrap;">Attributing an ID to each object can be done totally randomly</span></figcaption></figure><p>Then:</p><ol start="2"><li>We match the box from (t-1) to (t)</li></ol><p>But how?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.40.50.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1372" height="520" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-22.40.50.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-22.40.50.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.40.50.jpg 1372w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The next step is about giving a &quot;color&quot; to each of the boxes</span></figcaption></figure><p><strong>This is where we run the data association. We are going to, <u>for each box</u> (and this is important), try to find the closest match with the other box. For this, we&apos;ll consider matching criteria, such as:</strong></p><ul><li>The euclidean distance of the center of the boxes</li><li>The size &amp; shape change of the box</li><li>The class (don&apos;t match a car with a cyclist)</li><li>The IOU (Intersection Over Union) of how these 2 boxes overlap</li><li>The deep association metric each box has with the other</li><li>And more... 
(in my <a href="https://courses.thinkautonomous.ai/obstacle-tracking" rel="noopener noreferrer"><strong>obstacle tracking course</strong></a>, we implement object tracking using a cost function of tons of these costs and more...)</li></ul><p><strong>Let me show you an example for the first car, if we pick the <u>euclidean distance</u> as a metric:</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.45.40.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1418" height="544" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-22.45.40.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-22.45.40.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.45.40.jpg 1418w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The euclidean distance is not the best 2D matching solution (more relevant in 3D), but it can pass for this example</span></figcaption></figure><p><strong>As you can see, if we compute the euclidean distance between the red dot and the left car,</strong> <strong>we have a very small distance</strong>. On the other hand, we have a big distance with the right car. And this is how we can do the matching. You are then going to hold what we call a <strong>Bipartite</strong> <strong>Graph</strong> storing all the distances.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.54.04.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1144" height="520" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-22.54.04.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-22.54.04.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-22.54.04.jpg 1144w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A bipartite graph stores all the distances and assigns the lowest to each free element</span></figcaption></figure><p>Finally:</p><ol start="3"><li>We assign the ID to each box for the second frame frame (t)</li></ol><p>The lowest distance is picked and every object is assigned.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.01.17.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1110" height="426" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-23.01.17.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-20-at-23.01.17.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.01.17.jpg 1110w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The matching in action</span></figcaption></figure><p><strong>And this is how we do it!</strong></p><p>Now, I showed you a very simple version, and 
before I get flooded with &quot;But what if...?&quot; questions &#x2014; there is one final idea to talk about:</p><h3 id="3-the-kalman-filter">3) The Kalman Filter</h3><p>When combined with a Kalman Filter, this object tracking method is called <strong>SORT</strong> (Simple Online Realtime Tracking). What does a Kalman Filter do? Over time, you&apos;re going to have a real motion of objects, and thus, the Kalman Filter is going to predict the next position of the box &#x2014;&#xA0;so that the distance gets smaller.</p><p>If I were to illustrate it, it would be:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.05.36.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="688" height="522" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-20-at-23.05.36.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-20-at-23.05.36.jpg 688w"><figcaption><span style="white-space: pre-wrap;">A Kalman Filter helps with objects moving too fast, or videos with low frame rate</span></figcaption></figure><p><strong>The orange box is the predicted box, and the red box is the (t-1) box.</strong> When using a Kalman Filter, we are going to use the orange box for matching instead. So rather than matching the box at time (t-1) with the box at time (t), we match it with its forward prediction at time (t). This has a big impact, because if a car suddenly accelerates, or is about to leave the frame, you can predict it and thus improve your tracking results.</p><h4 id="how-a-kalman-filter-works">How a Kalman Filter works</h4><p>To clarify a bit, we&apos;ll initialize our Kalman Filter with a motion model, say constant velocity, and predict the next position of the box. We may predict that the car is at the same position as it is now. Then, we&apos;ll get the &quot;real&quot; data from YOLO at t+1, and thus we know we were a bit &quot;off&quot; &#x2014;&#xA0;so we calculate that motion, and refine our next prediction... <u>again and again until we&apos;re perfectly able to predict the next position</u>.</p><p>I have a complete article doing a Live Example of Multi Object Tracking with the actual KF values <a href="https://www.thinkautonomous.ai/blog/computer-vision-for-tracking/" rel="noopener noreferrer"><strong>here</strong></a><strong>.</strong></p><p>Awesome, so this is our second point. Before moving on to the advanced algorithms in our third point, let me take a few questions...</p><h3 id="questions-about-the-simple-tracking-methods">Questions about the simple tracking methods</h3><p>A few questions you may have about this object tracking approach:</p><h4 id="is-the-euclidean-distance-standard-what-do-people-use-in-the-field"><strong>&quot;Is the euclidean distance standard? What do people use in the field?&quot;</strong></h4><p>No, the euclidean distance is actually one of the worst metrics you can use. Why? Because imagine we have many objects, or an object comes in front of another. Suddenly, the shortest distance could be attributed to another object, and thus, we could fail the tracking. People like to use the IOU (Intersection Over Union) instead, or the IOU + the distance between the deep convolutional features (we look at the CNN feature map inside the bounding box).</p>
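<p>Here is a minimal sketch of that matching step: an IoU-based cost matrix between the boxes of frame (t-1) and frame (t), solved with the optimal assignment solver from SciPy. The box coordinates are made up, and a real tracker would mix several costs, as discussed above.</p><pre><code class="language-python">import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

prev_boxes = [(100, 100, 200, 180), (400, 120, 520, 200)]   # frame t-1 (made up)
curr_boxes = [(410, 125, 530, 205), (110, 105, 210, 185)]   # frame t   (made up)

cost = np.array([[1.0 - iou(p, c) for c in curr_boxes] for p in prev_boxes])
rows, cols = linear_sum_assignment(cost)     # optimal matching, Hungarian-style
for r, c in zip(rows, cols):
    print(f"track {r} matched to detection {c} with IoU {1.0 - cost[r, c]:.2f}")
</code></pre><p>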
<strong><u>Often, a combined cost is the best solution.</u></strong></p><h4 id="what-if-2-object-have-a-minimal-distance-to-the-same-object">What if 2 object have a minimal distance to the same object?</h4><p>Haha! That is interesting. Imagine this case:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-2.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1640" height="706" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-2.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-2.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-2.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-2.jpg 1640w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">In this scene, an object is leaving the scene, and another one is entering</span></figcaption></figure><p>As you can see, in this example, object 1 is moving, and a third object is entering the scene. It&apos;s possible that we don&apos;t know exactly how to do the match; maybe the first object moved too much, and thus we have a situation like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image-3.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1866" height="1132" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image-3.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/image-3.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/image-3.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image-3.jpg 1866w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A bipartite graph where the same object is matched with 2 elements</span></figcaption></figure><p>So what if everybody has to be assigned to the middle object? This can happen extremely often, especially when the two frames don&apos;t have the same number of vehicles. This question, and ALL the others like &quot;What if we have occlusions?&quot; or &quot;What if 2 objects are in frame 3 but 5 objects are in frame 4&quot; or &quot;What if an object is mis-detected?&quot;... can be answered with one word: <strong>The Hungarian Algorithm.</strong></p><p>This is the algorithm behind all matching, even the advanced ones with Deep Learning, and if you&apos;d like to understand how it works, I have a complete article about it <a href="https://www.thinkautonomous.ai/blog/hungarian-algorithm/" rel="noopener noreferrer"><strong>here</strong></a><strong>.</strong></p><p>You now have a good idea of how to do multiple object tracking, let&apos;s move to the advanced algorithms.</p><h2 id="advanced-multi-object-tracking-models">Advanced Multi Object Tracking Models</h2><p>In this section, I will still focus on the Multiple Object Tracking problem. Not the single object tracking (image tracking), which is often a simpler Computer Vision problem answered with other techniques.</p><p>In the example I showed you, we were (1) detecting cars and (2) tracking them. We call this tracking-by-detection. In reality, we don&apos;t <strong><em>have</em></strong> to do this. 
Modern approaches today track objects directly: we call this joint detection and tracking. And thus, we have two families:</p><ul><li>Tracking by Detection algorithms</li><li>Joint Detection and Tracking algorithms</li></ul><p>Let&apos;s briefly take a look at each:</p><h3 id="tracking-by-detection">Tracking By Detection</h3><p>You already saw a simple object tracking approach called SORT, which belongs to the first family; and when you add deep CNN metrics, you have Deep SORT! The idea is, rather than simply looking at the euclidean distances, we also add the Deep CNN feature distances as a metric. For example:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/321b5d8f-6119-4870-b4f8-5dbe973a1ca6-1.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="2000" height="1153" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/321b5d8f-6119-4870-b4f8-5dbe973a1ca6-1.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/321b5d8f-6119-4870-b4f8-5dbe973a1ca6-1.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/321b5d8f-6119-4870-b4f8-5dbe973a1ca6-1.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/321b5d8f-6119-4870-b4f8-5dbe973a1ca6-1.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">DeepSORT is about implementing feature distances as a cost</span></figcaption></figure><p>The matching here is done using the IOU, distance, shape costs, etc., but also this new cost, which is going to look &quot;inside&quot; the bounding box.</p><p><strong>What are other approaches from this family?</strong></p><p>For example, <a href="https://arxiv.org/abs/2110.06864" rel="noopener noreferrer"><strong>ByteTrack</strong></a> is pretty well-known:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/teasing--1-.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1546" height="1566" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/teasing--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/teasing--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/teasing--1-.jpg 1546w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">ByteTrack principle</span></figcaption></figure><p><strong>The goal of this algorithm is to match all the detection boxes</strong>, even the low confidence ones, which are associated with the remaining unmatched tracks (when it makes sense). ByteTrack&apos;s &quot;two-stage association&quot; approach (of associating both high-confidence and low-confidence detections) makes it highly beneficial in situations where confidence is inconsistent or where external factors make detection challenging.</p><p>Now let&apos;s see the second family:</p><h3 id="joint-detection-tracking">Joint Detection &amp; Tracking</h3><p>Here are a few state-of-the-art algorithms to consider...</p><h4 id="fair-mot">FAIR-MOT</h4><p><a href="https://arxiv.org/abs/2004.01888" rel="noopener noreferrer"><strong>FairMOT</strong></a><strong> is a multi-object tracking (MOT) framework that is focused on the re-identification problem. 
</strong>The training loss is optimized for both object detection and re-identification (ReID). It has two branches: one for anchor-free detection and another for ReID feature embedding. This is really effective when you have occlusions, dense crowds, and offers real-time performance without compromising tracking accuracy.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-21-at-13.52.43.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1992" height="810" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-21-at-13.52.43.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-21-at-13.52.43.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2025-01-21-at-13.52.43.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-21-at-13.52.43.jpg 1992w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">FAIR MOT is built on Re-Identification</span></figcaption></figure><p><strong>Now is this better then the other one? Which could be better?</strong> As you saw, some algorithms are optimized for occlusions, others for using multiple sensors (or from different camera angles), others for sudden velocity changes... etc...</p><p>Now let&apos;s see some examples:</p><h2 id="example-1-waymos-stateful-track-transformers">Example 1: Waymo&apos;s Stateful Track Transformers</h2><p>In 2024, Waymo released an object tracking algorithm called <a href="https://waymo.com/research/stt-stateful-tracking-with-transformers-for-autonomous-driving/" rel="noopener noreferrer"><strong>STT: Stateful Track Transformer</strong></a>. Here is the diagram:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-21-at-13.59.30.jpg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1808" height="644" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-21-at-13.59.30.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-21-at-13.59.30.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2025-01-21-at-13.59.30.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-21-at-13.59.30.jpg 1808w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Waymo&apos;s STT</span></figcaption></figure><p>As you can see, this algorithm is based on <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer">LiDAR</a> (as for almost everything at Waymo), and it begins with the Detection Encoder thad encode all of the 3D detections and extract temporal features for each track. The temporal features are fed into the Track-Detection Interaction module to aggregate information from surrounding detections and produce association scores and predicted states for each track. 
The Track State Decoder also takes the temporal features to produce track states in the previous frame t &#x2212; 1.</p><p>I have a complete and full explanation of this advanced algorithm<strong> in my free app</strong>.</p><p>Next, let&apos;s see another example:</p><h2 id="example-2-4d-perception-with-3d-kalman-filters">Example 2: 4D Perception with 3D Kalman Filters</h2><p>In my article on <a href="https://www.thinkautonomous.ai/blog/3d-object-tracking/" rel="noopener noreferrer"><strong>3D Object Tracking</strong></a>, I talk about object tracking systems that work with 3D Bounding Boxes. In this case, there are 2 different situations:</p><ul><li><strong>We&apos;re still using monocular camera,</strong> and this time tracking 3D Bounding Boxes</li><li><strong>We&apos;re using LiDARs, RADARs, or stereo cameras</strong>, and thus track 3D Bounding Boxes, but have access to a different type of data (<a href="https://www.thinkautonomous.ai/blog/point-cloud-registration/" rel="noopener noreferrer">point clouds</a>, RADAR maps, ...)</li></ul><p>In my course MASTER <a href="https://courses.thinkautonomous.ai/tracking-pack-iii" rel="noopener noreferrer">4D PERCEPTION</a>, I do this with LiDARs, and here is the output:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/4DOutput-ezgif.com-optimize.gif" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="480" height="290"><figcaption><span style="white-space: pre-wrap;">a 4D Perception Project (from my 4D Perception course)</span></figcaption></figure><p>Now that is stunning!</p><p>The key elements of this project are:</p><ul><li><strong>We&apos;re using a </strong><a href="https://www.thinkautonomous.ai/blog/how-lidar-detection-works/" rel="noopener noreferrer"><strong>LiDAR detector</strong></a><strong> like Point-RCNN</strong>, PV-RCNN, or Cas-A to find 3D Bounding box coordinates</li><li><strong>We then have several possible association criteria</strong> such as the 3D IOU, but also the point cloud shapes, the 3D euclidean distance, and more...</li><li><strong>We&apos;re using a second order 3D Kalman Filter</strong> tracking XYZ in a constant velocity motion model.</li></ul><p>You can learn more about this in my course <a href="https://courses.thinkautonomous.ai/4d-perception" rel="noopener noreferrer"><strong>MASTER 4D PERCEPTION</strong></a> (but warning, it&apos;s advanced and I highly recommend you check my introduction course on tracking before going to this one).</p><p>You&apos;ve now been through the end of this article!</p><p>Congratulations! 
Let&apos;s do a quick summary of everything you learned:</p><h2 id="summary">Summary</h2><p>First, there is this image that I built, showing the different stages of tracking:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/tracking.001.jpeg" class="kg-image" alt="A complete overview of Object Tracking Algorithms in Computer Vision &amp; Self-Driving Cars" loading="lazy" width="1920" height="1080" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/tracking.001.jpeg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/tracking.001.jpeg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/tracking.001.jpeg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/tracking.001.jpeg 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The summary of what you can do after building detection skills</span></figcaption></figure><p>As you can see, most engineers who only learn to use object detectors miss the benefits of association &amp; tracking: ID, Age, Uncertainty, Motion, and even (if you predict) future positions, actions, trajectory, etc...</p><p>In a self-driving car, using tracking instead of simple detection is much more useful, and there are many real-time object tracking applications that are creating fantastic startups.</p><p>A few other summaries:</p><ul><li><strong>The main elements of object tracking are (1) detection, (2) association, and (3) prediction</strong> via Kalman Filter.</li><li><strong>In association, we build a bipartite graph holding the objects of (t-1) and those of (t)</strong> and do a matching based on criteria such as euclidean distance, IOU, deep features, etc... While the majority of CV enthusiasts use the IOU, I highly recommend a combined cost function instead.</li><li><strong>The Hungarian algorithm is the algorithm responsible for the association,</strong> and handles all the edge cases such as new objects, old objects, mis-detections, occlusions, etc...</li><li><strong>Kalman Filters are responsible for the prediction, allowing for smoother tracking</strong>. In the case of 3D Tracking, we would use 3D Kalman Filters.</li><li><strong>There are two families of trackers</strong>: tracking-by-detection, and joint-detection-and-tracking. 
The second family promises to do it all in a single network, and can be seen as more advanced; yet, today, both are heavily used (and the first category probably more).</li><li><strong>Many companies from the field use tracking</strong>, such as Waymo with their Stateful Track Transformers &#x2014; and you can learn it all on my free app.</li></ul><h2 id="next-steps">Next Steps</h2><p>If you enjoyed this article on tracking, there are chances you might enjoy these other ones:</p><ul><li><a href="https://www.thinkautonomous.ai/blog/computer-vision-for-tracking/" rel="noreferrer"><strong>Computer Vision for Multi-Object Tracking: Live Example</strong></a></li><li><a href="https://www.thinkautonomous.ai/blog/3d-object-tracking/" rel="noreferrer"><strong>An Introduction to 3D Object Tracking (Advanced)</strong></a></li><li><a href="https://www.thinkautonomous.ai/blog/hungarian-algorithm/" rel="noreferrer"><strong>Exactly how the Hungarian Algorithm works</strong></a></li></ul><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4E5;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Signup for my daily emails</strong></b>, and get access to my App &quot;Think Autonomous&quot;, containing my video breaking down Waymo&apos;s Stateful Track Transformers, and teaching tons of other advanced algorithms.<br><a href="https://www.thinkautonomous.ai/private-emails?ref=thinkautonomous.ai">Subscribe here and join 10,000+ Engineers!</a></div></div><p>You can also see relevant tracking courses <a href="https://www.thinkautonomous.ai/tracking-journey/" rel="noreferrer">here</a>.</p>]]></content:encoded></item><item><title><![CDATA[The main types of sensors in Robotics & Self-Driving Cars (and how much you should know about each)]]></title><description><![CDATA[We often discuss AI & Robotics, but what about the sensors that collect all the data? In this article, we'll see an overview of all the types of sensors used in this field, and how much you should learn about each.]]></description><link>https://www.thinkautonomous.ai/blog/types-of-sensors/</link><guid isPermaLink="false">677ff7095b2944097abed9a4</guid><category><![CDATA[self-driving cars]]></category><category><![CDATA[robotics]]></category><dc:creator><![CDATA[Jeremy Cohen]]></dc:creator><pubDate>Mon, 13 Jan 2025 16:12:47 GMT</pubDate><media:content url="https://www.thinkautonomous.ai/blog/content/images/2025/01/types-of-sensors-1.webp" medium="image"/><content:encoded><![CDATA[<img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/types-of-sensors-1.webp" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)"><p><strong>&quot;I made it&quot;. </strong>These were my first thoughts back in 2017, after I graduated from my diploma in Internet of Things and got my first internship. After years learning about electrical signals, bluetooth, connectivity, networks, prototyping, and time spent building projects using multiple types of sensors... I was going to be an IoT Engineer &#x2014;&#xA0;and make millions!</p><p><strong>Or so I thought... Because my first project ended up being an epic failure</strong>. The market wasn&apos;t really interested in smart alarms like we all thought it would &#x2014;&#xA0;there was no interest in smart fridges or other smart objects either... 
IoT was flopping &#x2014;&#xA0;and thus, I got sent to another consulting project on AI!</p><p>And AI gave me this pair of glasses and told me &quot;See? You learned about all these sensors, but what really matters is the data they collect&quot;. That was really an eye-opener for me, to the point where I spent most of my energy learning about AI &amp; Robots... and writing about it.</p><p><strong>But sensors matter too </strong>&#x2014;&#xA0;and without them &#x2014; there wouldn&apos;t be any self-driving car in development. This is why I&apos;d like this article to focus on sensors. Not on the AI processing, but really the sensors themselves&#xA0;&#x2014; explaining the main types, and how much you should know about them.</p><p>This article will split sensors into 2 categories:</p><ol><li><strong>Exteroceptive</strong> sensors &#x2014; the external sensors</li><li><strong>Proprioceptive</strong> sensors &#x2014; the internal sensors</li></ol><p>Yes, the wording is scary &#x2014; but you&apos;ll see they don&apos;t bite. Let&apos;s begin.</p><h2 id="exteroceptive-sensors-looking-at-the-outside-world"><strong>Exteroceptive Sensors: Looking at the outside world</strong></h2><p><strong>In robotics or self-driving cars, you need to see the world so you can navigate in it</strong>. Every sensor that is <strong>external</strong> will be part of the exteroceptive sensor category. For example: cameras or stereo cameras, LiDARs (light detection and ranging), RADARs (radio detection and ranging), GPS, ultrasonic sensors &amp; proximity sensors, thermal cameras, infrared sensors, and so on...</p><p>There are many ways we could categorize this first family, but I like to do it by task, for example:</p><ul><li>Perception Sensors</li><li>Localization Sensors</li><li>Environmental Condition Sensors</li></ul><h3 id="perception-sensors">Perception Sensors</h3><p>I assume in this article you have a good idea of what the Perception sensors are. Yet you may not realize how much you need to know about each. For a few sensors in each category, let&apos;s see what you should know:</p><h4 id="cameras">Cameras</h4><p><strong>So you know what a camera is, right? </strong>Well, do you know how to set parameters like ISO? Shutter? Gain? Whether to use grayscale or RGB? Yes? So, do you know how to find the intrinsic and extrinsic parameters of the camera? Or how to use Charuco calibration? What about Stereo Calibration? Do you know how to calibrate using multiple checkerboards?</p><p><strong>Knowing cameras isn&apos;t just knowing how to load images; it&apos;s mostly about knowing the camera parameters very well</strong>.</p>
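<p>To give you an idea of what that looks like in practice, here is a minimal OpenCV checkerboard calibration sketch (the folder of images and the board size are placeholders):</p><pre><code class="language-python">import glob
import numpy as np
import cv2

board = (9, 6)   # inner corners of the checkerboard (placeholder size)
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.jpg"):          # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the intrinsic matrix, dist holds the distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(K)
</code></pre><p>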
When you truly understand cameras, you can do really wonderful applications, like this <a href="https://www.thinkautonomous.ai/blog/3d-computer-vision/" rel="noopener noreferrer">3D Reconstruction</a> we do in my course<strong> </strong><a href="https://courses.thinkautonomous.ai/stereo-vision" rel="noopener noreferrer"><strong>MASTER STEREO VISION</strong></a>:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/crestereo-point.gif" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="960" height="532" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/crestereo-point.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/crestereo-point.gif 960w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Example of 3D Reconstruction done by 2 cameras (</span><a href="https://courses.thinkautonomous.ai/stereo-vision" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>As you can see, we build a complete 3D environment from just cameras! Cameras may give flat 2D images, but when used in stereo mode, you can leverage their 3D properties.</p><p><strong>What&apos;s interesting:</strong></p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"> In most self-driving car courses out there, you see an object detection algorithm applied on just one front camera. Yet, I would bet 90% of self-driving car startups NEVER use just one front camera. It&apos;s always minimum 3 front cameras, some surround cameras for a 360&#xB0; view, and thus, you not only need to know how to use one camera, but also 6 or 8 cameras.</div></div><h4 id="lidars">LiDARs</h4><p><strong>Cameras are good with distances, but not as good as LiDARs</strong>. LiDARs/lasers are light sensors that can send a light wave and measure the exact distance of an object. It means they work in the dark, and they can construct a very accurate <a href="https://www.thinkautonomous.ai/blog/point-cloud-registration/" rel="noopener noreferrer">point cloud</a>. 
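</p><p>As a taste of what LiDAR processing looks like, here is a minimal ground-segmentation sketch in the spirit of the GIF below. It assumes the open-source Open3D library and a hypothetical <code>scan.pcd</code> file; the thresholds are made up.</p>
<pre><code class="language-python">import open3d as o3d

# Load a LiDAR scan (hypothetical file) as a point cloud
pcd = o3d.io.read_point_cloud("scan.pcd")

# RANSAC plane fitting: in a driving scene, the dominant plane is usually the road
plane_model, inliers = pcd.segment_plane(
    distance_threshold=0.2,   # points within 20 cm of the plane count as ground
    ransac_n=3,
    num_iterations=1000)

ground = pcd.select_by_index(inliers)
obstacles = pcd.select_by_index(inliers, invert=True)
print("Ground points:", len(ground.points), "- obstacle points:", len(obstacles.points))
</code></pre><p>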
The thing is, there are many <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noopener noreferrer">types of LiDARs</a> (like 2D LiDARs, 3D, or even <a href="https://www.thinkautonomous.ai/blog/fmcw-lidars-vs-imaging-radars/" rel="noopener noreferrer">4D LiDARs</a> &#x2014;&#xA0;but also Time of Flight vs others...), and the output may be intuitive, but the inner technology is quite complex.</p><p>Below is an example of LiDAR processing we do in my course <strong>POINT CLOUDS CONQUEROR</strong>:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/ransac3d-ezgif.com-optimize.gif" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="309" height="203"><figcaption><span style="white-space: pre-wrap;">3D Point Cloud Segmentation (</span><a href="https://courses.thinkautonomous.ai/point-clouds" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>LiDARs are excellent sensors, but extremely <u>costly</u></strong> (this is why startups like Tesla don&apos;t use them), easily <strong><u>affected by weather</u></strong> like fog, rain, snow, or even dust, and they <strong><u>can&apos;t measure velocities</u></strong> (when you need to measure the velocity of an object, you have to compute the distance between two frames). This is why most automotive companies use... RADARs!</p><h4 id="3d-and-4d-radars-radio-detection-and-ranging">3D and 4D RADARs (Radio Detection And Ranging)</h4><p><strong>RADARs are very mature (100 years old or so)</strong>, used in TONS of industries, but they&apos;re super un-intuitive. The first time I saw a RADAR output, it just took 30 minutes to get what was on the screen, and I wasn&apos;t even sure. Most <a href="https://www.thinkautonomous.ai/blog/perception-engineer/" rel="noopener noreferrer">Perception Engineers</a> in the AI/Self-Driving Car space avoid them like a disease because of that.</p><p><strong>Because they use radio waves, they aren&apos;t affected by weather conditions</strong>, and because they use the Doppler Effect, they can measure the relative velocity of objects. In the self-driving car world, there are almost 0 courses on RADARs, but I did create a very interesting piece of content for my membership, in which we learn to visualize RADARs, create point clouds, and intuitively understand the output. Here is a look at it:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/range_azimuth_opencv-ezgif.com-optimize.gif" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="640" height="256" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/range_azimuth_opencv-ezgif.com-optimize.gif 600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/range_azimuth_opencv-ezgif.com-optimize.gif 640w"><figcaption><span style="white-space: pre-wrap;">Imaging RADAR output</span></figcaption></figure><p><strong>This output has been done using an Imaging RADAR (4D RADAR) from a startup named </strong><a href="https://www.bitsensing.com" rel="noopener noreferrer"><strong>bitsensing</strong></a>. I highly recommend you go check out their product, especially because it&apos;s 4D technology. 
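</p><p>To make those &quot;dimensions&quot; concrete, here is a minimal sketch (all numbers made up) of how a single RADAR detection, given as range, azimuth, Doppler and, on a 4D RADAR, an elevation angle, turns into a position and a radial velocity:</p>
<pre><code class="language-python">import numpy as np

# One hypothetical detection
r = 42.0                      # range in meters
azimuth = np.deg2rad(12.0)    # horizontal angle
elevation = np.deg2rad(3.0)   # only available on a 4D / imaging RADAR
doppler = -6.5                # radial velocity in m/s (negative = closing in)

# Spherical to Cartesian: a classic RADAR stops at x, y (2D + Speed),
# an imaging RADAR adds the Z dimension (3D + Speed)
x = r * np.cos(elevation) * np.cos(azimuth)
y = r * np.cos(elevation) * np.sin(azimuth)
z = r * np.sin(elevation)

print("Position (m):", round(x, 1), round(y, 1), round(z, 1), "- radial velocity (m/s):", doppler)
</code></pre><p>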
Unlike most RADARs that find 2D + Speed, Imaging RADARs find 3D + Speed (they see the Z dimension). Similarly, <a href="https://www.thinkautonomous.ai/blog/fmcw-lidar/" rel="noopener noreferrer">FMCW LiDARs</a> steal the Doppler effect from RADARs to measure speed directly, and become 4D LiDARs as well.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interesting Fact</strong></b>: RADAR Engineers face extremely low competition, and demand from companies for Imaging RADAR Engineers is rising. As an engineer, building RADAR skills can mean facing weak competition and high demand.</div></div><h4 id="other-sensors-infrared-ultrasonic-thermal">Other Sensors (Infrared, Ultrasonic, Thermal ...)</h4><p>Let&#x2019;s continue looking at the types of sensors in the Perception world. When you park your car and hear &#x201C;BIP BIP&#x201D;, you are working with an ultrasonic sensor. These are great sensors for short-range, static objects. On the other hand, RADARs are longer range, and work better with moving objects.</p><p>Infrared or thermal sensors can be a great complement to cameras. For example, <a href="https://www.luxonis.com" rel="noopener noreferrer">Luxonis</a> recently announced it was now selling thermal cameras and stereo cameras with Active Stereo. This means using IR sensors with dot projectors to help the camera find depth, which can be very useful, for example, when an object is in front of a plain wall (and thus, no texture is visible).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/image.jpg" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="688" height="430" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/image.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/image.jpg 688w"><figcaption><span style="white-space: pre-wrap;">Active vs Passive Stereo (</span><a href="https://docs-old.luxonis.com/en/latest/pages/depth/#:~:text=Stereo%20depth%20depends%20on%20feature,both%20texture%20and%20lighting%20requirements." rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p>Now, let&#x2019;s move on to category 2:</p><h3 id="localization-sensors">Localization Sensors</h3><p><strong>Even more misunderstood, GPS receivers rely on satellites in space to triangulate your position</strong>. They&apos;re widely used in self-driving cars, especially when relying on classic localization and mapping techniques. Beyond plain GPS, RTK (Real-Time Kinematic) GPS is now the standard, because while GPS is accurate to ~1 meter, RTK GPS gives centimeter-level accuracy.</p><p>How?</p><p><strong>RTK GPS communicates with a fixed antenna</strong> <strong>(with a known position) which can measure the localization error</strong>. 
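</p><p>Here is a minimal sketch of that correction idea, with made-up coordinates. Real RTK relies on carrier-phase measurements and is far more involved, but the intuition is a shared error measured at a known point and subtracted at the vehicle:</p>
<pre><code class="language-python">import numpy as np

# The base station's antenna position is known precisely from surveying...
base_true = np.array([0.0, 0.0, 0.0])
# ...but its own GPS receiver reports something slightly off
base_gps = np.array([0.5, 0.5, 0.5])

# The difference is mostly shared error (atmosphere, satellite clocks)
correction = base_gps - base_true

# A nearby car receives that correction over radio and applies it to its own fix
car_gps = np.array([120.3, -45.2, 1.1])
car_corrected = car_gps - correction
print(car_corrected)  # much closer to the car's true position
</code></pre><p>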
If the antenna sits at position (0,0,0) but its GPS receiver reports (0.5, 0.5, 0.5), you know you have a 0.5 meter error on each axis, and you can send this correction back to the cars using RTK GPS.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/rtk-ezgif.com-optimize.gif" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="600" height="338" srcset="https://www.thinkautonomous.ai/blog/content/images/2025/01/rtk-ezgif.com-optimize.gif 600w"><figcaption><span style="white-space: pre-wrap;">RTK GPS Principle</span></figcaption></figure><p><strong>In my own personal experience</strong>, I worked on autonomous shuttles that drove through the Polytechnique Campus in France (the country&apos;s most prestigious engineering school), and I remember TONS of problems with GPS, such as:</p><ul><li>Clouds and weather affecting the signal strength</li><li>Tunnels turning our GPS receivers off</li><li>Trees confusing GPS positions</li><li>Or students&apos; weird experiments in some dorm rooms confusing our GPS signals</li></ul><p><strong>For some of these cases, we relied on Ultrawide band technology</strong>, which consisted of setting up a network of fixed reference nodes throughout the environment. These nodes communicated with mobile tags on the robots, providing precise distance measurements through time-of-flight calculations.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">What&apos;s interesting</strong></b>: Most engineers don&apos;t really work with GPS, but when they do, they get surprised by the amount of information received. Just by printing the data, you can tell if a GPS is European or Russian, how many satellites are found, or even how much you can trust the numbers you are seeing. In my experience, weather highly affects GPS, and you may end up relying on vision-based solutions.</div></div><p>Finally:</p><h3 id="environmental-contact-sensors">Environmental &amp; Contact Sensors</h3><p><strong>In late 2024, Waymo announced their 6th gen vehicle was using an array of </strong><a href="https://waymo.com/blog/2024/08/meet-the-6th-generation-waymo-driver" rel="noopener noreferrer"><strong>audio sensors</strong></a><strong> to recognize honks, ambient noise, or sirens</strong>. When you think about it, it makes a lot of sense for a car to use sound. We rely on sound a lot when driving. And the opportunity is even bigger outside of the car world. Using this logic, we could think of humidity sensors, gas sensors, radioactivity sensors, ...</p><p>Now something related:</p><p><strong>If we shift the focus from &quot;cars&quot; to things like wheeled robots</strong>, humanoid robots, or specialized robots like surgery robots, they often do something cars don&apos;t: <u>physical contact</u>. And thus part of their environment is the other objects they touch.</p><p><strong>For example, capacitive sensors are the types of sensors that detect tactile feedback</strong>, proximity, or materials. You can see them as a &quot;skin&quot; for robots. 
Some can also sense humidity, characterize a surface, detect moisture, and more...</p><p>Here is an example of a robot arm equipped with a torque sensor and a grip sensor:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://reachrobotics.com/media/Diagram-03.jpg.webp" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="1837" height="1217"><figcaption><span style="white-space: pre-wrap;">Robotic Arms make contact, and thus are equipped with &quot;contact&quot; sensors</span></figcaption></figure><p>Alright! We have covered the first category of sensors. If we take a look back, we have something like this:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-13-at-15.31.08.jpg" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="2000" height="1114" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-13-at-15.31.08.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-13-at-15.31.08.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2025-01-13-at-15.31.08.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-13-at-15.31.08.jpg 2054w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The first half of the mindmap: exteroceptive sensors</span></figcaption></figure><p>There are a few we didn&apos;t cover, like flow sensors in water, light sensors, photodiodes, etc... but they aren&apos;t used that much in robots.</p><p>What now? Let&apos;s look into the second category:</p><h2 id="proprioceptive-sensors">Proprioceptive Sensors</h2><p><strong>I was able to show off with my cool GIFs in the first part, wasn&apos;t I?</strong> Well, who&apos;s laughing now? Because I have to write about <u>proprioceptive</u> sensors, and truth is, I never worked that much with these types of sensors before.</p><p>Why?</p><p><strong>Because a proprioceptive sensor is an <u>internal</u> sensor</strong>. It&apos;s something that measures the INSIDE of your robot or vehicle. For example, an odometer measures the rotations of your wheels to estimate how much you moved. An IMU measures your orientation and how YOU are moving. While exteroceptive sensors focus on the others, proprioceptive sensors focus on <u>you</u> (I mean, your robot).</p><p>I would list 3:</p><ul><li>Position sensors</li><li>Motion sensors</li><li>Automotive sensors</li></ul><h3 id="position-sensors">Position sensors</h3><p><strong>Imagine you wake up in the middle of the night, heading for the toilets (oh this is going to be lame).</strong> You can&apos;t see, but you know you should walk 10 steps to get to the throne room. So you walk one step, two steps, three... until you reach 10 steps, and... no toilets? &#x1F6BD; You wave your arm around, but nothing seems to be in the way. Where is the door? So you walk one more step, then two, then thr&#x2014; <strong>BAM</strong>! Found the door. What happened here? You probably estimated your number of steps a bit inaccurately. 
Had you been drinking last night?</p><p><strong>A wheel encoder is a bit similar:</strong> it measures how many rotations your wheels make, and thus can tell you how many meters your car or robot has travelled. It&apos;s extremely useful in localization, SLAM, or similar tasks.</p><p>If you look at an Apple Maps vehicle, you will see the wheel encoders plugged at the bottom:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://ci3.googleusercontent.com/meips/ADKq_NaE47mUrFpVxvLCDwkMv0RqEw2q9SiTlnsEhD2vnXCAc_9moI-jcNzM4ZGr28Tc4R8Pzj6PyijRAlwzJknGSxb-tXmGyb8zhyFllBrLYQMzoJg7b3qYgAmUUjf7lZQjXQmxWySzD82hgHaoboemTN8BRF5iFBkRxYVXwrqx3qsryF9_uxasv4GU=s0-d-e1-ft#https://www.dripuploads.com/uploads/image_upload/image/3382756/embeddable_93eb2940-0354-4ea8-9ecd-1290fb8efb40.jpeg" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="1800" height="1013"><figcaption><span style="white-space: pre-wrap;">Apple Maps vehicles are equipped with wheel encoders for accurate positioning</span></figcaption></figure><h3 id="motion-sensors">Motion Sensors</h3><p><strong>There&apos;s position, but there&apos;s also motion.</strong> I told you I wasn&apos;t super at ease with proprioceptive sensors, but I did work on odometry, <a href="https://www.thinkautonomous.ai/blog/visual-inertial-odometry/" rel="noopener noreferrer"><strong>Visual Inertial Odometry</strong></a>, LiDAR-Inertial SLAM, Localization, and all of these words. What do they have in common? They use sensors that calculate their motion. See in my course <a href="https://courses.thinkautonomous.ai/slam" rel="noopener noreferrer"><strong>SLAM UNLEASHED</strong></a> how I present the IMU outputs:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-10-at-16.16.23.jpg" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="1532" height="866" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-10-at-16.16.23.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-10-at-16.16.23.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-10-at-16.16.23.jpg 1532w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Example of an IMU output (</span><a href="https://courses.thinkautonomous.ai/ros2" rel="noreferrer"><span style="white-space: pre-wrap;">source</span></a><span style="white-space: pre-wrap;">)</span></figcaption></figure><p><strong>An IMU is an Inertial Measurement Unit that outputs orientation</strong> (in quaternion coordinates), an <strong>angular</strong> velocity, and a <strong>linear</strong> <strong>acceleration</strong>. Under the hood, it bundles accelerometers and gyroscopes that tell you how much you accelerated, how much you rotated to the left, etc... Many also report the temperature and even measure the magnetic field to get an absolute heading, and thus know exactly how you are moving.</p><p><strong>Yes, if you used an IMU to record data in 2011,</strong> the readings would likely differ from today&apos;s, because the Earth&apos;s magnetic field has shifted, and your heading reference with it. 
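</p><p>To tie the wheel encoder and the IMU together, here is a minimal dead-reckoning sketch. The wheel radius, tick counts and yaw rate are all made-up numbers, not values from a real robot:</p>
<pre><code class="language-python">import numpy as np

# Hypothetical robot: 10 cm wheel radius, 1024 encoder ticks per revolution, 50 Hz updates
WHEEL_RADIUS = 0.10
TICKS_PER_REV = 1024
DT = 0.02

def dead_reckoning_step(x, y, yaw, ticks, yaw_rate):
    """One update: integrate the gyro for heading, the encoder for distance."""
    distance = 2 * np.pi * WHEEL_RADIUS * ticks / TICKS_PER_REV
    yaw += yaw_rate * DT              # gyro: how much we rotated
    x += distance * np.cos(yaw)       # encoder: how far we moved along the heading
    y += distance * np.sin(yaw)
    return x, y, yaw

# Fake measurements: constant speed, slight left turn
x, y, yaw = 0.0, 0.0, 0.0
for _ in range(500):
    x, y, yaw = dead_reckoning_step(x, y, yaw, ticks=20, yaw_rate=0.05)
print("Estimated pose:", round(x, 1), round(y, 1), "yaw (deg):", round(np.degrees(yaw), 1))
</code></pre><p>Just like the night-time walk to the toilets, the small errors of every step accumulate over time, which is why these sensors are usually fused with exteroceptive ones like GPS or cameras.</p><p>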
We&apos;re getting really precise here, and I do cover this in my Robotic Architect course already, but I wanted to point it out because it&apos;s an important part.</p><p>Other types of motion sensors include accelerometers, like the one in your phone that counts your 10,000 daily steps, or gyroscopes that measure your angular velocity (how fast you rotate).</p><p>Finally:</p><h3 id="regular-automotive-sensors">Regular Automotive Sensors</h3><p>I really didn&#x2019;t get inspired for this one, but when you think about it, a self-driving car must use all the regular car sensors as well.</p><p>For instance, a <strong>pressure sensor</strong> in the tires monitors the air pressure and converts it into an <strong>electrical signal</strong>, ensuring optimal performance and safety while driving. I remember one day, I was driving on the highway, and I noticed that I had to apply some steering to the left to keep my car straight. Somehow, it drifted to the side&#x2026; After a quick check, I realized I had a flat tire. Self-driving cars must do the same.</p><p>In the same spirit, <strong>temperature sensors</strong> keep tabs on critical components like the engine, brakes, and battery, preventing overheating or failure. <strong>Oil level sensors</strong> and <strong>coolant sensors</strong> ensure that the engine runs smoothly, while <strong>fuel level sensors</strong> provide crucial data for range estimation. They&#x2019;re kinda chemical sensors. Even the <strong>wheel speed sensors</strong>, which are integral to anti-lock braking systems (ABS) and traction control, contribute to the decision-making processes of a self-driving car by providing real-time feedback on vehicle dynamics.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Interesting Idea</strong></b>: When learning about self-driving cars, it&apos;s mostly the &quot;autonomous&quot;-related sensors that are taught. 
Yet, <b><strong style="white-space: pre-wrap;">regular automotive sensors</strong></b> remain vital, and while you don&apos;t have to learn them, some jobs like Control Engineer require this.</div></div><p>Okay, so before seeing some examples, let&#x2019;s do a brief recap of that part 2:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-13-at-15.39.55.jpg" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="1852" height="1056" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-13-at-15.39.55.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-13-at-15.39.55.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2025-01-13-at-15.39.55.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-13-at-15.39.55.jpg 1852w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The second half of the mindmap: proprioceptive sensors</span></figcaption></figure><h2 id="example-how-to-slam-a-nuclear-facility">Example: How to SLAM a Nuclear Facility</h2><p><strong>A few months ago, I was chatting with a client from my 4D Perception course</strong>, who has his own 4D Perception startup &#x2014;&#xA0;and he told me about one of his past experiences working with SLAM for a client from the nuclear field. &quot;Wait... Nuclear &#x2622;&#xFE0F;?&quot; I asked like a Looney Tunes. &quot;Yes! Many SLAM applications are incredible when you have radioactive environments!&quot; he answered before sharing a few details about it.</p><p>I got so intrigued by the idea that I started searching online, and of course... I found something! RNENuclear robots has written a <a href="https://www.mdpi.com/2218-6581/10/2/78" rel="noreferrer">paper</a> and made this video about it (if there is no image &#x2014;&#xA0;the video still works):</p>
<!--kg-card-begin: html-->
<iframe width="560" height="315" loading="lazy" src="https://www.youtube.com/embed/Hj7xt7isOWc?autoplay=1" srcdoc="&lt;html&gt;&lt;head&gt;&lt;style&gt;*{padding:0;margin:0;overflow:hidden}html,body{height:100%}img,span{position:absolute;width:100%;top:0;bottom:0;margin:auto}span{height:1.5em;text-align:center;font:48px/1.5 sans-serif;color:white;text-shadow:0 0 0.5em black}&lt;/style&gt;&lt;/head&gt;&lt;body&gt;&lt;a href=&apos;https://www.youtube.com/embed/Hj7xt7isOWc?autoplay=1&amp;mute=1&amp;rel=0&apos;&gt;&lt;img src=&apos;https://i.ytimg.com/vi/Hj7xt7isOWc/maxresdefault.jpg&apos; alt=&apos;Video&apos;&gt;&lt;span&gt;&lt;img src=&apos;https://www.thinkautonomous.ai/blog/content/images/2022/10/playbtn.png&apos; style=&apos;width: 60px; height: 60px;&apos; /&gt;&lt;/span&gt;&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<!--kg-card-end: html-->
<p>When you think about it, there are tons of places we humans can&apos;t go, and where robots equipped with the right types of sensors can work wonders. In this example, the robot works with <strong>gamma dosimeters, which are radioactivity measurement sensors. </strong>A common one is the&#xA0;ThermoFisher RadEye G10 unit, which can cost several thousand dollars.</p><figure class="kg-card kg-image-card"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-13-at-15.53.30--1-.jpg" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="1668" height="1096" srcset="https://www.thinkautonomous.ai/blog/content/images/size/w600/2025/01/Screenshot-2025-01-13-at-15.53.30--1-.jpg 600w, https://www.thinkautonomous.ai/blog/content/images/size/w1000/2025/01/Screenshot-2025-01-13-at-15.53.30--1-.jpg 1000w, https://www.thinkautonomous.ai/blog/content/images/size/w1600/2025/01/Screenshot-2025-01-13-at-15.53.30--1-.jpg 1600w, https://www.thinkautonomous.ai/blog/content/images/2025/01/Screenshot-2025-01-13-at-15.53.30--1-.jpg 1668w" sizes="(min-width: 720px) 720px"></figure><h2 id="example-2-sensor-fusion-of-multiple-types-of-sensors">Example #2: Sensor Fusion of Multiple types of Sensors</h2><p>I talked about SLAM here, but most robots don&apos;t localize using SLAM; they use localization algorithms instead. This is often a fusion of GPS and sensors like IMUs &amp; odometers. It&apos;s all a fusion of sensors! It&apos;s often done using an Extended Kalman Filter. In my course ROBOTIC ARCHITECT, I have a complete lesson showing it, but for this example, let me just show you the gist:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.thinkautonomous.ai/blog/content/images/2025/01/unnamed.gif" class="kg-image" alt="The main types of sensors in Robotics &amp; Self-Driving Cars (and how much you should know about each)" loading="lazy" width="480" height="270"><figcaption><span style="white-space: pre-wrap;">EKF Fusion of GPS &amp; IMU</span></figcaption></figure><p>The Extended Kalman Filter takes:</p><ul><li>GPS</li><li>IMU</li><li>Odometer</li></ul><p>And it outputs the localization you can see on the Google Map above.</p><p>Alright, we covered a lot. To finish this article, let me answer a critical question:</p><h2 id="how-much-should-you-know-about-these-sensors-to-become-a-roboticsself-driving-car-engineer">How much should you know about these sensors to become a robotics/self-driving car engineer?</h2><p><strong>At the bare minimum, you should have a good idea of what these sensors are</strong>, how much they&apos;re involved, and what they do (what we&apos;re seeing in this article). There are other ideas you could explore, such as active sensors, which emit their own waves (LiDARs, RADARs, ...), versus passive sensors, which don&apos;t send anything into the environment.</p><p><strong>For beginner positions, it&apos;s probably a good idea to learn ONE of these sensors well.</strong> For example, the camera&#xA0;&#x2014;&#xA0;or the LiDAR&#xA0;&#x2014;&#xA0;or the GPS. When you specialize in one of these, you are also learning the algorithms, and starting to tell the difference between each type, and thus you start building expertise.</p><p><strong>For intermediate positions, I recommend understanding 2 or more sensors</strong>. If you understand the LiDAR AND the Camera, you can start doing some fusion of these for object detection projects. 
If you understand the GPS AND the IMU, same idea&#xA0;&#x2014; you dive into Kalman Filters, and thus have some good value to add.</p><p><strong>For advanced positions, the more the better</strong>. Experts have a good idea of the differences between sensor types. We discussed active and passive sensors, but there are also digital sensors vs analog sensors (one outputs discrete, binary data for computers, while the other outputs a continuous signal), or you could also dive into one sensor and master it really well.</p><h2 id="summary">Summary</h2><ul><li><strong>Sensors are categorized into two main types</strong>: exteroceptive sensors, which perceive the external environment, and proprioceptive sensors, which measure internal dynamics.</li><li><strong>Exteroceptive sensors are split into 3 categories:</strong> Perception, Localization, and Contact/Environmental sensors.</li><li><strong>Perception sensors are cameras, LiDARs, RADARs,</strong> thermal cameras, ultrasonics, infrareds, ... They work really well alone, but even better fused together to provide an accurate understanding of the surrounding objects.</li><li><strong>Localization sensors are sensors like GPS</strong>, RTK-GPS, or even workaround solutions like Ultrawide band.</li><li><strong>Environmental and contact sensors, such as audio sensors</strong>, humidity sensors, and gas sensors, help monitor the surroundings of autonomous vehicles and robots. In robotics, we also have all the contact sensors used when a robot makes physical contact.</li><li><strong>Proprioceptive sensors</strong>, such as position sensors and motion sensors, monitor how the car moves over time. They can be used together with external sensors like GPS.</li><li><strong>We also use all the automotive sensors</strong>, like tire pressure, oil, etc...</li></ul><p>Understanding and mastering the different types of sensors is important; refer to the part above to get the gist. There are a few types of sensors we didn&apos;t cover, because I wanted this article to focus on robotics &amp; self-driving cars, but these are the main ones.</p><h2 id="next-steps">Next Steps</h2><p>If you enjoyed this article on sensors &#x2014; you might love this one on the <a href="https://www.thinkautonomous.ai/blog/types-of-lidar/" rel="noreferrer">different types of LiDARs</a>. You may also be interested in this article on <a href="https://www.thinkautonomous.ai/blog/9-types-of-sensor-fusion-algorithms/" rel="noreferrer">Sensor Fusion</a>. </p><div class="kg-card kg-callout-card kg-callout-card-yellow"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">If you want to learn more about sensors, and especially autonomous tech sensors like LiDARs, RADARs, or 3D Cameras &#x2014;</strong></b>&#xA0;I&apos;m talking about all of this through my private daily emails. These are emails I send every day to an audience of 10,000+ Engineers, and they help engineers learn advanced technical content, through stories, mindmaps, and frequent tips.<br><br><a href="https://www.thinkautonomous.ai/private-emails?ref=thinkautonomous.ai">Subscribe here and join 10,000+ Engineers!</a></div></div><p></p>]]></content:encoded></item></channel></rss>