
This year, I am bringing back the infamous BLACK SEASONS, where cutting-edge engineers like you will build MASSIVE skills in autonomous tech. In particular, this week I'm launching my brand new VIDEO PERCEPTION COURSE, in which you'll learn to become a...
Mhh... Okay.
So what is a "Next-Gen" Computer Vision Engineer?
Well, the best way to explain is to show you job offers that mention "Computer Vision" and come from companies that are really building the future.

Take this first one from Facebook AI Research (FAIR).
Did you see how the Computer Vision Engineer job requires you to know how to work with semantics of data, including images, video, text, audio, speech, and other modalities?
Isn't it surprising that a Computer Vision Engineer is required to know how to process text and audio?
It makes sense, doesn't it?
Now, take this other job offer at Apple:
"Want to ship amazing experiences in Apple products? Be part of the team in the Video Computer Vision (VCV) organization..."
Now, there is a Video Computer Vision Team at Apple!
But this isn't even what's surprising. What is surprising is the title of the role...
Just Computer Vision. Like the previous one.
And just like these 3 below:


Computer Vision at Senseye (industrial computer vision) requires deep expertise in computer vision, particularly with video or camera-based systems.

Computer Vision at DreamVu (Industrial 3D Vision) demands expertise in topics related to 3D and Optical Flow/Motion Tracking.

The Computer Vision job at Waymo (robotaxis) mentions 'multimodal models' in its very first line - models that can process text, videos, audio, lidar points, ...
It's no longer 2017. None of these jobs mention the ability to process images with OpenCV. That has become such a basic fundamental that the leading computer vision companies leave it out. Basic applications like object detection are also left out, and I predict that in the near future, most computer vision engineers will be processing multiple types of data... video being the most important.
You'll also notice that the use of video here is NOT reserved for the classic "video" edge cases, like retail analysis, sports video analysis, people tracking, and so on... That used to be the case back in 2017 - we used to have dedicated video use cases - but in the near future, video will power even the most common applications.
This is partly why VIDEO PERCEPTION is not a general multimodal course, but rather a course about the heart of Computer Vision: Videos.
And when you pay close attention not only to job offers, but also to startup architectures, you'll notice they have ALL switched to video.
For example, here are 4 leading startups in the self-driving car industry, and their recently published architectures:

Waymo, Wayve, Nvidia, and Tesla all show video-first architectures
Today's algorithms are video-first. If, as a computer vision engineer, you fail to understand videos, sequences, and spatio-temporal fusion, and at best stick to "frame-per-frame" tracking, you risk missing out on building the future.
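To make the difference concrete, here is a minimal sketch (using NumPy, with a random array standing in for a real clip - not any architecture from the companies above) of what frame-per-frame processing loses compared to even the simplest spatio-temporal view of the same data:

```python
import numpy as np

# Hypothetical 8-frame grayscale clip: (time, height, width).
clip = np.random.rand(8, 64, 64)

# Frame-per-frame: each frame is processed independently,
# so everything that happens BETWEEN frames is invisible.
per_frame_means = [frame.mean() for frame in clip]

# Spatio-temporal: treat the clip as one (T, H, W) volume.
# Even a simple temporal difference captures motion that
# no single-frame feature can see.
motion = np.abs(np.diff(clip, axis=0))  # shape (7, 64, 64)
motion_energy = motion.mean()
```

The point is not the specific operation (a real model would learn its temporal features), but that the motion signal only exists once you stop treating frames in isolation.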
So now that you understand that (1) video is the future of computer vision, (2) most leading companies are using it, and (3) video is far from being just a few edge cases like retail or sports analysis, and is taking over the entire world...
Let me show you the course I am launching this Black Seasons:
Meet...
Build Next-Gen Computer Vision Skills

This course is made in 3 modules, let's take a look at what's inside each of these...
MODULE I

Let's take a break here:
An example of what you'll build is "Activity Detection", where you'll analyze when and where an activity happens in a video sequence.
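As a rough illustration, the "when" part of activity detection can be sketched as a sliding temporal window over per-frame activity scores. The scores, window size, and threshold below are made-up placeholders (in practice the scores would come from a video model), not the course's actual method:

```python
import numpy as np

# Hypothetical per-frame activity scores for a 20-frame clip.
scores = np.array([0.1] * 5 + [0.9] * 6 + [0.1] * 9)

WINDOW = 4       # assumed sliding-window size (frames)
THRESHOLD = 0.6  # assumed decision threshold

# Average the score over each sliding temporal window, then
# flag windows whose mean exceeds the threshold: a crude
# "when does the activity happen" detector.
kernel = np.ones(WINDOW) / WINDOW
smoothed = np.convolve(scores, kernel, mode="valid")
active = np.where(smoothed > THRESHOLD)[0]  # window start indices

print(active)  # -> [4 5 6 7 8]
```

Windows 4 through 8 are the ones that overlap the burst of high scores in frames 5-10, which is exactly the "when" answer a detector like this produces.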

So this is Module 1, after which you'll have a very good understanding of motion, frame-per-frame processing, window processing, event detection, and more...
Next, let's see Module 2:
MODULE II

MODULE III

This course is not only advanced, it's also not for everyone. So let me help you decide.
Sounds good? Okay, so this means that...
This is a self-study online course, which contains videos, articles, drawings, paper analysis, code, projects, and more...
The course is estimated at 5 to 7 hours, depending on whether you just want to watch the content or do the projects as well.
Yes, mostly basic Computer Vision, including:
Our Think Autonomous 2.0 platform is optimized for collaboration, chat, support, answers, and community learning. In fact, some assignments will be done this way!

This course is unique because it's next-gen.
While most computer vision courses do a walkthrough of the fundamentals of Computer Vision, and dive into approaches that do not work today... this course focuses on the future.
As an engineer, if you are trying to build 'next-gen' skills, knowing exactly what is going on in the research field and inside self-driving car startups gives you an appeal most engineers don't have.

Build Next-Gen Computer Vision Skills


© Copyright 2025 Think Autonomous™. All Rights Reserved.