We are entering a whole new era of processing power…
When people ask “What GPU should I use?”, the answer is usually “Nvidia”. Why? Because Nvidia GPUs support CUDA and cuDNN: a parallel computing platform and a deep learning library that accelerate your models, and they work only on Nvidia hardware. If you have any GPU other than Nvidia, you belong to another time…
Until now. A few years ago, Intel released a new kind of chip that has largely flown under the radar: a chip that is both very small and very powerful.
This chip is called a Vision Processing Unit, or VPU.
The primary goal of a VPU is to accelerate machine vision algorithms such as convolutional neural networks (CNNs) or even feature detectors and descriptors like SIFT, SURF, etc.
Now, let me tell you a story
On my first day working for MILLA, an autonomous shuttle company, I discovered a shuttle that could drive up to 30 km/h; quite an improvement compared to our competitors at the time, who were driving at 5–8 km/h.
At the time, the shuttle was new and there was no GPU yet on it.
In case you don’t know what a GPU is, here’s a quick picture that explains it well:
A GPU (Graphics Processing Unit) parallelizes computations so that operations are done faster.
In a self-driving car, this can be super useful for computer vision or point cloud processing. GPUs were originally developed for video games, where many pixels must be rendered at the same time. While GPUs are awesome for deep learning training, they’re still not great for deployment.
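To make the idea of parallelism concrete, here is a minimal sketch contrasting a one-element-at-a-time loop (how a CPU core works) with a vectorized operation that applies the same instruction to every element at once, the data-parallel style GPUs are built for. NumPy still runs on the CPU, so this only illustrates the programming model, not actual GPU speed; the `brighten_*` function names are mine, purely for illustration.

```python
import numpy as np

# Sequential style: a loop touches one pixel at a time.
def brighten_loop(pixels, amount):
    out = np.empty_like(pixels)
    for i in range(len(pixels)):
        out[i] = min(pixels[i] + amount, 255)  # clamp to 8-bit range
    return out

# Data-parallel style: one operation over the whole array at once.
# This is the shape of computation a GPU executes across thousands of cores.
def brighten_vectorized(pixels, amount):
    return np.minimum(pixels + amount, 255)
```

Both functions produce the same result; the second simply expresses the work as one wide operation instead of many small ones.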
This is where VPUs come into play. They’re specifically designed for vision algorithms and deployment. Deployment in machine learning is still a tricky, complicated process: there is rarely open source code we can reuse, so we have to get our hands dirty.
Running multiple algorithms at the same time
Once we received the GPU, I had to implement my computer vision algorithms. Obstacle detection, obstacle tracking, traffic light detection, lane line detection, drivable area segmentation, feature tracking, pedestrian behavioral prediction… There was a lot to do! Each task required its own neural network, and each network consumed GPU memory. It was clear that my Nvidia GTX 1070 GPU couldn’t handle all these tasks at the same time.
After saturating the GPU with 3 or 4 key algorithms, I realized that I still needed to process the point clouds. Point clouds are the output of LiDARs; there can be millions of points every second. GPUs rely on their onboard memory, which can fill up completely with only 1 or 2 neural networks running at the same time.
One more thing: We had 3 cameras and 3 LiDARs. Redundancy was necessary, and we needed different angles, lenses, and options. It meant that everything had to be done 3 times. I was desperate for a second, third, or fourth GPU.
But space in a self-driving car trunk is limited. Heat is dangerous, and the fan is noisy for customers. We couldn’t allow the technical difficulties to destroy design and comfort. The reality is that running algorithms on 4 GPUs is sometimes hard and impractical when working on an embedded device.
So what prevented me from eventually jumping out of a window? VPUs. Vision processing units are emerging types of processors. The difference is that they are 100% dedicated to computer vision. Nothing else.
What does a VPU look like?
That’s it. A USB stick. This is Intel’s answer to Nvidia, and it’s really powerful. They also have bigger products (but not much bigger), all using the same chip inside: “Movidius Myriad X”.
Here’s the high-level architecture of this chip. As you can see, there’s a Neural Compute Engine specifically optimized for neural networks, plus vision accelerators, imaging accelerators, CPUs, and some more hardware to make it very powerful and efficient.
The Movidius is designed for convolutional neural networks and image processing operations.
Intel showed their chip doing pedestrian detection, age estimation, gender classification, face detection, body orientation, and mood estimation; all at over 120 FPS. That’s 6 different neural networks running on a tiny device. The same workload couldn’t even run on an Nvidia GPU without hitting memory issues.
What are other great things about VPUs?
It’s all on the edge; there is no interaction with the cloud. That means no latency and more privacy.
It comes with a toolkit and SDK called OpenVINO that can deploy deep learning CNNs trained in TensorFlow or Caffe onto the dedicated Neural Compute Engine.
When running your algorithms on this USB stick, you can completely free the rest of the computer and GPU for other programs such as point cloud processing.
You can stack multiple USB sticks to multiply the compute power as much as you want.
How does it work?
Although all the details have not been made public, we can still take a practical look at what happens under the hood. From a very long webinar from Intel on May 14, 2020, I managed to capture this slide:
- Decoding and encoding are done using the OpenVINO toolkit; it’s one line of code for each.
- Preprocessing is done using OpenCV or other libraries, and mostly just includes resizing and fitting the image to the requirements of your network.
- Inference is done using a dedicated function of the toolkit that runs your models trained in TensorFlow or Caffe. It can target the VPU, CPU, GPU, or an FPGA.
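The preprocessing step above can be sketched in a few lines. Below is a minimal, assumption-laden example: it resizes an image with nearest-neighbour indexing (real pipelines would typically use OpenCV) and reorders it from HWC to the NCHW layout most CNN deployments expect. The target size of 224×224 and the model file names in the comment are hypothetical placeholders, not values from the article.

```python
import numpy as np

def preprocess(image, target=(224, 224)):
    """Resize an HWC uint8 image (nearest-neighbour) and return an NCHW float32 blob."""
    h, w = image.shape[:2]
    # Map each target pixel back to a source pixel (nearest-neighbour resize).
    rows = np.arange(target[0]) * h // target[0]
    cols = np.arange(target[1]) * w // target[1]
    resized = image[rows[:, None], cols]                  # (target_h, target_w, 3)
    chw = resized.transpose(2, 0, 1).astype(np.float32)   # HWC -> CHW
    return chw[None, ...]                                 # add batch dim -> NCHW

# With OpenVINO's (2020-era) Inference Engine Python API, the inference step
# would then look roughly like this (requires the openvino package and a
# converted model; "model.xml"/"model.bin" are placeholder names):
#   from openvino.inference_engine import IECore
#   ie = IECore()
#   net = ie.read_network(model="model.xml", weights="model.bin")
#   exec_net = ie.load_network(network=net, device_name="MYRIAD")  # the VPU
#   result = exec_net.infer({next(iter(net.input_info)): preprocess(frame)})
```

Switching the `device_name` string is also how the toolkit retargets the same model to CPU, GPU, or FPGA, which is what makes the write-once-deploy-anywhere pitch work.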
We are reaching an exciting era. The Intel NCS 2 has been implemented in drones, small robots, and a lot of IoT applications so far. And people seem very impressed with it. It’s still very early, but I expect the market to grow a lot in the coming years.
Imagine the possibilities when neither memory nor processing power is a significant obstacle. Every deep learning application we love, in fields such as medicine, robotics, or drones, can now see its limitations drastically diminish. We would simply lose our minds…
If you have already tried using a VPU or OpenVINO, I’d love to have your feedback!
Below, you will find helpful links for my Autonomous Tech community, website, and links to learn more about VPUs.
Need to learn more?