SIA Tech Brief: Upgrade Path to AI – Video Over IP, Sensor Fusion and Aggregation of AI Processors
Introduction
Many market researchers (Forbes, Gartner, Deloitte) predict the biggest growth in artificial intelligence (AI) will be in workforce augmentation, not automation. AI algorithms train on data sets to recognize a threat, run a series of repetitive tasks, check outcome effectiveness, then optimize the process and repeat. Many employers (including Amazon) now require managers to gain machine learning (ML) and deep learning skills so they can rework algorithms, data sets and processes; recognize what is missing; and then augment the work, rather than just improving repetitive task speed.
In addition to the more obvious alarm and object recognition tasks, the security industry can apply this by focusing on what is missing and augmenting the security response, mitigation and design process, so that we work alongside the technology instead of fearing replacement by it.
These augmentation use cases across industries are the real goal of the AI “upgrade.” Each use case represents a teaching opportunity rather than legacy automation performing a single task over and over without improvement. “Failing fast” so the user can get to work improving the process is as important as the end product or service itself. Neural networks may be used to diagnose hidden “steps” happening on a production line that would otherwise appear as an unexplained shutdown. The worker augments the process until they learn what is missing or faulty, then begins integrating the solution.
Deep neural networks and convolutional neural networks (CNNs) run on continuously evolving hardware; the most popular for security use cases are edge AI devices, like IP video cameras running AI, and dedicated AI processor hardware, often itself referred to as an AI “box,” “server” or “edge computing system.”
The AI System on Chip – Multiple IoT Device Functions on a Single Assembly
There is a positive trend in the low-power edge AI processor market. Power consumption is a key factor for edge AI applications where the entire system is powered by a battery. An ultra-low power microcontroller with a dedicated CNN accelerator and camera support can be equipped with active, sleep and low-power modes, allowing it to perform complex face identification periodically, as is typical of entry screening at an outdoor event.
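As a minimal sketch of that duty cycle, the following assumes a hypothetical CnnAccelerator driver with wake/infer/sleep calls; real microcontroller SDKs differ, but the pattern of waking briefly for inference and sleeping otherwise is the same:

    import time

    # Hypothetical driver for an ultra-low power MCU with a CNN accelerator;
    # these names are illustrative, not a real vendor SDK.
    class CnnAccelerator:
        def wake(self): ...            # switch from sleep to active mode
        def infer(self, frame):        # run the face-identification CNN once
            return {"face_match": False, "confidence": 0.0}
        def sleep(self): ...           # return to the low-power mode

    def capture_frame():
        return b"\x00" * (320 * 240)   # stand-in for a camera capture

    def entry_screening(cycles=5, interval_s=2.0):
        accel = CnnAccelerator()
        for _ in range(cycles):
            accel.wake()
            result = accel.infer(capture_frame())  # brief burst of active work
            if result["face_match"]:
                print("match:", result)
            accel.sleep()                          # most time is spent asleep
            time.sleep(interval_s)

    entry_screening()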
The modern “core” of the Internet of Things (IoT) device used for security, safety and sustainability is the system-on-chip (SoC), which may incorporate some or all of the following: CPU, memory, graphics processing unit, I/O control for HDMI port(s), Ethernet, Power over Ethernet and power sourcing equipment, USB port(s), Wi-Fi and Bluetooth connectivity and sound/visual sensor fusion.
Future IoT devices must balance power consumption with processing optimized for CNNs. As the processor strains to handle streams of complex video, like vehicles on a multilane highway, power consumption rises and a “choppy” effect may appear as packets of data are lost and entire video frames go unprocessed.
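A simple way to see why an overloaded processor produces choppy output is to model frames arriving at a fixed rate against an inference time that exceeds the frame interval; the numbers below are illustrative:

    # Illustrative model of an overloaded pipeline: 30 fps input, but each
    # frame takes 50 ms to process, so roughly every other frame is lost.
    FPS = 30
    PROC_TIME = 0.050               # seconds of inference per frame (assumed)
    frame_interval = 1 / FPS        # ~33 ms between arriving frames

    busy_until = 0.0
    dropped = 0
    for i in range(300):            # ten seconds of video
        arrival = i * frame_interval
        if arrival < busy_until:    # processor still busy: frame goes unprocessed
            dropped += 1
        else:
            busy_until = arrival + PROC_TIME
    print(f"unprocessed frames: {dropped} of 300")  # the visible "choppy" effect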
These AI vision processors are already used in a wide variety of human and computer vision applications, including video security devices, advanced driver assistance systems, electronic mirrors and robotics. For example, the Ring Doorbell Pro 2 delivers enhanced 1536p HD video with an expanded head-to-toe view, bird’s eye view with intruder motion history and dual-band Wi-Fi, while operating on the low-power, high-performance Ambarella CV25M SoC.
AI Processors
What if you are working with IP cameras that you do not wish to upgrade just to run “edge AI” algorithms? Streaming ultra-high-definition (UHD) video for lengthy periods also presents challenges. An IP camera has to initiate the video stream to the decoding application, often a video management system (VMS). Should the VMS run on an underpowered server, itself connected to a poorly performing network, or even one under distributed denial-of-service (DDoS) cyberattack, the IP camera may not be powerful enough to maintain the stream while managing all these other tasks. Add to that any AI algorithm running at the camera edge, and the cost per channel grows with the expense of the more powerful camera required. The individual cameras have to work harder, which generates more heat and consumes more power.
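A rough upgrade-cost comparison makes the trade-off concrete; all prices here are hypothetical placeholders, not vendor quotes:

    # Hypothetical per-channel upgrade costs: replace each camera with an
    # edge AI model, versus keeping existing cameras and sharing one AI
    # processor unit across 15 streams.
    edge_ai_camera = 900            # assumed price of an AI-capable camera
    ai_processor_unit = 2500        # assumed price of a shared processor unit
    streams_per_unit = 15

    edge_per_channel = edge_ai_camera
    shared_per_channel = ai_processor_unit / streams_per_unit
    print(f"edge AI upgrade:  ${edge_per_channel:,.0f} per channel")
    print(f"shared processor: ${shared_per_channel:,.0f} per channel")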
Quickly gaining popularity is the AI processor unit (or edge AI computing system), which contains an AI accelerator capable of running neural networks – for example, pose (skeletal) detection, weapons detection, vehicle identification, face matching, instance detection, occupancy counting (with privacy) and detection of multiple object behaviors preceding an event.
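A sketch of how such a unit can fan one decoded frame out to several networks; the model registry and stub results below are assumptions, standing in for a real accelerator runtime:

    from typing import Callable, Dict

    # Stub networks standing in for compiled models on the accelerator.
    def pose_net(frame):    return {"people": 2, "poses": ["standing", "running"]}
    def weapons_net(frame): return {"weapon_detected": False}
    def vehicle_net(frame): return {"plates": ["ABC-123"]}

    MODELS: Dict[str, Callable] = {
        "pose": pose_net,
        "weapons": weapons_net,
        "vehicle_id": vehicle_net,
    }

    def analyze(frame):
        # Run every loaded network against a single frame and collect results.
        return {name: net(frame) for name, net in MODELS.items()}

    print(analyze(frame=b"raw-frame-bytes"))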
The Foxconn AI Processor with Hailo-8 M.2 AI Acceleration Module sustains 26 tera-operations per second (TOPS) and is capable of processing 15 UHD streams from IP cameras at very low power. The AI processor unit can be placed between the IP cameras and a VMS, where an additional video stream is processed, delivering far more actionable real-time visual data in a quickly deployed upgrade. When the fiber infrastructure linking IP cameras to the command center aggregates close to the command center itself, an outage at that aggregation point results in a wider outage. Placing an AI processor unit closer to a “natural” aggregation point of 10-15 existing IP video cameras permits an economical connection to another branch of fiber infrastructure serving a redundant command center. Infrastructure outages near either aggregation point would then not interrupt service, as the AI processor unit is powerful enough to stream to multiple decoding locations. This also allows fast deployment at concert venues requiring temporary surveillance and a mobile command center. The design of these aggregation points, together with multiple command centers, contributed to the success of the Super Bowl LIVE Fan Fest serving 1.5 million people over a two-week period in Houston, Texas, in 2017.
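As a back-of-the-envelope check on those throughput figures:

    # 26 TOPS shared across 15 UHD streams leaves a per-stream compute budget.
    total_tops = 26
    uhd_streams = 15
    print(f"~{total_tops / uhd_streams:.2f} TOPS per UHD stream")  # ~1.73 TOPS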
Consolidating the AI stream processing of a suite of visual sensors at a 15-to-1 ratio drastically reduces the cost per channel to purchase and operate. A small city with 500 IP cameras and video analytics applications at an emergency operating center (EOC) can present an upgrade challenge. Locating approximately 40 AI processors closer to clusters of existing cameras delivers the benefits of multiple AI algorithms without increased network traffic. In addition, multiple output streams from the AI processor can serve mobile command centers quickly and with the same user experience as at the EOC.
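The “approximately 40” figure is consistent with simple sizing plus spare capacity; the 15% headroom below is an assumption for illustration, not a figure from the deployment:

    import math

    cameras = 500
    streams_per_processor = 15
    base_units = math.ceil(cameras / streams_per_processor)  # 34 units
    spares = math.ceil(base_units * 0.15)                    # assumed ~15% headroom
    print(f"{base_units} + {spares} spares = {base_units + spares} AI processors")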
Smart cities with edge AI cameras or AI processor units can use ML to relieve urban areas of congestion and create a new model for road intersections that reduces energy consumption and improves the flow of traffic. To accomplish this, the deployment of 5G and vehicle-to-everything (V2X) connectivity will allow low-latency ML algorithms to move to the edge of telecommunications networks.
Upgrade Use Case: Smart City Traffic
In a smart city, protected turn-signal intervals need to be adjusted to activate after vehicles in the opposing direction come to a gradual stop, avoiding unstable traffic flow. In major vehicle accidents, variable message boards change the speed limit several exits before the incident, likewise preventing unstable traffic flow. Aggregating an AI processor’s sensor inputs of different types (visible light, LiDAR, radar, thermal imaging) at multiple locations provides cost savings and improved resilience, as there may be multiple fiber infrastructure paths to both the EOC and the traffic control center.
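A minimal sensor-fusion sketch for the intersection case, merging vehicle counts from the different sensor types by weighted confidence; the weights and detection format are assumptions for illustration:

    # Per-sensor trust weights (assumed values, tuned per deployment).
    SENSOR_WEIGHTS = {"visible": 1.0, "lidar": 1.2, "radar": 0.9, "thermal": 0.8}

    def fuse(detections):
        # detections: list of (sensor_type, vehicle_count, confidence)
        weighted_sum = 0.0
        weight_total = 0.0
        for sensor, count, conf in detections:
            w = SENSOR_WEIGHTS[sensor] * conf
            weighted_sum += w * count
            weight_total += w
        return weighted_sum / weight_total if weight_total else 0.0

    # Opposing-direction traffic seen by four co-located sensors at one approach.
    approach = [("visible", 6, 0.9), ("lidar", 7, 0.95),
                ("radar", 7, 0.8), ("thermal", 5, 0.6)]
    # Hold the protected turn interval until this estimate reaches zero.
    print(f"estimated opposing vehicles: {fuse(approach):.1f}")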
Use Case: Retail
In a similar fashion, a cashierless store uses AI processors with low-cost sensors to identify customer buying habits while maintaining privacy. Since inventory sits in the same location the customer picks from, there is no separate task to see what has sold or to reconcile deliveries from distribution against sales. The distributor uses a robot (or human) to restock after hours, directly from a vehicle with an aisle inventory identical to the store’s. This can be achieved using existing IP video cameras, additional low-cost sensors and the AI processor.
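A sketch of that “inventory is the shelf” idea: a confirmed pick event decrements stock directly, so sales and stock never need separate reconciliation. The event fields and reorder threshold below are assumed:

    from collections import Counter

    shelf_stock = Counter({"sku-cola": 24, "sku-chips": 40})

    def on_pick_event(sku, quantity=1):
        # Called when sensor fusion confirms a customer took an item.
        shelf_stock[sku] -= quantity
        if shelf_stock[sku] <= 6:           # assumed reorder threshold
            print(f"restock request: {sku}")

    on_pick_event("sku-cola")
    print(shelf_stock)                      # live stock doubles as the sales record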
Global AI investment in retail markets, through ML, natural language processing (NLP) and computer vision, is estimated to grow from $1.7 billion in 2021 to $36.5 billion in 2030. Although the United States is the largest market, India is expected to experience the fastest growth. As with most successful AI developments, whether SoC, AI processor units, computer vision algorithms or NLP, the greatest opportunities are in collaboration.