Artificial intelligence (AI) is useful when applied to a given practice area like the recognition of known cyber intrusion attack profiles, for a market such as utilities, in a given global geographic region, over an identified cybersecurity program’s useful lifespan. When the first manned landing on Mars occurs, should human “common sense” be trusted over AI-based scene recognition?
For public safety health entry screening, vehicle identification and weapons detection, AI is helpful when there are too many repetitive recognition processes or layers for a human to effectively perform. However, are there rules-based solutions that can do the job at the same (or higher) level of quality, cost or speed?
Common Sense vs. AI
Consider a robot with a machine vision camera that counts the people walking, running and posing in a public park, compares that count with the number of people sitting in the same area, and analyzes the scene for overcrowding and unsafe behavior. The robot has been trained on data sets containing people in various poses. The same robot is then placed in front of a lake that fills its full field of view, with a few people on rafts and some in kayaks. On the dry land part of the public area, people appear in many different poses and move along different trajectories at different speeds, but they are still recognizable as whole people.
People at leisure on a lake may be floating on rafts, sitting in a rowing vessel or swimming partly submerged. Deep learning methods may not accommodate open-ended learning, whereas a human would recognize a body of water and the need to count heads, torsos or even legs in the case of someone diving. In this case, creating a simple rule that identifies whether the people or objects are in a body of water avoids errors due to an incomplete data set. Here, human “common sense” arrives at a decision faster than an AI algorithm. Because its data set consists of whole people, the deep neural network (DNN) will work more effectively in the dry land park, even though there are many more people there, in a wide variety of poses and exhibiting complex movement.
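The "simple rule" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Detection` record, the `water_mask` grid and the `count_people` function are hypothetical names, not part of any particular product, and the sketch assumes an upstream detector reports at most one detection per person. On land, only whole-person detections are counted. Over water, a head, torso or pair of legs is accepted as evidence of one person.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x: int      # column of the detection center in the frame grid
    y: int      # row of the detection center in the frame grid
    kind: str   # "whole_person", "head", "torso" or "legs" (hypothetical labels)

# Hypothetical rule sets: which detection kinds count as one person.
LAND_KINDS = {"whole_person"}
WATER_KINDS = {"whole_person", "head", "torso", "legs"}

def in_water(det, water_mask):
    """Rule: a detection is 'in water' if its center falls on a water cell."""
    return water_mask[det.y][det.x]

def count_people(detections, water_mask):
    """Count people, relaxing what counts as a person over water.

    Assumes each person yields at most one detection.
    """
    count = 0
    for det in detections:
        allowed = WATER_KINDS if in_water(det, water_mask) else LAND_KINDS
        if det.kind in allowed:
            count += 1
    return count

# Toy frame: the right-hand column is water (True), the rest is land.
water_mask = [
    [False, False, True],
    [False, False, True],
]
detections = [
    Detection(x=0, y=0, kind="whole_person"),  # walker on land: counted
    Detection(x=1, y=1, kind="head"),          # partial view on land: ignored
    Detection(x=2, y=0, kind="head"),          # swimmer: counted
    Detection(x=2, y=1, kind="legs"),          # diver: counted
]
total = count_people(detections, water_mask)
print(total)
```

The rule does not replace the trained model. It only decides which of the model's outputs to trust in each region, which is why it can be written and checked by hand.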
CNN for Object Recognition
In our public park use case where there is no lake, we apply our data set of people in different poses to the portions of the images containing detail. When processing images and video, a convolutional neural network (CNN) identifies pixels carrying image data, uses adjacent pixels to recognize one or more objects and downscales the images (known as pooling) for simpler recognition at scale. In general, a “basic” DNN would return similar analyses if all the pixels in an image were shuffled, whereas a CNN recognizes patches of pixels together as meaningful parts of an object, thereby reducing the number of weights the network must learn.
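To make the contrast concrete, here is a minimal pure-Python sketch of one convolution step followed by pooling, alongside a single fully connected unit. The helper names (`conv2d`, `max_pool`, `dense_unit`), the 5x5 toy image and the fixed pixel permutation are all assumptions chosen for illustration. A real CNN learns its kernels and stacks many such layers.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(fmap, size=2):
    """Downscale by keeping the maximum of each non-overlapping window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

def dense_unit(pixels, weights):
    """One fully connected unit: a weighted sum with no notion of adjacency."""
    return sum(p * w for p, w in zip(pixels, weights))

# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

pooled = max_pool(conv2d(image, edge_kernel))

# Apply one fixed permutation to the pixels (7 is coprime with 25).
flat = [p for row in image for p in row]
order = [(k * 7) % 25 for k in range(25)]
shuffled_flat = [flat[k] for k in order]
shuffled_image = [shuffled_flat[r * 5:(r + 1) * 5] for r in range(5)]

pooled_shuffled = max_pool(conv2d(shuffled_image, edge_kernel))

# The dense unit gives the same answer if its weights are permuted the same way.
weights = list(range(25))
dense_original = dense_unit(flat, weights)
dense_shuffled = dense_unit(shuffled_flat, [weights[k] for k in order])

print(pooled)           # [[2, 0], [2, 0]]
print(pooled_shuffled)  # [[2, 2], [2, 2]]
print(dense_original == dense_shuffled)  # True
```

In the original image the pooled map marks a single edge on the left half. After the permutation, the same kernel fires across the whole map because the neighborhoods it depends on no longer exist. The fully connected unit is unaffected as long as its weights follow the pixels, which is the sense in which a basic DNN is indifferent to pixel order.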
Although CNNs have been used primarily for images, video content and sound patterns, they can also be used for classification tasks in natural language processing. A single shouted word, like “gun,” “stop,” “police” or “help,” can also be categorized. In the past, graphics processing units with high power consumption were needed to accelerate video decoding from IP video cameras for public safety. In practice, license plates or shipping containers could be decoded only at a guard station with power. This is changing as new low-power central processing units are paired with a CNN customized for a specific accelerated task, such as high-throughput vehicle identification. These devices may be powered by batteries lasting a month or more, significantly lowering public safety technology costs and making it possible to have safer outdoor concert and event venues.