The Real Benefits of Artificial Intelligence

How ‘Computer vision’ powered by AI could radically change video surveillance


David Monk, Umbo CV

When people think of artificial intelligence (AI), they tend to think of either Skynet – a dark, malicious entity – or C-3PO – a bumbling though harmless droid. But both images misunderstand the concept of AI and what it means to the world. Simply put, AI is going to have a major impact, starting from the nascent but growing field of computer vision.


AI’s Computing Origins

Scientists have been working on artificial intelligence since the 1950s when it was mostly based on the concept of “symbolic artificial intelligence.” This framework assumed that many aspects of intelligence could be achieved through the manipulation of symbols. Despite considerable success, symbolic artificial intelligence ultimately fell short, especially in the field of computer vision.

A new AI approach then emerged – statistical AI and convolutional neural networks, which use millions of data points to make a computer “program itself.” These networks are “trained”: they are fed data to process, the network is tweaked, and it is retested. This cycle repeats until the resulting algorithm performs well.
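The train/evaluate/tweak cycle described above can be sketched in miniature. The following toy example, a single artificial neuron learning to separate two clusters of points, is purely illustrative: real computer vision networks run the same loop over millions of images rather than 100 synthetic points.

```python
import random

random.seed(0)

# Synthetic "data points": class 0 clustered near (0, 0), class 1 near (2, 2).
data = [((random.gauss(0, 0.3), random.gauss(0, 0.3)), 0) for _ in range(50)] + \
       [((random.gauss(2, 0.3), random.gauss(2, 0.3)), 1) for _ in range(50)]

w = [0.0, 0.0]   # weights, tweaked on every pass
b = 0.0          # bias
lr = 0.1         # learning rate

def predict(x):
    s = w[0] * x[0] + w[1] * x[1] + b
    return 1 if s > 0 else 0

# "Training": feed data, measure errors, tweak the network, retest --
# repeating until the resulting classifier performs well.
for epoch in range(20):
    for x, label in data:
        error = label - predict(x)
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

accuracy = sum(predict(x) == label for x, label in data) / len(data)
print(f"accuracy after training: {accuracy:.2f}")
```

No programmer wrote a rule for where the boundary between the two clusters lies; the repeated feed-tweak-retest loop found it, which is the core idea behind “a computer programming itself.”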

Neural network-based technologies are already making their presence felt. In 2016, Google’s AlphaGo gained a lot of publicity by defeating one of the top Go players in the world in a five-game match. AlphaGo determines its moves by using a neural network. Facebook uses a neural network to identify and suggest tags for people in uploaded photographs. Its DeepFace network is trained on a dataset of 4 million facial images belonging to more than 4,000 people.

These new technologies have unlocked new uses and behaviors that would have been impossible or impractical for a programmer or team of programmers to create. Computer vision is at the forefront of these new uses.

Computer Vision

AI applications that use deep learning applied to images and video are referred to as “computer vision.” A recent TechCrunch article by Colin O’Donnel named computer vision’s video-as-a-sensor technology as the most important of the emergent technologies that are changing societies.

Video security without video analytics is only as effective as the people doing the monitoring.

A human operator has to maintain a high level of concentration while dividing that attention across feeds on multiple screens. Even when operators are trained, research has found that the error rate is high. Humans are simply not good at watching for rare events across multiple video streams.


One study done at a prison in 1972 found an 85-97 percent detection rate for conspicuous events such as running and climbing walls, but the prison scenes otherwise showed little movement; most subjects were standing still. A follow-up study in 1973 that used moving figures in the footage found detection rates spanning a much wider 35-100 percent.

Increasing the number of monitors that surveillance operators must watch decreases their performance. In the 1973 study, a perfect detection rate was achieved only once, and only when a single display was observed at a time. A study in Britain found that a detection rate of 85 percent with one screen dropped to 45 percent when nine screens had to be monitored. Yet a single display does not guarantee consistently high performance: another British study, in 2003, that looked for incidents of theft in an industrial setting – a much more complex scene than a prison – reported a detection rate of only 25 percent.

The State of the Technology

For enterprises, critical infrastructure, higher education, and government entities deploying more than 1,000 security cameras in a distributed environment, human monitoring misses security events, and traditional rules-based systems record many false positives. Both are significant liabilities. Large organizations can record millions of hours of video each month that must be monitored live and then reviewed forensically to find events of interest. Unfortunately, the poor detection rates reported in the studies above have seen no significant industry-wide improvement in the past few decades.

AI-driven computer vision technologies are entering the video security industry to help humans perform security-related tasks better. Accurate pixel-level human detection and object-of-interest detection, combined with configurable in-scene regions of interest, allow for autonomous monitoring. Paired with a capable video management system (VMS), live notifications mean security managers can be aware of situations as they happen. The same level of accuracy lets computer vision find events of interest forensically by scanning metadata tags.
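A configurable region of interest can be sketched very simply. In this illustrative example, the ROI is a rectangle in image coordinates and only detections that fall inside it raise an alert; real systems typically support arbitrary user-drawn polygons, and all coordinates and field names here are assumptions.

```python
# Illustrative ROI: a rectangle in pixel coordinates, as a user might draw it.
ROI = {"x_min": 100, "y_min": 200, "x_max": 400, "y_max": 480}

def in_roi(detection, roi):
    """detection is (x, y): the image location of a detected person."""
    x, y = detection
    return roi["x_min"] <= x <= roi["x_max"] and roi["y_min"] <= y <= roi["y_max"]

def alerts(detections, roi):
    """Keep only detections inside the region of interest."""
    return [d for d in detections if in_roi(d, roi)]

# Three people detected in one frame; only those inside the ROI should alert.
frame_detections = [(150, 300), (50, 50), (399, 479)]
print(alerts(frame_detections, ROI))  # -> [(150, 300), (399, 479)]
```

Filtering pixel-accurate detections through a region like this is what lets a system ignore a busy sidewalk outside the fence while still alerting on a person inside it.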

This enables thousands of hours of recorded video to be scanned in a matter of seconds, compared with the hours or days that manual review requires.
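The reason a metadata scan is so fast is that it searches small tag records rather than raw video frames. The sketch below shows the idea with hypothetical detection records; the field names and cameras are illustrative, not from any particular VMS.

```python
from datetime import datetime

# Hypothetical metadata records as a computer-vision pipeline might emit them:
# one small tag per detection, standing in for hours of raw footage.
detections = [
    {"camera": "lobby-01", "tag": "person",  "time": datetime(2017, 5, 1, 2, 14)},
    {"camera": "dock-03",  "tag": "vehicle", "time": datetime(2017, 5, 1, 2, 30)},
    {"camera": "lobby-01", "tag": "person",  "time": datetime(2017, 5, 1, 14, 5)},
]

def find_events(tag, start, end):
    """Scan meta tags instead of video frames -- a fast index lookup."""
    return [d for d in detections
            if d["tag"] == tag and start <= d["time"] < end]

# Forensic query: which cameras saw a person overnight?
overnight = find_events("person",
                        datetime(2017, 5, 1, 0, 0),
                        datetime(2017, 5, 1, 6, 0))
print([d["camera"] for d in overnight])  # -> ['lobby-01']
```

Scanning a month of footage becomes a query over records like these instead of a frame-by-frame replay, which is where the seconds-versus-days difference comes from.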

Does this mean that autonomous computer vision algorithms will replace humans? A better way to view this technological shift in video security is that AI makes today’s security personnel much more efficient, resulting in:

  • Reduced loss of life
  • Faster identification of critical behavioral events and the parties involved
  • More informed first responders
  • Reduced waste of resources resulting from false alarms

The promise of computer vision-enhanced video security is tempered by the complexity of the task, though. Traditional intelligent video systems (IVS) that purport to do this work are based on motion detection and external sensors mounted at key points along a perimeter, which means they are triggered by nearly any kind of movement. The result is an immense number of false alarms, which personnel eventually learn to ignore.
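Why is simple motion detection so trigger-happy? A naive version just counts pixels that changed between frames, so any change at all registers as motion. The toy sketch below, with made-up thresholds and flattened grayscale frames, shows the mechanism.

```python
# Naive pixel-difference motion detection: any large pixel change counts
# as motion. Thresholds and frame data here are illustrative only.
def motion_alert(prev_frame, frame, threshold=10, min_changed=5):
    changed = sum(abs(a - b) > threshold for a, b in zip(prev_frame, frame))
    return changed >= min_changed

dark = [0] * 20               # frame with lights off (flattened grayscale)
lit  = [0] * 15 + [200] * 5   # same empty scene, five bright pixels: lights on

print(motion_alert(dark, lit))   # True  -- lights toggling "looks like" motion
print(motion_alert(dark, dark))  # False -- static scene, no alert
```

Nothing in this logic distinguishes a person from a string of Christmas lights or a swaying branch, which is exactly why motion-based IVS floods operators with false alarms.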

Traditional IVSs are unable to progress beyond simple motion detection for a good reason: recognizing a person is subject to millions of edge cases that the algorithms must accommodate.

A security camera has to contend with a lot. It sits outside, exposed to the elements – rain, wind, snow, sun and more – each of which can wreak havoc on human-counting or identification algorithms. Every camera sees a scene that its software regards as unique. Even something as simple as putting up Christmas lights can cause many artificial intelligence and computer vision algorithms to trigger false alerts every time the lights turn on or off. Before long, the sheer number of alerts overwhelms users and reduces the effectiveness of the system.

There are countless variations on this issue. For example, what happens if the wind moves the camera’s perspective a little bit? Now everything has a different look, which could cause many false alarms. The algorithm needs to be retrained to acknowledge the new “normal” scene.

The key obstacles that IVS developers face are both technical and psychological. Computer vision algorithms must be flexible enough to handle variable situations, but raw tracking accuracy is not enough on its own: an algorithm that detects every event is pointless if it also registers too many false positives. There must be a middle ground.

The approach that industry leaders are taking is to create customized models trained on custom hardware with proprietary data captured from real-world situations. A plethora of open-source AI software products and online services are available to developers, many of them backed by major tech companies.

Despite their prestigious backers, though, these off-the-shelf models face the same issues as traditional IVS solutions. Only custom models geared specifically toward surveillance video footage and built from custom data – collected from real-world locations, with real-world errors – will be able to move along the learning curve quickly enough.

Predictions for Computer Vision

As AI-powered deep learning techniques improve, behavioral analytics will identify suspicious activity and send specific notifications to surveillance operators in real time.

Human trespassing and behavioral notifications require an algorithm to pick a human out of a group of non-human objects. Additional techniques and data will make it possible to identify characteristics of a specific person or group of people.

Some problems already have potential solutions. For example, trials are underway that focus on detecting people who are wearing masks, helmets or any other headwear that obscures their face, something that human surveillance operators are trained to look for. Fight detection, meanwhile, is a specific set of clearly defined behaviors that is already addressable with today’s computer vision recognition algorithms.


Further in the future, other behaviors that human surveillance operators watch for will pose greater challenges for computer vision. Behaviors that unfold over time are especially thorny. Today’s algorithms look at each frame of video individually, which is why they are well suited to identifying human shapes. But behaviors that take time for an observer’s “intuition” to register – such as robberies, accidents and theft (particularly difficult because it is a concealed act) – pose a different challenge, because algorithms must learn to recognize significant changes across frames over periods of time.
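The gap between per-frame and time-aware recognition can be sketched with a minimal example. Here, per-frame labels (the output of a frame-by-frame detector) are scanned with a simple persistence rule standing in for temporal reasoning; the “loitering” definition and its five-frame threshold are entirely illustrative.

```python
# Per-frame detections alone miss time-dependent behavior. A minimal fix:
# scan the sequence of per-frame labels and flag patterns that persist.
# Here, "loitering" = a person detected in 5+ consecutive frames (illustrative).
def detect_loitering(frame_labels, min_frames=5):
    run = 0
    for label in frame_labels:
        run = run + 1 if label == "person" else 0
        if run >= min_frames:
            return True
    return False

passerby = ["person", "person", "empty", "person", "empty"]  # brief appearances
loiterer = ["person"] * 8                                    # persists in scene

print(detect_loitering(passerby))  # False
print(detect_loitering(loiterer))  # True
```

No single frame distinguishes the two sequences – every “person” frame looks identical in isolation. Only the pattern across frames does, which is the essence of why time-dependent behaviors like theft are harder than shape detection.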

The AI technological paradigm shift is imminent, as recent milestones indicate:

  • The volume of “big data” being collected and made available is increasing exponentially, generating demand for rapid deep neural network processing chip technology advancement.
  • Enterprise-level cloud computing adoption is exceeding 70 percent.
  • The number of AI researchers is at an all-time high, marked by a 300 percent increase in Ph.D. and Ph.D. candidate papers published globally in just the past year.
  • More than 550 startups using AI as a core part of their products raised $5 billion in funding in 2016.

There are certainly challenges ahead, but the AI field is advancing at an exponential rate. What is possible today represents a huge leap forward in the field and offers great promise for AI-driven computer vision technologies that will contribute to the safety and security of people in years to come.

David Monk is eastern regional USA senior account manager for Umbo CV.