Listen Up

By James Marcella, Axis Communications, on April 5, 2018 | SIA Technology Insights

How audio monitoring raises the surveillance bar

Did you know that the “See Something, Say Something” campaign originated with the New York City Metropolitan Transportation Authority? The citizen participation program was so successful as a force multiplier for local law enforcement that the U.S. Department of Homeland Security licensed the idea and turned it into a nationwide campaign.

Video surveillance manufacturers are taking this lesson to heart by placing new emphasis on audio capabilities that enable their security tools not only to “see something” but to “say something” as well. Coupling audio with video has been shown to increase a surveillance system’s overall ability to deter crime and improve public safety, as well as to improve situational awareness among responders.

Why Aren’t More People Using Their Audio Features?

Audio has been a standard feature in professional network cameras for at least a decade. Yet the integration of audio has had little impact on the overall scope of physical surveillance installations. Why? The challenges to adoption have been twofold: one legal and one technical.

For many users, the legal ramifications of adding audio to a surveillance system have pushed this technology out of consideration. As you read further, you’ll learn that there are several ways to approach audio that fall within legal bounds.

On the technical side, integrating systems that operate on different infrastructures can be complex and costly. In many cases, audio equipment, from public address systems to phones and hand-held radios, is all analog. As this article will explain, that siloed approach is rapidly being replaced with standards-based, open solutions that run on the network.

Changing Your Perspective on Audio Surveillance

When you talk about implementing audio with surveillance cameras, most security professionals think of “listening.” What they should really focus on is monitoring. You should also take the “human” out of the equation and think in terms of “audio analytics,” where the camera uses a decibel threshold or an acoustic signature to establish that an event has happened, such as glass breaking. Alternatively, the camera can use a video analytic to trigger an audio response to an event, such as broadcasting a verbal warning. These events can also be layered to create a tiered response based on a subject’s behavior.
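
As a rough illustration of the simplest of these triggers, the decibel threshold, here is a minimal Python sketch. It assumes mono samples already normalized to the range -1.0 to 1.0 and uses a hypothetical threshold of -20 dBFS. Real cameras expose their own configuration and typically add acoustic-signature classification on top, which this sketch does not attempt. Note that it reduces each frame to a single level number and never stores or transcribes the audio.

```python
import math
from typing import List, Sequence


def frame_level_dbfs(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to full scale (samples in [-1.0, 1.0])."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def detect_level_events(samples: Sequence[float], frame_size: int = 1024,
                        threshold_dbfs: float = -20.0) -> List[int]:
    """Return the start index of each full frame whose level exceeds the threshold.

    The -20 dBFS threshold is a hypothetical value, not a vendor default.
    Only a level number is computed per frame; no audio is kept.
    """
    events = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        if frame_level_dbfs(samples[start:start + frame_size]) > threshold_dbfs:
            events.append(start)
    return events


# Example: quiet background (-40 dBFS) with one loud burst (about -6 dBFS).
signal = [0.01] * 4096 + [0.5] * 1024 + [0.01] * 2048
print(detect_level_events(signal))  # [4096]
```

In practice the threshold would be tuned to the site’s ambient noise, since a fixed level that suits a quiet lobby would fire constantly on a busy street.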

To illustrate this point, imagine waiting for a train at the station. Notice the yellow line on the platform, meant to keep people from coming too close to the tracks. Using a video analytic such as crossline detection, the surveillance system could detect a person standing in the yellow zone and then play a pre-recorded message over a networked speaker: “Step away from the tracks.” If that person does not comply, an operator in the security operations center receives an alert, reviews the live video and then engages the person in a two-way conversation. It’s one thing to hear a generic message; it’s another thing entirely to be identified by what you are wearing and asked to step back from the yellow line. If the person still does not comply, the next tier of response might be to dispatch uniformed personnel to the location.
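
The tiered response in this example behaves like a simple escalation ladder. The Python sketch below models it under stated assumptions: the tier names and the `PlatformEscalation` class are hypothetical, and `on_check` is assumed to be called periodically with the result of a crossline or zone detection. Each check that still finds the person in the yellow zone escalates one tier; leaving the zone resets the response.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical escalation tiers modeled on the train-platform example."""
    IDLE = 0               # nobody in the yellow zone
    RECORDED_WARNING = 1   # play the pre-recorded "step away from the tracks" clip
    OPERATOR_TALKDOWN = 2  # alert a SOC operator to review live video and talk
    DISPATCH = 3           # send uniformed personnel to the location


class PlatformEscalation:
    """Escalate one tier per non-compliant check; reset when the zone clears."""

    def __init__(self) -> None:
        self.tier = Tier.IDLE

    def on_check(self, person_in_zone: bool) -> Tier:
        if not person_in_zone:
            self.tier = Tier.IDLE                # compliance resets the response
        elif self.tier < Tier.DISPATCH:
            self.tier = Tier(self.tier + 1)      # non-compliance escalates one tier
        return self.tier                         # DISPATCH is the ceiling


ladder = PlatformEscalation()
print([t.name for t in (ladder.on_check(x) for x in [True, True, True, False])])
# ['RECORDED_WARNING', 'OPERATOR_TALKDOWN', 'DISPATCH', 'IDLE']
```

In a real deployment, each tier would map to an action in the video management system, such as playing a clip, raising an operator alert or opening a dispatch ticket. Those integrations are outside the scope of this sketch.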

This defense-in-depth approach to physical security is prevalent in many facilities. It relies on a combination of people, processes and technology across multiple layers, moving from the outer perimeter inward. Typically, these facilities mount cameras on the outside of a building to record activity along the perimeter of the property, but they rarely mount loudspeakers there. Some security professionals argue that, because video surveillance has become so pervasive, cameras are no longer as strong a deterrent as they once were. Given that view, these facilities are missing an opportunity to enhance their deterrence.

Case in point: loitering can be as innocuous as some kids choosing the wrong spot to “hang out,” or as threatening as someone casing your establishment or waiting to rob one of your customers. Either way, most business owners would rather not accept the risk and prefer that people move on. You can detect loitering through visual assessment of the scene, but you still need to communicate your intent, which typically means approaching the offenders and asking them to leave. By adding loudspeakers, business owners and security professionals can address the offenders remotely from the safety of their building, avoiding a potentially risky face-to-face confrontation. People are more likely to comply if they know someone is watching and recording their behavior.

Drilling Down on Legal Barriers

Most network cameras shipping today have embedded audio features that are disabled by default for legal reasons. In the United States, federal and state wiretapping laws govern when audio recording is permitted. Because many end users and system integrators don’t understand this legal framework, they prefer to avoid audio altogether.

Depending on the state, you may be permitted to record a conversation: many states have enacted “one-party consent” laws, under which the consent of a single party to the conversation, which can be you, is sufficient. Other states require all parties to consent before a conversation can be recorded. For companies operating across state lines in particular, the simple solution has been not to record audio, period.

But automated audio monitoring is a different story because it doesn’t involve a human eavesdropping on a conversation. The audio analytics only classify sounds by their acoustic signature and make no attempt to translate sound into words. (More about this later in the article.) So, there is no breach of privacy. In most cases, listening in and archiving an audio recording isn’t necessary anyway; you just need to be made aware that something is happening so that you can respond. When the analytics detect certain sounds, they can automatically alert you to watch the live video, which is perfectly legal, so that you can decide whether you need to communicate with the individuals involved and let them know you’re observing them. Usually, a simple verbal warning is enough to stop the behavior. In more serious cases, an audio component lets you inform the perpetrators that law enforcement is on the way.

A word of caution: If you do plan on recording audio, then I strongly suggest that you discuss it with a lawyer to ensure that you are operating your system within the letter of the law.

Drilling Down on System Complexity Barriers

More than two decades ago, when the first network camera was introduced to the surveillance market, it was met with skepticism from the traditional CCTV integration channel. The technology was a paradigm shift from analog, and it required a whole new skill set (computer networking) that many of those integrators did not understand.

That same shift is now repeating on the audio side with the transition from traditional analog silos to a digital world. The good news is that the computer networking skills hard-earned in the network camera world apply to the latest generation of audio equipment as well, so the learning curve should be short.

Traditional physical security companies need to understand that the barriers to audio integration are quite low and that adding audio to their portfolio of solutions can greatly enhance their value. Every job that requires video surveillance, particularly one that involves monitoring publicly accessible spaces, should be evaluated as a potential candidate for audio augmentation. In certain vertical markets, such as schools, hospitals and public buildings, internal audio systems are standard. These “overhead” paging systems are migrating to digital and could also be incorporated into the physical security discipline.

So, what should an integrator or security practitioner know before adding audio monitoring to their repertoire? Let’s start with the easy part: the network. Most of you reading this have been exposed to, or have embraced, network cameras over the past decade. The infrastructure you have deployed for these systems is the same one audio uses: Ethernet on an IP backbone. What you need is experience with Session Initiation Protocol (SIP), the predominant communications protocol manufacturers use to ensure interoperability between devices. You can acquire this through internal training or by hiring external expertise.

As a historical side note, SIP was introduced in 1996, the same year as the first network camera, and ratified as a standard in 2000. Like other network protocols, it sets forth the rules by which devices establish, maintain and terminate sessions. Most voice-over-IP (VOIP) phone systems use it, which makes them compatible with physical security devices that also support SIP.

For example, if a network camera supports SIP, you could pick up your VOIP phone, dial the IP address of the device and converse with the person you’re observing. This assumes that the camera is equipped with both a microphone and a speaker. Most cameras ship with an embedded microphone, but they rarely have a speaker unless they are purpose-built solutions such as a networked door intercom. So, in many cases, you already have the basics on the networking side. You just need to add SIP to the already long list of acronyms thrust upon you by the IT world.
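
To make “dial the IP address” concrete, here is a hedged Python sketch of the first message a SIP client sends: an INVITE request in RFC 3261 syntax, addressed directly to a device’s IP. The addresses come from the 192.0.2.0/24 documentation range, and the `build_invite` helper and `soc-operator` user name are hypothetical. A real call also needs a session description (SDP) offer, handling of provisional and final responses, an ACK and the RTP media stream, so a production system would rely on a proper SIP stack rather than hand-built messages.

```python
import uuid


def build_invite(target_ip: str, caller_ip: str,
                 caller_user: str = "soc-operator", port: int = 5060) -> str:
    """Build a minimal SIP INVITE request (RFC 3261 syntax) to a device's IP.

    Hypothetical helper for illustration: no SDP body is included, so
    Content-Length is 0. A real client must also handle responses and ACK.
    """
    request_uri = f"sip:{target_ip}"
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch "magic cookie"
    from_tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{caller_ip}"
    headers = [
        f"INVITE {request_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_ip}:{port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{request_uri}>",
        f"From: <sip:{caller_user}@{caller_ip}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller_user}@{caller_ip}:{port}>",
        "Content-Length: 0",
    ]
    # SIP lines end with CRLF; an empty line separates headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n"


print(build_invite("192.0.2.10", "192.0.2.20"))
```

The resulting string could be sent over UDP to port 5060 on the device, for example with Python’s `socket` module. This is the same exchange a VOIP phone performs behind the scenes when it dials the camera.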

Understanding How To Optimize Microphones

To use the camera’s microphone to its best advantage, you need to understand how its operation is affected by the environment of the scene you’re monitoring, as well as by what you’re trying to accomplish. Most built-in microphones are omnidirectional, meaning they pick up ambient sound from all directions around the camera. Depending on the size of the monitored area, this often does not meet the needs of security professionals unless the camera is placed indoors in a relatively small room.

Built-in microphones are also of limited use outdoors, because the camera must be installed in an enclosure that effectively prevents the microphone from picking up sound. In most cases, a separate standalone microphone is connected to the camera and mounted alongside it. Using a separate microphone lets you choose the best “audio pick-up pattern” for the given scene.

For instance, you might find that a directional or “shotgun” microphone is more appropriate for monitoring audio at your front gate, because it can exclude the sound of your employees talking inside the perimeter. Manufacturers can provide specifications on the effective pick-up distance of these microphones, as well as installation best practices. Most cameras have a standard “line-in” jack for connecting third-party microphones; confirm the connector size before purchasing to ensure compatibility. Also check the outdoor ratings for both temperature range and water/dust ingress. There are many IP66-rated solutions on the market today.

Understanding How To Optimize Loudspeakers

Speakers come in many shapes and sizes, but most external speakers used today resemble wall-mounted bullhorns. Networked loudspeakers, which have been available for a few years, now enable integration opportunities that were not possible with their analog cousins. Because they share the IP backbone, these network devices can be integrated with other physical security countermeasures, providing the “say something” for more than just video. For example, a passive infrared (PIR) detector could trigger a pre-recorded message played over the networked speaker. This hands-off approach can be leveraged across many video management systems and adds another layer of deterrence.
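
The PIR example amounts to an event-to-action rule. A minimal Python sketch of such a rule table follows. The device names, clip file names and the `Event` and `action_for` names are all hypothetical; in practice, these rules are usually configured in the video management system or on the device itself rather than written as custom code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Event:
    source: str  # hypothetical device name, e.g. "pir-loading-dock"
    kind: str    # hypothetical event type, e.g. "motion"


# Hypothetical rules: (event source, event kind) -> (speaker, audio clip).
RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("pir-loading-dock", "motion"): ("spk-loading-dock", "area_closed.wav"),
    ("cam-front-gate", "loitering"): ("spk-front-gate", "please_move_on.wav"),
}


def action_for(event: Event) -> Optional[Tuple[str, str]]:
    """Look up which speaker should play which clip; None means no rule matches."""
    return RULES.get((event.source, event.kind))


print(action_for(Event("pir-loading-dock", "motion")))
# ('spk-loading-dock', 'area_closed.wav')
```

Keying rules on the event source rather than on video specifically is what lets a non-camera sensor such as a PIR detector drive the speaker.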

Speakers also provide a valuable option for mass notification across buildings and campus environments. As stated earlier, if your organization uses VOIP for telephone communications, the networked speaker becomes one more “number” to dial, giving security professionals the ability to address people across the entire system as necessary. Of course, that assumes the speakers are all in working order. But what if they are not? How do you find out which speakers are inoperable? Here, too, manufacturers have put some forethought into the problem. Many of today’s networked speakers have a built-in testing function: a small microphone embedded in the unit picks up an acoustic pattern played by the speaker and reports back that the speaker is operational. This saves valuable maintenance time and ensures that your investment is functioning when you need it.
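
One plausible way to implement such a self-test is to measure how much of the captured energy falls in the test tone’s frequency bin. This rests on two assumptions, not on documented vendor behavior: the test pattern is a single 1 kHz tone, and the embedded microphone delivers normalized samples at 16 kHz. The Python sketch below uses the Goertzel algorithm, which computes a single DFT bin cheaply. The function names and the 0.25 pass threshold are hypothetical.

```python
import math
from typing import Sequence


def goertzel_power(samples: Sequence[float], sample_rate: float, freq: float) -> float:
    """Squared magnitude of the DFT bin nearest `freq`, via the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / sample_rate)  # nearest integer DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def tone_fraction(samples: Sequence[float], sample_rate: float, freq: float) -> float:
    """Share of captured energy in the tone's bin (Parseval-normalized, max 0.5)."""
    energy = sum(x * x for x in samples)
    if not samples or energy == 0.0:
        return 0.0
    return goertzel_power(samples, sample_rate, freq) / (len(samples) * energy)


def speaker_self_test(mic_samples: Sequence[float], sample_rate: float = 16000.0,
                      test_freq: float = 1000.0, min_fraction: float = 0.25) -> bool:
    """Hypothetical pass/fail rule: the test tone must dominate the recording."""
    return tone_fraction(mic_samples, sample_rate, test_freq) >= min_fraction


sr, n = 16000.0, 1600
heard = [0.5 * math.sin(2 * math.pi * 1000 * i / sr) for i in range(n)]
print(speaker_self_test(heard), speaker_self_test([0.0] * n))  # True False
```

A pure tone that lands exactly on a bin puts half its energy in that bin and half in the mirrored negative-frequency bin, so the fraction tops out at 0.5. The 0.25 threshold leaves margin for room noise, while silence or a tone at the wrong frequency fails.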

Where Audio Analytics Comes Into Play

Most of you reading this article don’t have staff who monitor your video daily. So, your video is primarily a reactive tool that you review after the fact to determine what happened at a given scene. But with audio analytics, you can be more proactive with your security system, learning about potential problems in real time without the legal constraints that bar a live operator from eavesdropping on a conversation.

So how does it work? Audio analytics embedded on the camera can detect sounds such as aggressive voices, car alarms, breaking glass or gunshots, and trigger a proactive notification when they detect audio matching those acoustic signatures. Audio-enabled cameras equipped with audio analytics become intelligent, dual-technology sensing devices that security professionals can leverage for a low incremental investment.

Municipal monitoring provides a great use case for several audio analytics when you consider the high number of people and vehicles in a relatively small area. Most aggressive acts, such as fights, start with a verbal altercation prior to the physical confrontation. Audio analytics can detect aggressive voices and alert law enforcement or private security that an altercation of some sort is happening in view of a particular camera.

Responders could then monitor the situation remotely to determine whether it requires intervention of some sort. It could be just some guys celebrating their team’s win, or it could be aggressive posturing that might quickly escalate. Regardless, the audio analytics detect the aggressive sounds and quickly send out an alert, while the video verifies and records what is happening. Timely response is important, so anything that provides early indicators of behavior helps security professionals resolve issues or determine cause.

Not Just Video Anymore

Audio has been used for many years by security professionals but in limited situations and often as standalone systems that were not integrated with other security solutions. With the introduction of IP networked speakers, the use of audio has the potential to increase the value of surveillance systems at a relatively low incremental cost. Moving forward, it would behoove you to become acquainted with SIP. Furthermore, every time you install a camera you should ask yourself whether it is also an opportunity for installing audio.

James Marcella (jmarcell@axis.com) is director of industry associations at Axis Communications Inc.