The Public Safety Data Lake

Making the right decisions regarding storage and other issues can vastly increase the value of video surveillance

Ken Mills, EMC

For years, surveillance has largely been deployed either as a closed system or on an appliance. The reasons are understandable: most customers, companies and partners come from the CCTV days, when every system was closed, and that is just how it worked.

Then IP video came along and turned the surveillance market upside down, bringing in new technologies, new vendors, new partners and new challenges. The surveillance industry has finally made the turn where IP is the standard rather than the exception.

But the dirty little secret of the sector is that, even though the technology is based on IP, the deployment still often looks like a closed system.

The cables are different, muxes have been replaced with switches, and 150GB DVRs have been replaced with 50TB NVRs, but the way the surveillance systems are deployed still results in closed systems. The data is locked inside the DVR or NVR. The video management applications are bound by the appliance with which they are packaged. And customers are often forced to upgrade hardware and software together. At the end of the day, almost all of the value that surveillance can bring to a company, an institution or a government is locked inside a black box.

There is potential, though, to unlock the value of the surveillance data and make it truly portable. Once that happens, it can be shared across applications, moved across on-premise and off-premise boundaries, and ultimately bring much more value to end users.

There is no better example of the need for data to be portable than the public safety market. Today’s public safety operators, officers and users require access to many types of data beyond surveillance data. Body-worn cameras are proliferating across the country and around the world, and data from these devices can add up fast, even in medium-size police departments. Storage requirements can run as high as 1TB per camera per year, so even with only 500 cameras, a police department could need 500TB of new storage annually just for this application.
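The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 1TB-per-camera figure is the upper-end estimate quoted above, and the multi-year total assumes nothing is deleted during that period.

```python
def annual_storage_tb(cameras: int, tb_per_camera_per_year: float = 1.0) -> float:
    """New storage needed per year, in TB, for a fleet of body-worn cameras."""
    return cameras * tb_per_camera_per_year


def cumulative_storage_tb(cameras: int, years: int,
                          tb_per_camera_per_year: float = 1.0) -> float:
    """Total TB on hand after `years`, assuming no footage is purged (an assumption)."""
    return annual_storage_tb(cameras, tb_per_camera_per_year) * years


print(annual_storage_tb(500))         # 500.0
print(cumulative_storage_tb(500, 7))  # 3500.0
```

Even this simple model shows how quickly a mid-size deployment grows once footage has to be kept for several years.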

Body-worn cameras are now grabbing a lot of attention in the news, forcing many police agencies to make knee-jerk decisions on a solution. And while wearable cameras are an important part of the evidence story, they are not the only piece of the security equation. Police agencies also need to store and manage video from surveillance cameras, crime scene footage, digital evidence, interview rooms, mobile devices, unmanned aerial vehicles, and many other evidence inputs. Add in storage for license plate readers, GIS mapping, and in-car cameras, and the need for enterprise storage can be significant. And this does not take into account the addition of data coming from the explosion of the Internet of Things.

The need for data is not slowing down. Keeping this data locked into individual applications and appliances only slows innovation and increases costs. It is no longer acceptable to throw an appliance in a rack and call it a day.

The Public Safety Data Lake

Organizations are looking for an architecture that accommodates all of these new devices and the rapid growth of data so that they can finally realize the value that is currently locked in their closed systems. One emerging concept in the surveillance industry is the surveillance data lake.

As it applies to public safety, a surveillance data lake is made up of several key pools of data feeding in from different sources, such as in-car video, video cameras, body-worn cameras, and drones. From this pool of data, organizations can perform critical activities based on their needs, including analytics, evidence management, and anomaly detection.

This data needs to be secure, reliable and available to multiple user groups across key applications.

With an estimated 54 percent of their data going unanalyzed, federal organizations are missing many opportunities for applications and insights, including crowd counting, anomaly and incident detection, face matching, safety alerts, traffic monitoring, object recognition and suspicious-behavior detection. Data is growing so fast that these organizations simply cannot find a way to scale, let alone analyze.

But while scale is clearly a challenge here, understanding what should serve as the foundation to the data lake architecture can be even trickier. Should users go on-premise or cloud? If on-premise, do they have distributed or centralized architectures? If they want to be in the cloud, how do they ensure easy access to the data? When they need to quickly access data for evidentiary support, will they need to dig through piles of storage to find it? How do they decide between private, public and hybrid versions of the cloud? What about a mixture of on-premise and cloud? Choosing a storage vendor alone, before even thinking of analytics, applications, etc., can be daunting and exhausting. And public safety organizations being in the public eye only heightens the pressure.

Forward-thinking public safety departments build a data platform that can collect, store and manage this data.

A data lake infrastructure provides a more cost-effective storage environment with the ability to seamlessly integrate new types of devices while gaining more control over the data.

Finding a storage vendor that offers this type of open platform is critical to moving toward this enterprise model, which will prove more cost-effective, will be less complex to manage, and will allow for more innovation and the flexibility to add applications and gain value from surveillance data.

The Storage Layer

Storage is the foundation layer of the data lake architecture. The storage layer must support an open platform capable of managing disparate data sets from multiple devices while addressing the challenge of scale head-on. There are three major surveillance architectures in use today: distributed, centralized and cloud. Some companies have distributed-only environments. Some have only centralized environments. Some use both on-premise and cloud architectures for different purposes, while others go cloud-only. The following sections explain the differences between these storage environments.

Distributed Architectures

Distributed architectures store video and surveillance data locally and then periodically transfer the digital data set to the central platform. An example of this might be a satellite police station that stores data in the office but, from time to time, transfers the data over to headquarters. Distributed architectures often integrate the data with applications and other systems, such as access control and intrusion detection, without engaging a central server. The resulting architecture reduces single points of failure and distributes processing requirements over many smaller sites.

Choosing the right storage vendor for distributed architectures can be made simple by answering the following questions: Is the vendor offering high bandwidth at a low cost per GB? Can the configurations be described as “plug and play,” that is, simple and straightforward to deploy? Does the vendor make virtualization easy for future growth?

Centralized Architectures

Scale is the primary consideration with centralized architectures. Centralized surveillance architectures – commonly used by police headquarters, schools, governments, airports and energy companies – host high device-count environments and are able to support large amounts of surveillance data. Storage must be made efficient in centralized architectures, and utilization rates must be high to prevent price creep. Since retention times and pixel/resolution quality are forever changing, migration time to apply these changes must be extremely low, if not non-existent.
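To see why changes in retention time or resolution matter so much at this scale, consider a rough capacity estimate. The sketch below is illustrative only: it assumes every camera records continuously at a constant bitrate, and the function name, the 4 Mbps bitrate and the camera count are hypothetical values, not figures from any vendor.

```python
def required_capacity_tb(cameras: int, bitrate_mbps: float, retention_days: int) -> float:
    """Approximate capacity in decimal TB for continuous, constant-bitrate recording."""
    seconds_per_day = 86_400
    megabytes_per_day = bitrate_mbps / 8 * seconds_per_day  # Mbit/s -> MB/s -> MB/day
    terabytes_per_day = megabytes_per_day / 1_000_000       # MB -> TB (decimal)
    return cameras * terabytes_per_day * retention_days


# Hypothetical site: 100 cameras at 4 Mbps, retention raised from 30 to 60 days.
baseline = required_capacity_tb(cameras=100, bitrate_mbps=4, retention_days=30)
extended = required_capacity_tb(cameras=100, bitrate_mbps=4, retention_days=60)
print(round(baseline, 1), round(extended, 1))  # 129.6 259.2
```

Capacity scales linearly with each input, so doubling retention (or moving to a resolution that doubles the bitrate) doubles the storage the central platform has to absorb.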

Some companies use a converged centralized architecture when they need a total, extensible solution that consolidates systems. Components of a converged surveillance infrastructure may include servers, data storage devices, networking equipment, and video management/surveillance software for IT infrastructure management, automation and orchestration.

Converged and non-converged centralized architectures solve different storage challenges. Both are ideal options for public safety organizations that need to scale. Both also commonly exploit video monitoring and analytics to increase security and opportunity on the same platform, which is highly attractive to companies looking to simplify their business.

Cloud Architectures

The cloud has been causing some confusion lately. One example is in the case of body-worn cameras. In most states, data from body-worn cameras has very different storage requirements depending on the offense. Video of routine traffic stops may only be kept for 30-45 days, while DUIs may be kept for three-plus years, and federal crimes may need to be kept for the length of the imprisonment or, in some cases, forever. Most states have laws that require evidence used in a case to be kept a minimum of seven years. This means that, overall, video from body-worn cameras has a long shelf life, which results in big storage needs.
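Retention rules like these are often easiest to reason about as a lookup table. The following is a minimal sketch, not a statement of any jurisdiction's law: the category names are hypothetical, and the periods simply reuse the illustrative ranges mentioned above. Actual values are set by state law and agency policy.

```python
from datetime import date, timedelta
from typing import Optional

# Illustrative retention periods in days; None means "keep indefinitely".
# These values are assumptions based on the ranges quoted in the text.
RETENTION_DAYS = {
    "traffic_stop": 45,         # upper end of the 30-45 day range
    "dui": 3 * 365,             # three-plus years
    "case_evidence": 7 * 365,   # seven-year minimum
    "federal_crime": None,      # may need to be kept forever
}


def purge_date(recorded_on: date, category: str) -> Optional[date]:
    """Earliest date the footage could be purged, or None if it is kept indefinitely."""
    days = RETENTION_DAYS[category]
    if days is None:
        return None
    return recorded_on + timedelta(days=days)


print(purge_date(date(2024, 1, 1), "traffic_stop"))   # 2024-02-15
print(purge_date(date(2024, 1, 1), "federal_crime"))  # None
```

Because the longest-lived categories dominate, the storage platform has to be sized for years of accumulation rather than for the short-lived routine footage.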

Organizations must consider these long-term storage and data management challenges and think beyond three-year or five-year buying cycles, or they could end up with an inflexible and costly solution. While the cloud is affordable at the start, it is important to understand the cost implications when storage exceeds 1PB and organizations are paying monthly storage and access fees for 25-plus years.
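A back-of-the-envelope model makes the long-term cost question concrete. The sketch below is hypothetical throughout: the function name is invented for illustration, and the fee of 10 currency units per TB per month is a placeholder, not a quoted price from any provider. It also assumes a constant stored volume and flat fees.

```python
def cumulative_cloud_cost(stored_tb: float, years: int,
                          storage_fee_per_tb_month: float,
                          access_fee_per_month: float = 0.0) -> float:
    """Total fees paid over `years`, assuming constant volume and flat monthly fees."""
    months = years * 12
    monthly_bill = stored_tb * storage_fee_per_tb_month + access_fee_per_month
    return months * monthly_bill


# 1PB (1,000 TB) held for 25 years at a placeholder fee of 10 per TB per month.
print(cumulative_cloud_cost(stored_tb=1000, years=25, storage_fee_per_tb_month=10.0))
# 3000000.0
```

Plugging in a provider's actual storage and access fees, and comparing the result with the purchase and operating cost of on-premise storage over the same period, gives a fairer picture than the first-year price alone.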

Going the pure cloud route, therefore, is not always the best option for public safety organizations. Choosing a vendor that offers both cloud and on-premise storage options is a better bet as it will safeguard an organization’s assets and allow for future growth. Many companies opt to go on-premise first with the bulk of their “cold,” or long-term, storage, and then go to the cloud for deeper storage. This approach often is more cost-effective, provides greater security, and simplifies application integration.

Some vendors offer cloud storage bundled with cameras, enabling customers to go cloud first and then go on-premise to save on long-term storage. This bundled option can be much easier for surveillance newbies as the process of purchasing is made simple. However, it may not be the best option for organizations that have high retention requirements or that need to frequently move data from local to cloud storage and back.

It is, therefore, important for organizations to weigh all of their storage options as there is no one-size-fits-all solution. Questions to ask in considering cloud providers include:

  • Who owns the data?
  • Will the data be subject to the Patriot Act?
  • What happens if the organization changes providers? Will the data be lost?
  • What are the short-term and long-term costs?
  • What are the benefits of going cloud vs. on-premise?
  • Will the organization have easy accessibility and/or control of the data?
  • Are there long-term network costs?

Beyond Storage

Just a few years ago, the surveillance architecture conversation would have stopped at storage. Today, there are a number of ways to protect and gain greater value from surveillance data. Having architectural storage options is crucial to scaling any surveillance solution, but, with an open platform, organizations can maximize their storage investments by partnering with video management/surveillance software providers, securing their data, virtualizing their infrastructure, and integrating applications and analytics.


Virtualization

Virtualization can be considered the “enablement player” of the surveillance data lake. Instead of having 50 servers, organizations could have just two. When applied to the surveillance architecture, virtualization reduces physical complexity and points of failure while improving the overall system resiliency.

This becomes increasingly important with the addition of devices such as body-worn cameras and drones. It is much easier to add new devices and account for the impending “big data blast” with a virtualized infrastructure than with a non-virtualized infrastructure. Virtualization creates an open platform and future-proofs investments by giving public safety organizations access to the applications they need at the time they need them.

Virtualization also helps to prevent vendor lock-in, which is critical to the idea of the data lake.

Organizations should consider virtualizing if the answer to any of the following questions is yes:

  • Are our servers eating up our overhead?
  • Will we be adding new types of devices in coming years?
  • Will retention time requirements increase?


Security

When applying security to a data lake, it is important to consider authentication and auditing to ensure that the right people have access to the data. This can include checking who is viewing, downloading or printing certain data. Surveillance security solutions would, for instance, help a company catch an employee who entered his or her office building and inappropriately printed documents on a weekend. The costs of a data breach (in terms of both revenue and reputation) can be easily avoided with the addition of security to the surveillance data lake.

When considering a security vendor, organizations should ask:

  • How are we securing access to our data?
  • Who and what do we need to protect?
  • Does society have a spotlight on our organization?
  • Are we willing to bet on the cost implications of a data breach?
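The authentication-and-auditing idea described above can be sketched as a small access layer that checks permissions and records every request, whether or not it is granted. This is a simplified illustration only: the class, user and item names are hypothetical, and a real deployment would rely on the organization's identity provider and tamper-resistant log storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedStore:
    """Toy access layer: checks a permission table and logs every request."""
    permissions: dict                        # user -> set of allowed actions
    log: list = field(default_factory=list)  # append-only audit trail

    def request(self, user: str, action: str, item_id: str) -> bool:
        allowed = action in self.permissions.get(user, set())
        # Record denied attempts too; they are often the most interesting entries.
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "item": item_id,
            "allowed": allowed,
        })
        return allowed


store = AuditedStore(permissions={"officer_a": {"view"}})
print(store.request("officer_a", "view", "clip-001"))      # True
print(store.request("officer_a", "download", "clip-001"))  # False
print(len(store.log))                                      # 2
```

Reviewing the log for who viewed, downloaded or printed a given item is what turns access control into an auditable chain of custody.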


Applications

Surveillance applications are increasingly becoming valuable assets to businesses. Applications such as transport mechanisms can seamlessly transport data from disparate devices to distributed, centralized, and even cloud storage. This mechanism becomes a critical saver of resources when an incident occurs and the organization needs to quickly find a piece of data. The transport mechanism can simplify efforts by significantly decreasing time to value and freeing up resources.

Organizations that need to access their data quickly and/or move their data in and out of storage should consider adding applications to their architecture.


Analytics

Another emerging trend is the increasing impact of enterprise analytics on surveillance data. Today’s analytics are mostly pixel-based, and they have proven to be less than 100 percent reliable. Once surveillance data is part of a data lake, organizations are able to analyze all of the data at one time using multiple analytic solutions. Software is available that can take very large surveillance files and organize them so that analytics can be applied across all the data at one time. The result is the ability to analyze large amounts of data quickly, or, at least, much faster than with pixel-based solutions. Once the data is organized, an organization can apply analytics applications and business intelligence tool sets to search for trends and anomalies, as well as integrate with other data across the enterprise.

This technology can also be applied to use cases like video indexing to enable content-based video search, traffic analysis based on trajectory analysis for optimizing city transportation, and more. Once the structured insights are extracted, adding other data can generate deeper insights from siloed data sources to, for example, help correlate retail stores’ surveillance video with transaction logs.

Putting It All Together

Navigating through the new world of surveillance is not an easy task, especially considering the industry dynamics and growth trends in play. Three good general steps to follow are:

  1. Commit to using an open platform via a surveillance data lake architecture.
  2. Do your storage homework. Consider on-premise (distributed and centralized) and cloud offerings. A storage vendor that offers both on-premise and cloud can provide the most bang for the buck. Make sure this vendor has strong VMS relationships and a healthy partner ecosystem. A good storage vendor will provide VMS sizing guidelines, reference architectures, and implementation guides to ensure success.
  3. Go beyond storage. There are many ways to realize value from data and future-proof investments. An open data lake platform will provide the flexibility needed to add security, virtualization, analytics and applications to storage solutions.

Regardless of the makeup of the architecture, an open platform, ownership of the data, better security, and the ability to integrate with other applications are all critical requirements. A surveillance data lake will help store and manage these different pools of data and protect an organization into the future.

Ken Mills is senior manager for global business development at EMC.