Real-time monitoring: A life saver for content providers

Real-time intelligence ensures that the data center is running optimally.

It's much easier to prevent data center downtime than it is to recover from it. In addition to the time it takes to troubleshoot and figure out what went wrong, outages are costly for businesses. Every minute that a website or service is down is a minute that customers or potential customers cannot be engaged. This lost opportunity adds up, and can take a bite out of the bottom line. For organizations in competitive markets such as media and content providers, it's especially difficult to bounce back from downtime unscathed.

Prevention, and all that it entails, should therefore be viewed as the cornerstone of strong data center management. This starts with assessing the potential risks within a facility and continuously monitoring for even the subtlest sign that something might be wrong.

Exactly how serious is downtime for content providers?

Think of the data center as the brain and the nervous system for any organization that provides Web streaming services, online shopping, social networking, news and other Internet-based content. If the brain shuts down, the application, service or website enters a vegetative state. Customers can't do anything but wait until the lights come back on, or go somewhere else. 

One example occurred a few years ago, when an Amazon Web Services data center experienced several outages that took out Amazon.com, Vine and Instagram, among others. Of these content providers, Amazon is believed to have suffered the most. BuzzFeed reporter Matthew Lynley did some math and came up with this number: $1,104. That is the approximate amount of money Amazon.com lost each second its data center was down. The outage lasted about 25 minutes in total.
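To put the per-second figure in perspective, the short sketch below multiplies it out over the reported outage length. The function name `downtime_cost` is made up for illustration, and because the duration is only "about 25 minutes," the total is a rough estimate, not a reported loss.

```python
# Rough estimate of revenue lost during an outage, using the figures
# quoted above. The result is approximate because the duration is approximate.

LOSS_PER_SECOND_USD = 1_104   # per-second estimate cited in the article
OUTAGE_MINUTES = 25           # "about 25 minutes"


def downtime_cost(loss_per_second: int, minutes: int) -> int:
    """Return the estimated loss in dollars for an outage of the given length."""
    return loss_per_second * minutes * 60


total = downtime_cost(LOSS_PER_SECOND_USD, OUTAGE_MINUTES)
print(f"Estimated loss: ${total:,}")  # Estimated loss: $1,656,000
```

At that rate, a 25-minute outage works out to roughly $1.66 million.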

Sales plummet by the second during data center downtime.

In 2012, AWS experienced a similar incident that knocked out Netflix and Pinterest. On this occasion, it was reported that the outage resulted from a failure of the uninterruptible power supply (UPS). Shortly before that, a problem with the power distribution unit in one of Amazon's data centers took out Pinterest, Quora and Foursquare, among others. According to DatacenterDynamics contributor Yevgeniy Sverdlik, the entire facility managed to switch over to backup generators. However, one of these generators overheated and shut down due to a faulty cooling fan. The data center was then switched to a secondary backup source, but a defective breaker foiled that attempt. Amazon had essentially run through all of its options.

So what needs to be monitored, and how?

If the world's leading cloud provider sometimes struggles to keep its data centers up and running, there's little doubt that smaller or company-owned facilities also experience issues. Regardless of the size or type of data center, power and environmental monitoring is critical to ensuring uptime. Thousands of data points and vital metrics, including temperature, power and capacity, need to be watched in real time. As pointed out in a white paper from Geist Global, this means that the data center infrastructure management (DCIM) solution needs a refresh cycle measured in seconds. Every metric that may indicate a potential problem is brought to the attention of data center managers immediately, and they receive instant alerts when, for example, a reading exceeds a set limit.
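The limit-based alerting described above can be sketched in a few lines. This is a minimal illustration, not how any particular DCIM product works: the metric names, the limit values and the `check_reading` function are all hypothetical, and a real system would poll sensors every few seconds and send alerts to managers instead of printing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Acceptable range for one metric (values are illustrative only)."""
    low: float
    high: float


# Hypothetical limits; real values depend on the facility and its equipment.
LIMITS = {
    "inlet_temp_c": Limit(18.0, 27.0),
    "rack_power_kw": Limit(0.0, 8.0),
    "capacity_used_pct": Limit(0.0, 85.0),
}


def check_reading(reading, limits=LIMITS):
    """Return one alert message for every metric outside its limits."""
    alerts = []
    for metric, value in reading.items():
        limit = limits.get(metric)
        if limit is None:
            continue  # no limit configured for this metric
        if value < limit.low or value > limit.high:
            alerts.append(
                f"ALERT {metric}={value} outside [{limit.low}, {limit.high}]"
            )
    return alerts


# One simulated refresh cycle: temperature and capacity are out of range.
sample = {"inlet_temp_c": 31.5, "rack_power_kw": 6.2, "capacity_used_pct": 91.0}
for message in check_reading(sample):
    print(message)
```

Running the check on every refresh cycle is what turns raw readings into the instant alerts mentioned above. The shorter the cycle, the sooner an overheating rack or overloaded circuit is flagged.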

This rigorous level of monitoring can help prevent a slew of issues that could cost content providers millions of dollars, whether they're related to a failed cooling component, a faulty UPS or both. Real-time power and environmental monitoring is the key to continuity in the data center, and for content providers that lose thousands of dollars by the second during downtime, this continuity is essential.