
Blog

5 Key Questions: Disaster Recovery as a Service

In previous posts, we highlighted 5 Key Questions to Ask Your Backup as a Service Provider and 5 Key Questions to Ask Your Replication as a Service Provider. In this final post of our BC/DR series, we’ll take a look at the last point on the Business Continuity/Disaster Recovery continuum: Disaster Recovery as a Service.

Defining Disaster Recovery

As a quick primer, Disaster Recovery (DR) is, at its core, an area of IT security planning. DR focuses on protecting an organization from the negative impacts associated with an unplanned event, such as natural disasters and equipment failures. The objective of DR is a disaster recovery plan (DRP) that can be followed in an emergency to restore applications and services. This plan is a collection of policies, procedures, and actions that are clearly understood throughout an organization, regularly tested, and quickly executed in the event of a disaster.

Defining Disaster Recovery as a Service

Disaster Recovery as a Service (DRaaS) takes DR one step further by offloading failover testing and event mitigation onto ServerCentral or another service provider. In a DRaaS environment, the replication of virtual or physical infrastructure takes place in our data centers. DRP policies, procedures, and actions are clearly defined. Our team conducts regular testing of failover scenarios, and in the event of an actual emergency, performs the failover, too. The entire DRaaS solution comes with a predefined Service Level Agreement (SLA) specifically written to achieve your operating objectives. 

For the past year, we’ve studied the questions our customers, prospects, and partners have asked about DR and DRaaS solutions. We compared our findings with those of leading industry analysts to compile this list of 5 key questions you should ask about any DRaaS solution:

1. In the event of a disaster, who is on hand to help?

We’re extremely biased here because we’re always onsite to help, but it’s an important point. In many cases, there isn’t anyone available to provide immediate support. A disaster event would trigger a ticket that notifies techs to get to the data center to provide support. Find out whether or not there are actually people onsite who will help should you have an event declaration. If there aren’t, be sure you know exactly what the latency/delay will be and whether or not it meets your SLAs.

2. Are there standard RPO/RTO targets we should adhere to?

We’re asked this question every time we talk about BC/DR, replication, and DRaaS, and the answer is no. That said, we do maintain a list of standard RPO/RTO windows that we see most often with customers or prospects. If you’re interested in discussing these, just let us know. The rule of thumb for RPO/RTO targets is whatever fits your organization’s risk profile. The important thing to note is that you can look at these windows on an application-by-application basis. In many cases, simply stating the objective answers any targeting question.

3. Are hybrid (physical and virtual) environments supported?

Most environments are hybrid, meaning they consist of equal (critical) parts physical and virtual. Be sure that both parts of your hybrid deployment are supported in a recovery. These may be addressed via different technologies, but that won’t matter as someone else (your provider) is on the hook to be sure it works. Just be sure they’re both addressed within your required RPO/RTO windows, as it's possible that multiple solutions have multiple windows.

4. Do you run full disaster event tests? If so, how frequently?

In most cases, a DRaaS solution will include a predefined number of standard tests each year. Be sure this is clarified and that the frequency and depth of testing meets your compliance requirements. If it doesn’t, be sure you are aware of the costs associated with custom testing procedures. Ask for a sample test result so you can properly evaluate the data you will receive.

5. Are my security & compliance policies adhered to in event of a disaster? Are they adhered to during recovery?

Make your requirements clear when you ask this question. In most instances, security and compliance policies can be met. The key is that they're known. If you have requirements, state them up-front so that a potential provider can give a useful answer.


If you’re interested in discussing any of these questions in more detail or have new questions of your own, please don’t hesitate to contact us.

P.S. We have a host of tools and resources available to help you through the BC/DR planning process. Visit http://www.servercentral.com/services/draas to learn more.

Topics: Disaster Recovery

5 Odd Excuses for Data Center Outages

I Googled "craziest data center outages." The results did not disappoint.

1) Squirrels

These nut-cheeked, bushy-tailed rodents chew through cables like Willy Wonka's Violet chews through gum. In 2011, Level 3 Communications attributed 17% of fiber cuts to "squirrel chews."

Who, me?

Image: Dan Leveille

 

2) Hunters

Bored hunters have been known to use fiber for target practice. A few years ago in Oregon, Google had to move its fiber underground because people kept shooting down its fiber insulators.

This hunter's on PETA's Nice List.

Image: Steve Maslowski

 

3) Snowmageddon et al.

Spoiler alert: storms can be a big threat to poorly planned data centers. In 2012, 80-mph storm winds knocked out Amazon's power twice in one month, taking Pinterest, Instagram, and Netflix down with it.

"For the last time, Grandma, I don't need a coat!"

Image: Woodley Wonderworks

 

4) Poor anchor management

Undersea Internet cables are tough, but they’re not tougher than multi-ton anchors. In 2008, a ship's anchor caused service issues in Dubai when it sliced through cables on the seafloor.

Sinking the Internet.

Image: US Navy

 

5) Curiosity

In 2000, a junior data center tech decided to see what would happen if he pressed a big red button. The emergency power shutdown cost the Charlotte data center millions in lost revenue.

Curiosity killed the cat...and the data center.

Image: Ren and Stimpy

 

So, how do you prepare for an outage?

  1. Use ServerCentral's backup and recovery service.
  2. Eat a sandwich.

Topics: Disaster Recovery

2015 Technology Infrastructure Predictions

Each year begins with one of our favorite traditions: prediction season. These technology-trend forecasts are always insightful, helpful, and in some cases, humorous. Here are some of the predictions for 2015 that stood out:

Prediction: Adaptable Cloud Agendas

Analysts continue to highlight the fact that rapidly evolving cloud technologies have a real material impact on businesses. This is far more important than most people realize. Be prepared. It really is critical that companies have adaptable agendas for technology to help their business capitalize on key changes in core infrastructure. Something as simple as adopting high-performance cloud storage or deploying offsite backups of virtualized infrastructure could have a significant financial impact on their organization's performance.

 

Prediction: RESTful Interfaces

As developers increase their desire (and need) to work with services that communicate via RESTful interfaces, it’s worth investigating formal enterprise API management.

Forrester, a leading market research firm, suggests one way to stay flexible in the face of rapid enterprise application evolution is by adding RESTful interfaces for back-office applications.

The flexibility afforded by this approach will enable far more control over upgrade cycles than traditional back-office application developer releases.

 

Prediction: Security Protocol Enforcement

It’s easy to relax in this area. Don’t.

Most organizations already have security protocols in place. If you do, begin clearly communicating and enforcing them. If you don’t, you should begin working on them immediately.

“There are two types of companies: those who have been hacked, and those who don’t yet know they have been hacked."

John Chambers, Chief Executive Officer of Cisco

Be prepared for a security breach this year. It will most likely be the result of a common process or governance failure, not due to an application or infrastructure component. Be sure your IT security team is on top of management and perimeter-based processes, as well as current training so everyone knows exactly what to do when something happens.

 

Prediction: Software Containers

If you don't already know about software containers (often referenced as Docker), you will. The benefits of well-designed, containerized applications are real. More and more companies are using Docker and other container technologies to improve the efficiency of app development, deployment, and management. 2014 saw a tremendous amount of momentum in this area, and it’s only going to accelerate in 2015.

 

Prediction: Hybrid Clouds

This one has been on the prediction lists for the past four years, and it probably is going to be there for the next few years, too. As enterprise application architectures evolve into services and virtualization deployments, it’s never been easier to use off-prem private and public clouds for compute and storage. Don’t expect to see full-fledged cloud transitions where an entire enterprise is moved to the cloud in one step. Instead, watch out for steady migrations on an application-by-application basis.

 

Prediction: Disaster Recovery

There are always lots of doomsday predictions. While we don’t subscribe to the doomsday theory, we do see, all too often, situations where infrastructure and applications were not architected to withstand a disaster event. Whether this is a loss of power, a system failure, or a natural event, the result is the same: critical applications and services are no longer available. There are many ways to address disaster recovery. In many instances, a well-designed backup strategy may be more than enough to withstand an outage. Every application has its own service expectations, so be sure to identify exactly how critical each application is and what type of plan is needed: backup, replication, or complete disaster recovery.

 

Prediction: Distributed Denial of Service (DDoS) Attack Mitigation

These days, all anyone needs to launch a DDoS attack is a credit card or prepaid debit card and a web browser. It really is that simple. As these “stress testing services” become more prevalent, the volume and sophistication of DDoS attacks will continue to increase, causing more damage than ever before.

It might also be worthwhile to ask your network and infrastructure partner(s) about the tools and processes they have in place to help you in the event of a DDoS attack. If they don’t give clear answers, you may want to consider protection via a third-party provider.

If you haven’t read 5 Key Questions for Selecting a DDoS Mitigation Service, now's your chance.

Remember: it’s not if you are attacked, it’s when.

Topics: Other, Security, Disaster Recovery, Products and Services

From Backups to Replication to Disaster Recovery: What’s Right For You?

Since 2000, ServerCentral has evaluated countless technologies to identify the best possible solutions and business practices to help our customers and partners address all of the critical elements of their IT infrastructure.

Now more than ever, questions about backup, replication and disaster recovery are top of mind.

Whether it's supporting one company’s need to provide high-availability, always-on solutions or helping another company determine the right level of continuity for their business, we’re having these conversations on a daily basis.

In these conversations, we always present our view of backup and disaster recovery within a larger context—that of a business continuity continuum.

Single-Site Backup <> Multi-Site Backup <> Replication <> Disaster Recovery

Do we have the only view on BC/DR? No. We don’t expect to.

What we do have is a tremendous amount of experience, built by working with our customers on a daily basis to develop the best solutions for their ever-changing needs.

Here’s how we typically approach these conversations.

We begin with a two-part question: What are your RTO & RPO objectives?

When we speak about Recovery Time Objectives (RTO), we’re typically speaking about the time required for the resumption of core IT activities after an issue has occurred.

When we speak about Recovery Point Objectives (RPO), we’re typically speaking about data-driven applications, where the data and the application process must stay in sync (eCommerce, for instance). In most instances, RPOs are assigned to applications that have direct revenue impact.

We ask these as two separate questions because the application and process requirements significantly differ with RTO and RPO.
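As a rough illustration of why the two questions are checked separately, here is a minimal Python sketch. It reads RPO in its common sense (how much recent data you can afford to lose) and RTO as how long you can afford to be down. The application names, objectives, backup intervals, and restore estimates are all hypothetical values chosen for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class AppObjectives:
    """Per-application objectives; every value used below is hypothetical."""
    name: str
    rto_hours: float  # longest acceptable time to resume service
    rpo_hours: float  # largest acceptable window of lost data


def meets_rpo(app: AppObjectives, backup_interval_hours: float) -> bool:
    # With periodic backups, the worst-case data loss is roughly the
    # time between two backups.
    return backup_interval_hours <= app.rpo_hours


def meets_rto(app: AppObjectives, estimated_restore_hours: float) -> bool:
    # The restore estimate has to fit inside the recovery time objective.
    return estimated_restore_hours <= app.rto_hours


# Hypothetical applications with very different tolerances.
storefront = AppObjectives("storefront", rto_hours=1, rpo_hours=0.25)
wiki = AppObjectives("internal wiki", rto_hours=24, rpo_hours=24)

for app in (storefront, wiki):
    # Assume nightly backups and a four-hour restore for both.
    print(app.name, "RPO ok:", meets_rpo(app, 24), "RTO ok:", meets_rto(app, 4))
```

In this made-up case, a nightly backup with a four-hour restore is fine for the wiki and fails both checks for the storefront, which is why the objectives are worth stating application by application.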

Which services are easy to prepare for and manage? Anything virtualized, clustered, or built on software with support for failure scenarios (email, for example).

Which services are difficult to prepare for and manage? Old/End-of-Life software, and homegrown or legacy apps not coded for redundancy. Physical servers with older OSs are also difficult: they may have specific BIOS or RAM requirements, or need drivers for older hard drive interfaces (ATA or SCSI, for example) that may not be easily available.

Single and Multi-Site Backups

When single-site or multi-site backup is presented as the critical requirement, it’s important to note how applications, data, and VMs are backed up now. This seems obvious—and it is—but it’s also very important. In many instances the backup processes for applications vary wildly by application and by organization. Within some multi-site organizations, the process will vary significantly by location.

Where you back up is as important as what you’re backing up.

Some companies back up locally, on-site. Others back up to tapes which are then stored offsite. While these are both strong starts, they fail to capitalize on managed solutions that are easily configured to deliver automated backups to a high-availability target at a high-availability data center. Supported by solutions from CommVault and Veeam, these types of solutions provide backups of data, applications and VMs while offering the ability to quickly restore with live data from high-availability facilities in the event of an emergency.

Typically, big companies are associated with big DR, while small companies are associated with smaller requirements like backup. In our experience, this isn’t true.

The RTO and RPO for your services will be unique to your business, dependent fully upon the revenue and operational implications of downtime. These values will dictate just how big your small backup requirements will be.

Replication

Once backup questions are answered, the next step is typically replication. Replication can mean many things: having multiple SANs in sync, keeping copies of images or file server backups available in multiple locations, and so on. Many companies address backup and replication in one step—but many don’t.

The critical aspect of replication is that it is happening in multiple high-availability data centers on high-availability/high-performance infrastructure.

As we continue talking about replication, we’re also talking about the need to accelerate the movement of data. Compression, deduplication, etc. all become critical pieces of the puzzle.

Deduplication

Deduplication specifically helps in three places:

  1. On the backup client, which prevents sending the same data to the backup server twice;
  2. On the storage platform, to optimize the use of disk space; and
  3. By optimizing caching on reads.

For example, if a company’s servers use a common OS or a base image, they only need one copy in cache to optimize the resource and expedite recovery.
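To make the base-image example concrete, here is a minimal Python sketch of hash-based block deduplication. It is a generic illustration of the idea, not how CommVault, Veeam, or any particular product implements it; the `DedupStore` class, the fixed 4 KB block size, and the fake server images are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class DedupStore:
    """Toy backup target that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes

    def put(self, data: bytes):
        """Store data; return (recipe of digests, bytes actually sent)."""
        recipe, sent = [], 0
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks the store has never seen are transferred.
                self.blocks[digest] = block
                sent += len(block)
            recipe.append(digest)
        return recipe, sent

    def get(self, recipe):
        """Rebuild the original data from its list of digests."""
        return b"".join(self.blocks[d] for d in recipe)


# Two hypothetical servers built from the same eight-block base image,
# each with one block of its own data.
base_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
server_a = base_image + b"A" * BLOCK_SIZE
server_b = base_image + b"B" * BLOCK_SIZE

store = DedupStore()
recipe_a, sent_a = store.put(server_a)
recipe_b, sent_b = store.put(server_b)
print(sent_a, sent_b)  # the second backup sends only its one unique block
```

The same lookup table is what helps on the storage and caching side: the shared base image exists once, however many servers reference it.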

Network

Equally critical to the underlying storage infrastructure is the network infrastructure. We touched on this above regarding deduplication, but it is worth highlighting in a bit more detail as it is often overlooked as a critical success factor in BC/DR strategies.

As you can see here, there is a very serious difference between a 1 GbE and a 10 GbE network connection when it comes to a restore operation.

Data Transfer Time (Hours)
NOTE: These are best-case scenarios for a local transfer. This means public, network-based transfers are going to see significant degradation in performance.
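To put rough numbers on that gap, here is a minimal Python sketch of the best-case arithmetic. The data sizes and the `efficiency` parameter are illustrative assumptions, not figures taken from the chart; real restores will be slower once protocol overhead, disk speed, and shared links are factored in.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case hours to move data_tb (decimal terabytes) over a link.

    efficiency is the assumed fraction of line rate actually achieved;
    1.0 means a perfect, uncontended local link.
    """
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical restore sizes, compared across both link speeds.
for size_tb in (1, 10, 50):
    one_gbe = transfer_hours(size_tb, 1)
    ten_gbe = transfer_hours(size_tb, 10)
    print(f"{size_tb:>3} TB: 1 GbE {one_gbe:6.1f} h | 10 GbE {ten_gbe:6.1f} h")
```

Under these assumptions, a 10 TB restore takes most of a day at 1 GbE and a little over two hours at 10 GbE. Lowering `efficiency` (say, to 0.5) is a quick way to see how fast a public, network-based transfer falls behind.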


Something to think about when you’re discussing your BC/DR strategy.

Disaster Recovery

When we begin speaking about disaster recovery, in almost all cases, it is a much larger conversation. There are significant operational, compliance, and financial implications. There are some powerful solutions on the market from companies like Zerto that provide a foundation for near active-active configurations, but they depend heavily upon the virtualization configurations.

To make a very long story short, this is where the right people from your company and ServerCentral need to get to a whiteboard.

A worthwhile trick for disaster recovery:

If you know ServerCentral, you know we’re always happy to share what we learn. In this particular instance, we have a very nice trick many of our customers use to address the bandwidth issue:

Configure VM restores directly to a DR infrastructure.

With hypervisors ready to go at a DR site, simply boot the backup images and you're back in business. These can be dedicated or shared compute and storage resources depending upon the requirements.

Conclusion

It’s easy to get deep into the technology and architectures when discussing business continuity. However, it’s important to ground everything on the importance of each service to your business. What is the cost of having each service down? What are acceptable recovery timeframes for each service?

It is only when these baselines are set and recognized across your organization that the decision on how to back up, replicate, and/or restore your services can be fully made.

The good news? The technology today is available to make these DR processes much less painful than they were a few years ago.

Additionally, as more organizations write redundant, scalable software designed for cloud environments, and as more governments mandate DR testing for banks and other critical infrastructure, this technology is going to be pushed forward for everyone.

When it’s time for you to start your BC/DR conversation, no matter where you are on the continuum, we’ll be here.

If you’d like to learn more about our BC/DR solutions, please visit ServerCentral Backup and Replication or DRaaS.

Topics: Disaster Recovery

Treat The Cause, Not The Effect

You can’t fix what you don’t know is broken. Despite the business impact, few businesses actually measure how much downtime they rack up in a year.

Acknowledging a failure and researching the contributing elements becomes a critical success factor for risk mitigation and prevention.

When a web application is down, the impact can be immense. An e-commerce site that generates $100,000 per day in gross revenue loses $4,166.67 per hour of downtime. It may spend days catching up on shipping, refunds, and customer service calls after that hour of downtime, without accounting for losses from future orders. The fully loaded costs associated with downtime are likely far more significant than you think.
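The per-hour figure above is simply daily revenue divided by 24. A small Python sketch makes it easy to plug in your own numbers; the `followup_cost` parameter is a hypothetical placeholder for the catch-up work described above (shipping, refunds, support calls), not a measured value.

```python
def hourly_revenue_loss(daily_revenue: float) -> float:
    """Revenue lost per hour of downtime, assuming sales are spread evenly."""
    return daily_revenue / 24


def downtime_cost(daily_revenue: float, hours_down: float,
                  followup_cost: float = 0.0) -> float:
    """Lost revenue plus any follow-up cost you choose to estimate."""
    return hourly_revenue_loss(daily_revenue) * hours_down + followup_cost


print(f"${hourly_revenue_loss(100_000):,.2f} per hour")  # $4,166.67 per hour
```

The even-spread assumption understates the loss if the outage hits during peak hours, which is one more reason to measure rather than guess.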

Often, the disruption to internal operations far outlasts the outage itself.

A company might decide to replace an overloaded network switch. But perhaps the root cause wasn’t a hardware issue; maybe it was the way its software handled transactions. They might pay for a new switch and schedule downtime to install it, when all they needed was a change in the software configuration. Performing a root-cause analysis reveals what needs to be fixed, eventually leading to great improvements in system reliability.

It's important to remember to treat the problem, not the symptoms.

Businesses that don’t actively plan for and manage failure rates never really know their level of risk exposure.

I welcome any opportunity to learn about your business and IT environment. If you'd like to discuss this further, you can contact me directly at avi@servercentral.com.

Topics: High Availability, Disaster Recovery, Products and Services

Patching Heartbleed

ServerCentral works around-the-clock to help keep our clients secure. Here’s some insight into the time taken to perform a massive, network-wide scan and notify our affected clients:

ServerCentral's response summary

You don’t spend 15 years in IT without one of “those" moments that stops you dead in your tracks. A few years ago, it was when a client crashed a live SAN because the storage vendor labeled it “DEV” instead of “PROD.” Last Tuesday, it was when a customer asked me to test his server for vulnerabilities and I was able to read somebody’s email (presumably one of his clients’) with the results of a paternity test.

Although I did it with permission to help diagnose a security vulnerability, there was still the residual sick feeling of actually being able to read someone’s private information so easily. Even though we patched his servers in twenty minutes, we knew there could be a thousand people reading personal correspondence like that all over the world. More than anything, we felt powerless knowing we could have done nothing to correct this issue in advance.

My heart went out to him. His clients paid him to protect the confidential data they exchange with their customers, and the most trusted tool on the Internet for securing communications—SSL—had failed us all.

This customer thought his system was patched against the Heartbleed vulnerability, tested it multiple times with the online tools available to him, and believed he was safe. When he discovered he wasn’t, he asked us to protect his systems. We immediately added filters to his firewalls to block the vulnerable responses, which helped as long as no one got past the firewall. Of course, he still needed to fix the underlying issues on his servers.

The last thing we wanted was to have another client learn that their server was open to the Internet, so we started scanning.

On any given day, ServerCentral announces almost a million IP addresses to the Internet at large distributed between our POPs and data centers around the globe. Scanning that many IP addresses for a vulnerability like Heartbleed is no easy task, but our Managed Services and Network Engineering teams were more than up to it.

Our Response

At 3 PM Tuesday, our BGP sessions offered 986,113 IP addresses aggregated across our own IP space and that announced by our clients. Utilizing Nmap, an open source network scanning tool, we scanned every IP in our announcement lists for open port 443 web servers.

All better.

Using a high-end virtual server based in our own cloud environment, we completed the scan in approximately seven hours overnight Tuesday—just in time for our first shift to analyze the results.

While ACLs, firewalls, and intrusion detection systems may have blocked our access to certain clients and IPs, we detected only 48,026 servers listening on port 443. That means just shy of 5% of the addresses we scanned were running an SSL web server. From there, we fed the discovered IP addresses into an exploit script that analyzed SSL handshake information for the Heartbleed vulnerability.

With the help of our Development team, we created a script that scanned up to a thousand hosts at a time for Heartbleed. Results came in slowly because of the nature of the scan, but by Wednesday evening, our team identified 7,553 vulnerable hosts in our network.
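For a sense of scale, the figures above can be tied together with a few lines of Python. This is only back-of-the-envelope arithmetic on the numbers quoted in this post; the seven-hour duration is the approximate figure given above, and the batch count assumes every batch was filled to the thousand-host limit, which may not match how the actual script ran.

```python
import math

announced = 986_113   # IP addresses offered over BGP
listening = 48_026    # servers answering on port 443
vulnerable = 7_553    # hosts flagged as vulnerable to Heartbleed
batch_size = 1_000    # upper bound on hosts scanned at a time
scan_hours = 7        # approximate duration of the port scan


def share(part: int, whole: int) -> float:
    """Fraction of whole represented by part."""
    return part / whole


print(f"Listening on 443: {share(listening, announced):.2%} of announced addresses")
print(f"Vulnerable: {share(vulnerable, listening):.2%} of listening servers")
print(f"Port scan rate: about {announced / (scan_hours * 3600):.0f} addresses per second")
print(f"Heartbleed batches needed: at least {math.ceil(listening / batch_size)}")
```

Roughly one in six of the SSL servers we found needed a patch, which is why the notification step that followed was done host by host.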

Our last step was to inform our customers, a task our Data Center Operations and Managed Services teams completed overnight Wednesday. We sent out a general announcement about Heartbleed with information on how to mitigate the threat to all of our customers. We then split up the list of vulnerable hosts and personally contacted each customer with specific hosts to patch.

Moving Forward

We faced one of the biggest threats to Internet security and made it through, but the fallout is just beginning.

It still makes me uncomfortable to know that so much private, sensitive data could have been exposed to hackers or other nefarious parties. Still, I know that’s where ServerCentral makes the biggest difference.

Our people are here for you around the clock to stop every vulnerability dead in its tracks before you even detect an issue. We have the experience and technical skill to fix these problems and keep you and your clients safe.

Topics: Disaster Recovery, Products and Services

Heartbleed Vulnerability Update

Heartbleed (CVE-2014-0160) has been top of mind, conversation and action for everyone of late. We want to provide you with a detailed update about our work to address this issue.

As of April 8th, 2014, all ServerCentral services have been patched against the Heartbleed vulnerability. It is safe and secure for users to interact with ServerCentral sites. Affected systems will require a password reset at your next login.

All managed service customers have been contacted about Heartbleed if there was a concern with the vulnerability on services we provide. Please note all of our platforms have updates available for you at this time.

In addition to securing our own services, ServerCentral has performed a basic scan of our customer base to detect anyone who may have vulnerable services. We have been reaching out to these customers to make them aware of this vulnerability and to offer our assistance on fixing the problem.

We strongly encourage the following steps for all customers:

  • Test all services which use SSL encryption, such as web services using HTTPS, SSL VPNs, load balancers, etc. for this vulnerability. Remember that hardware appliances can also be susceptible.
  • Once all services are patched, perform password rotations for anything which may have authenticated to OR through the affected systems.
  • Revoke and roll out new SSL certificates for services that may have been exposed.

We encourage all of our customers to perform additional reviews of their internal and external services and confirm they are secure against this vulnerability.

For more information about Heartbleed please visit http://www.heartbleed.com. If you would like to test your devices or sites, a good test for Heartbleed can be found at http://filippo.io/Heartbleed.

Again, if you have any questions or if we can be of assistance, please do not hesitate to contact us at your convenience.

Topics: Disaster Recovery

Why We Support The Business Resumption Planners Association

At ServerCentral, we love sharing our experience with our customers, our community, and anyone willing to listen. Whether we’re learning from technologists, business strategists, executives, operations managers, or continuity experts, we constantly seek the best knowledge to apply to the data center.

When it comes to IT infrastructure (our life’s work), we’ve just about seen it all. That’s why we feel that it’s the right thing for us to share.

We’re pleased to once again contribute to the outstanding BRPA community in 2014:

“Established in 1989, the Business Resumption Planners Association (BRPA) is an independent professional association of people employed in all aspects of disaster recovery, contingency and business continuity planning. Located in the greater Chicago area, we are a not-for-profit association dedicated to education and information exchange.”

As sponsors, we take this responsibility seriously. It’s critical that BRPA member organizations and those interested in DR have the ability to teach, and more importantly, learn, from one another.

With BRPA in our back pocket and decades of hands-on experience helping companies plan for and manage in-the-moment disasters (both natural and technological), we’re ready for anything.

Let us know if you would like to set up a DR-planning workshop.

Topics: Disaster Recovery