<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=1078844192138377&amp;ev=PageView&amp;noscript=1">

Blog

12 Keys to Successful Private Cloud Migrations

After you answer the “why private cloud?” question, it’s time to look at how you get there.

Cloud migrations are fraught with points of frustration that, according to a Gartner piece by Thomas Bittman, result in the failure of 95% of private cloud projects.

The root of this frustration, and ultimately of project failure, is surprisingly simple. When looking at any migration, the first and most important question to ask is:

“What are we looking at?”

When assessing a private cloud migration, it’s important to forget about the technology for a minute. The keys to your project's success lie in the operational details.

12 Steps to a Private Cloud Migration

Having managed and supported more cloud migrations than we can remember, we've distilled the critical success factors into 12 questions every project needs to answer:

  1. Which applications will be moving to the cloud?
  2. Which servers are running these applications?
  3. What are these servers’ CPU utilization rates?
  4. Which groups, departments, and individuals use these applications?
  5. How critical are these applications to the organization?
  6. Who manages these applications?
  7. What is the impact to the organization if these applications are unavailable for an hour? For a day? For a week?
  8. Are these applications architected for high-availability and/or redundancy?
  9. Are there legal or compliance requirements for RTOs and RPOs?
  10. What are the relationships between these applications?
  11. What are the security requirements for these applications?
  12. What are the data protection and retention requirements for these applications?

Are the answers to these questions the same for each application that is being migrated?

If not, what are the key differences?

When a project to "migrate everything to the cloud" needs to start yesterday, that is the perfect time to stop and make sure you have answers to these questions. In our experience, these questions often aren't asked, let alone answered. The result? Dissatisfaction, because there are no clear goals and no shared understanding of the migration's implications.

All too often, organizations believe the cloud automatically makes applications highly available and redundant, or that a cloud-based infrastructure will automatically be cheaper. Unfortunately, neither is always the case.

The most successful cloud migrations we’ve been a part of begin with detailed planning. In the largest of these migrations, we spent almost a year working with a customer: auditing equipment, applications, access, relationships, business requirements, and SLAs, and talking to application users and owners to develop a clear and complete picture of the migration. Yes, almost a year.

Once this planning was complete, the actual migration to the cloud and virtualized infrastructure took approximately six months.

This detailed planning and preparation resulted in a migration with minimal issues and documented hard-dollar savings of millions of dollars per year.

Where did the savings come from?

A significant component of the savings came from the audit, which showed that less than 25% of the compute capacity across the organization was actually in use. This meant a 1:1 match of compute capacity wasn’t required for the migration.
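As a simplified illustration of the arithmetic behind that finding, here's a short Python sketch. The inventory numbers and the 60% target utilization are hypothetical, not the customer's figures; the point is that aggregate utilization, not server count, drives the capacity you actually need.

    # Hypothetical right-sizing arithmetic: if an audit shows aggregate CPU
    # utilization well below capacity, the target environment does not need
    # a 1:1 match of today's physical compute.
    inventory = [
        # (server, physical cores, average % CPU used) -- illustrative values
        ("app-01", 16, 12.0),
        ("app-02", 16, 18.0),
        ("db-01", 32, 35.0),
        ("web-01", 8, 9.0),
    ]

    total_cores = sum(cores for _, cores, _ in inventory)
    used_cores = sum(cores * pct / 100 for _, cores, pct in inventory)
    print(f"Aggregate utilization: {used_cores / total_cores:.0%}")  # ~23% here

    # Size the target so it runs at no more than an assumed 60% busy.
    target_utilization = 0.60
    print(f"Cores required: {used_cores / target_utilization:.0f}")  # ~28 vs. 72 today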

There's a lot of planning to be done to set yourself up for success.

Summary?

Once you’ve answered “Why private cloud?” and defined your end goals (cost savings? agility? getting out of the IT business?), the principle is straightforward: the more attention you pay to the details up front, the more likely your migration to the cloud will succeed.


What Is A Dedicated Server?

In a previous post, we answered “What is a cloud server?”

Today, we’ll look at another common question:

What is a dedicated server?

A dedicated server is exactly that: a dedicated, physical server allocated specifically for your use. This is in contrast to a cloud server, where multiple other customers use the same underlying compute resources within their own virtualized environments.

Dedicated servers offer you the ability to determine exactly how your compute resources are allocated. You can leverage any virtualization platform you desire (VMware, Xen, Hyper-V, OpenStack, etc.) and configure as many virtual machines as you like. You may also decide not to virtualize the compute resources at all. It is completely your call.

The computational benefits of a dedicated server really come down to your unique and specific allocation of your resources. No one else’s compute requirements will have an impact on your dedicated server.

Your infrastructure partner should be willing and able to help you architect your dedicated server configuration for critical and non-critical workloads, depending upon your unique requirements.

Like cloud servers, dedicated servers offer you the ability to save on the purchasing and management costs that would otherwise go into developing and maintaining your own infrastructure. A dedicated server is paid for on a monthly basis, just like a cloud server, enabling the use of OpEx dollars rather than CapEx dollars. The financial benefits are, essentially, equal.

Dedicated servers also support the ability to connect to cost-effective cloud storage. For many compute-intensive workloads, the use of shared or cloud storage may be sufficient. The uniqueness of the dedicated server remains with the control and management of the compute resources.

It’s worth repeating: Your infrastructure partner should be willing and able to help you architect your dedicated server configuration for critical and non-critical workloads, depending upon your unique requirements. This includes storage.

Finally, for the sake of consistency with our previous post, let’s talk about high availability and redundancy.

Dedicated servers are not, by definition or by default, highly available or redundant.

Can dedicated infrastructure be architected for high availability and/or redundancy? Absolutely. At ServerCentral, we design these types of solutions on a daily basis.

The first question to ask, however, is whether or not your applications are architected for high availability and/or redundancy. If not, this is a great place to start a conversation with an infrastructure partner.


High Availability Webinar Followup Questions

There were some great follow-up questions to my webinar on high availability infrastructure, two of which I want to write about today.

If I'm looking to add cloud capacity for resiliency and reliability, what should I look for in a provider?

Look for:

  • Their level of comfort with your technical stack and requirements
  • Level of experience with companies like yours
  • A consultative approach to understanding your applications, current infrastructure, operational model, and security requirements
  • The ability to implement your desired mix of public cloud and dedicated infrastructure
  • A high-availability network to connect your multi-site resources (if appropriate)
  • A track record of reliable operation
  • Willingness to share information about the process they use for design and change management at every level, from infrastructure and managed services to cloud product lines

You mentioned several technologies for adding redundancy and load balancing to web and app servers. What about database servers?

Most database systems require you to use their own built-in technology for load balancing, but Oracle, PostgreSQL, MySQL, and MS SQL Server all support the most common high-availability implementations, typically master/master replication and read replicas.
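As a rough sketch of what read replicas look like from the application side, the Python snippet below splits writes to the primary and spreads reads across replicas using psycopg2. The hostnames and credentials are placeholders, and a real deployment would add connection pooling, retries, and awareness of replication lag.

    # Minimal read/write-splitting sketch for a PostgreSQL primary plus read
    # replicas. Hostnames and credentials are placeholders.
    import random
    import psycopg2

    PRIMARY = {"host": "db-primary.example.com", "dbname": "app", "user": "app", "password": "secret"}
    REPLICAS = [
        {"host": "db-replica-1.example.com", "dbname": "app", "user": "app", "password": "secret"},
        {"host": "db-replica-2.example.com", "dbname": "app", "user": "app", "password": "secret"},
    ]

    def run_write(sql, params=()):
        # Writes always go to the primary.
        with psycopg2.connect(**PRIMARY) as conn, conn.cursor() as cur:
            cur.execute(sql, params)

    def run_read(sql, params=()):
        # Reads are spread across replicas; note that replicas can lag the primary.
        with psycopg2.connect(**random.choice(REPLICAS)) as conn, conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()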

MS SQL Server high availability is most often implemented via shared storage, which carries more risk if the underlying storage becomes corrupt. As another option for MS SQL, PCTI offers a database load balancer that is transaction-aware and can keep multiple servers or clusters in sync, locally or remotely.

Once you need more transactions per second than the classic database architectures can support, clustering is of limited use, and it may be worth looking at Clustrix, MemSQL, NuoDB, and other "NewSQL" systems, many of which are transactionally safe (ACID-compliant). Most of them require large amounts of RAM or SSD and direct, high-speed network connectivity, or impose database size limits, but they can scale to much larger numbers of transactions per second at much lower cost than traditional ACID-compliant systems.


Treat The Cause, Not The Effect

You can’t fix what you don’t know is broken. Despite the business impact, few businesses actually measure how much downtime they rack up in a year.

Acknowledging a failure and researching its contributing factors is a critical success factor for risk mitigation and prevention.

When a web application is down, the impact can be immense. An e-commerce site that generates $100,000 per day in gross revenue loses $4,166.67 per hour of downtime. It may spend days catching up on shipping, refunds, and customer service calls after that hour of downtime, without accounting for losses from future orders. The fully loaded costs associated with downtime are likely far more significant than you think.
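To make the arithmetic concrete, here's the same back-of-the-envelope calculation in a few lines of Python; the fully loaded multiplier is an assumption for illustration, not a measured figure.

    # Back-of-the-envelope downtime cost for the e-commerce example above.
    daily_gross_revenue = 100_000            # dollars per day
    hourly_loss = daily_gross_revenue / 24   # direct revenue lost per hour down
    print(f"Direct revenue lost per hour: ${hourly_loss:,.2f}")  # $4,166.67

    # The fully loaded cost is higher: refunds, support time, make-up shipping,
    # lost future orders. The multiplier below is purely an assumption.
    fully_loaded_multiplier = 3
    print(f"Rough fully loaded cost per hour: ${hourly_loss * fully_loaded_multiplier:,.2f}")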

Often, the disruption to internal operations far outlasts the outage itself.

A company might decide to replace an overloaded network switch. But perhaps the root cause wasn't a hardware issue at all; maybe it was the way its software handled transactions. The company might pay for a new switch and schedule downtime to install it when all it needed was a change to the software configuration. Performing a root-cause analysis reveals what actually needs to be fixed, which ultimately leads to significant improvements in system reliability.

It's important to remember to treat the problem, not the symptoms.

Businesses that don’t actively plan for and manage failure rates never really know their level of risk exposure.

I welcome any opportunity to learn about your business and IT environment. If you'd like to discuss this further, you can contact me directly at avi@servercentral.com.


Ars Technica Grows with High Availability Infrastructure

The infrastructure designed, deployed and maintained by ServerCentral has not caused a single minute of downtime for Ars Technica since day one—all the way back in 2008.

The Challenges

Growth
Ars Technica’s popularity continues to grow, so its website must function seamlessly with an extremely large amount of traffic. During highly touted events like Apple Liveblog, arstechnica.com typically grows from 2 million pageviews to 16 million pageviews within two hours. Being able to scale according to irregular fluctuations in demand is critical.

Potential Revenue Loss
Any downtime leads to significant revenue losses for Ars. Advertisers pay per impression, so Ars loses money every second that ads don’t display.

If we go down during something like Apple Liveblog, our users don’t come back. They’d be poached by the competition. Plus, it’s very bad for our reputation.

Lee Aylward, Lead Developer at Ars Technica

The Deployment

In 2012, Ars Technica set out to completely redesign its website, which began with a redesign of its IT infrastructure. Its number one priority was developing a more scalable, robust, and highly available architecture.

We knew we wanted to—needed to—do a high availability infrastructure, but had no idea how to approach it. ServerCentral helped us design the right solution.

Jason Marlin, Technical Director at Ars Technica

The Solution

ServerCentral and Ars determined that a completely managed, high availability architecture was the best approach:

[Diagram: Ars Technica reference architecture]

Instead of pushing us toward the most expensive equipment, ServerCentral’s team was tremendously helpful with arriving at a solution at the cost that we wanted. We hadn’t even considered managed network storage for our VMs until ServerCentral’s engineers suggested it.

Lee Aylward, Lead Developer at Ars Technica

Each component of Ars Technica's fully managed solution is configured for high availability. This ensures that Ars is up 24/7 regardless of equipment failure or maintenance.

Should any active component fail, the system automatically enables running-but-inactive spares, and Ars continues to serve its readers and advertisers seamlessly. When a failed component has been replaced or brought back online, ServerCentral reincorporates it into the live environment and reestablishes the automatic failover capability.

Results

Ars continues to increase visitors and pageviews by more than 20% each year. In addition, the infrastructure designed, deployed, and maintained by ServerCentral has not caused a single minute of downtime for Ars Technica since day one back in 2008.

Cost Reduction

Since having ServerCentral manage our infrastructure, we’ve without a doubt saved time and money. We have cost predictability because they own and manage everything, which our finance team really appreciates.

Jason Marlin, Technical Director at Ars Technica

Support

What I like about ServerCentral’s support is that it has a systematic way of handling things without the cold, robotic approach of your standard ticketing system. I send a request and I’m in their ticketing system, but they add an additional back-and-forth responsiveness anyway. It’s immediate, personal, intelligent, and it’s my favorite part about ServerCentral.

Jason Marlin, Technical Director at Ars Technica

Expertise

If we get a notification from ServerCentral that says ‘we’re looking into a network issue,’ rest assured it’s an actual network engineer working on it who knows a lot more about the problem than we do.

Lee Aylward, Lead Developer at Ars Technica

Convenience

Since we made the move to ServerCentral, we can focus on the programming of our site and user experience. We’re not stressed about infrastructure because it’s running so well.

Lee Aylward, Lead Developer at Ars Technica

Trust

When we get to the point where we have to plan another expansion, we’re doing it with ServerCentral.

Lee Aylward, Lead Developer at Ars Technica


5 Strategies for Implementing High Availability

Even the most robustly architected systems fail from time to time. Fortunately, there are steps you can take to limit your downtime exposure. Here are five automated strategies to help you implement high availability:

1. Application-Level Routing

In the event of a transaction failure, cloud-aware applications can be engineered to intelligently route transactions to a secondary service point. A failed transaction query is automatically reprocessed at the secondary working location.

[Diagram: Application-level routing]
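A minimal sketch of this pattern is below; the endpoints and the use of the requests library are illustrative assumptions, not a prescribed implementation.

    # Application-level routing sketch: try the primary service endpoint and,
    # on failure, reprocess the same request at a secondary. Endpoints are
    # placeholders.
    import requests

    ENDPOINTS = [
        "https://primary.example.com/api/orders",
        "https://secondary.example.com/api/orders",
    ]

    def submit_order(payload):
        last_error = None
        for url in ENDPOINTS:
            try:
                resp = requests.post(url, json=payload, timeout=3)
                resp.raise_for_status()
                return resp.json()           # first healthy endpoint wins
            except requests.RequestException as exc:
                last_error = exc             # fall through to the next endpoint
        raise RuntimeError(f"All endpoints failed: {last_error}")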

2. Network IP Management

Network IP Management allows a published service IP to move between machines at the time of a failure. This is classified as a self-healing process, where two servers monitor one another. If the first server malfunctions, the second server assumes its roles and processes. 

[Diagram: Network IP management]

Some packages for Linux that provide this functionality are keepalived and the Linux-HA suite.
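For a sense of the mechanics, here is a deliberately naive Python sketch of the floating-IP idea: a standby host watches its peer and claims the published service IP if the peer stops answering. The addresses and interface name are placeholders, and a real deployment should use keepalived or Linux-HA, which also handle priority elections, gratuitous ARP, and split-brain protection.

    # Naive floating-IP failover sketch (use keepalived/VRRP in production).
    import socket
    import subprocess
    import time

    PEER = ("192.0.2.10", 80)          # active node's health-check address (placeholder)
    SERVICE_IP = "203.0.113.10/24"     # published service IP (placeholder)
    INTERFACE = "eth0"

    def peer_alive(timeout=2):
        try:
            with socket.create_connection(PEER, timeout=timeout):
                return True
        except OSError:
            return False

    def claim_service_ip():
        # Requires root: attach the floating IP to this machine's interface.
        subprocess.run(["ip", "addr", "add", SERVICE_IP, "dev", INTERFACE], check=False)

    while peer_alive():
        time.sleep(1)                  # peer is healthy; keep watching
    claim_service_ip()                 # peer stopped answering; take over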

3. Monitoring

A well-integrated monitoring package not only provides insight into an application and its current behavior, it also watches for error rates that exceed a predefined threshold. For example, an e-commerce site can set up monitoring on its payment gateway so that if credit card authorization failures exceed a 20% rate, the Network Operations Center (NOC) automatically receives an alert and self-healing tasks on the infrastructure are initiated.

Some widely available monitoring packages are Nagios, Cacti, Zabbix, and Icinga.
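As a rough illustration of the threshold logic behind the payment-gateway example above, here is a small Python sketch; the window size, threshold, and alert hook are assumptions, and in practice this logic would live in checks run by a package like Nagios, Zabbix, or Icinga.

    # Error-rate threshold sketch: alert when more than 20% of the most recent
    # authorization attempts fail. Window, threshold, and alert hook are
    # illustrative assumptions.
    from collections import deque

    WINDOW = 200
    THRESHOLD = 0.20
    recent = deque(maxlen=WINDOW)

    def alert_noc(message):
        # Placeholder: page the NOC and kick off self-healing runbooks here.
        print("ALERT:", message)

    def record_attempt(succeeded):
        recent.append(succeeded)
        if len(recent) == WINDOW:
            failure_rate = recent.count(False) / WINDOW
            if failure_rate > THRESHOLD:
                alert_noc(f"Card auth failure rate {failure_rate:.0%} over last {WINDOW} attempts")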

4. Stateless Transactions

Engineering an application to perform transactions in a stateless manner significantly improves availability. In a stateless model, a machine only keeps state (data) on transactions that are in flight; once a transaction completes, a machine that dies or degrades has no effect on the state or memory of historic transactions. Clients are therefore not tied to any particular server, and the loss of a pool member in a tier does not interrupt the client session due to a hardware or application fault on that member.

Amazon.com leverages stateless transactions with a persistent key-value store to save shopping carts indefinitely.

The key is to avoid storing permanent state (i.e., transactions, inventory, user data) on individual logical or physical servers.
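A minimal sketch of that principle: keep cart state in an external key-value store (Redis here, purely as an example; the hostname is a placeholder) so any web server in the pool can serve any request, and losing one server loses no carts.

    # Stateless web tier sketch: cart state lives in an external key-value
    # store rather than in any one server's memory. Redis and the hostname
    # are illustrative choices, not requirements.
    import redis

    store = redis.Redis(host="redis.example.com", port=6379)

    def add_to_cart(session_id, sku, qty):
        store.hset(f"cart:{session_id}", sku, qty)   # survives loss of any web server

    def get_cart(session_id):
        raw = store.hgetall(f"cart:{session_id}")
        return {k.decode(): int(v) for k, v in raw.items()}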

5. Multi-Site Configurations

In the (unlikely) event of a catastrophic hardware failure, resources can be redeployed to a secondary location in minutes with little manual intervention. Data replication and resource availability are already in place at the secondary location, and the just-in-time deployment of entire application infrastructures is measured in minutes, not hours or longer.

When architected and implemented properly, multi-site configurations allow a company to redeploy their entire infrastructure in a new data center.

An organization that cannot tolerate downtime in their application infrastructure will benefit the most from a multi-site configuration. In this situation, the additional site would be a completely independent data center that hosts an independent copy of the primary site infrastructure. Depending on how the site application is configured, the additional site can either be in an active-active configuration that services a portion of the traffic coming into the site, or a primary-failover site that will not serve traffic, but sits idle while continuously replicating data from the primary.

Read more strategies like these in my white paper, High Availability in Cloud And Dedicated Infrastructure.
