
Is Low Redundancy Infrastructure Right for You?

Data Center Knowledge’s recent article, “As Tenants Get Smarter About Data Center Availability, They Want More Options,” discusses how companies can save money by reducing infrastructure redundancy, and the idea has been a topic of conversation ever since.

After 15+ years of building infrastructure solutions, we’ve seen when, where, why, and how low redundancy infrastructure can work. Only a small percentage of companies can truly absorb downtime, and low redundancy infrastructure increases the potential for downtime, so it carries significant risk. That doesn’t mean it can’t work for you; it just means there are a number of things to think about.

A few things to consider:

Application Criticality

You never want to skimp on infrastructure redundancy for anything critical that requires continuous uptime. While it’s obvious that low redundancy is a poor fit for workloads that need 24/7 uptime, the enticing cost savings can make it tempting anyway.

For example, reducing redundancy is a major cost saver for large CDNs that aren’t affected by losing a POP. It also works for offline processing and development resources that can sustain downtime. If you’re unsure of the impact an application’s downtime will have on your business, research that impact carefully before making any decisions.

Cabinet Redundancy vs. Data Center Redundancy

Redundancy at the cabinet level is different from redundancy at the data center facility level.

We have customers who don’t want redundancy in some of their cabinets. If a PDU goes out, they can absorb the downtime.

Not having redundancy in data center power, cooling, and network, however, is an entirely different risk.

It’s extremely common for organizations that have experienced issues with low-redundancy, low-cost data centers to actively seek out a fully redundant facility.

The difference between cabinet and data center redundancy is an important distinction.

Saving Money Through Grouped Applications

Structuring your infrastructure by redundant vs. non-redundant requirements can save a significant amount of money, in addition to increasing the efficiency of ongoing expansion and administration.

Many applications can be configured in a non-redundant fashion. Group these applications together in non-redundant cabinet or cloud configurations.

Other applications require full redundancy. Group these applications together and deploy them in fully redundant cloud or cabinet configurations.

Architecting Application Redundancy

Architecting redundancy at the application layer is complicated. There are myriad factors involved, and getting them right requires skilled application architects. If you’re targeting this objective, be sure you stack the deck in your favor by working with experts in this area.

One Final Takeaway

Low redundancy infrastructure makes sense for a small percentage of organizations. These organizations typically have a very detailed understanding of each and every nuance of their application stacks and infrastructure footprint. When all of these variables are known, low redundancy infrastructure can be a very cost-effective solution. For the majority of businesses, however, our experience has shown there are far safer places to gamble.

Topics: Redundancy

Preventing Data Loss with RAID

The concepts of scalability and redundancy go hand-in-hand. Building an environment that is capable of scaling out offers the ability to fine-tune how much failure you can withstand. There are a dizzying number of approaches to redundancy—power, network, storage, server, data, backup and replication, disaster recovery, load balancing, site redundancy—but for today we're going to hit the basics of one of the most fundamental: storage. More specifically, RAID—a Redundant Array of Independent Disks.

Why RAID?

RAID provides a lot of bang for the buck. For what is in most cases a small investment compared to other options, you can provide significant protection against one of the most common forms of failure.

If your power or network fails without redundancy, your site is down. Outages like this can be extremely expensive, but they're also typically fast to recover from. Entire servers and sites are important to consider, but require significant planning to address fully. Backup and replication are important, but again bring complexity and can have extended restoration times.

However, if you have a disk that fails without RAID, you've just lost data.

Replacing that drive won't bring the data back; you'll need backups (you do have tested, up-to-date backups, right?), you'll need a plan to rebuild the OS and restore those backups, and you'll have extended downtime while you rebuild. A non-redundant drive loss in turn likely means a server outage, as a server without its data or OS is a very expensive brick. RAID is one of the most cost-effective improvements you can make to an environment - for the cost of disks and an adapter (all typically a fraction of the cost of a server), you can protect against failure, add capacity, and improve performance.

RAID isn't a panacea - there's still a lot that can go wrong that it won't protect you from. Even just within the realm of storage, human error, application bugs, or filesystem corruption can still make your disk array useless (RAID is not a replacement for backups). Much like power conditioning, it won't replace the need for backup systems, but it can lessen your reliance on them and exposure in case of failure. There are also some applications that don't lend themselves to RAID by their nature - Hadoop, ZFS, and other self-managing systems typically need direct disk access and provide their own redundancy and scalability features.

Capacity, Redundancy, Performance

Single disks suffer from many problems:

  • They're low in capacity, maxing out at a "mere" few terabytes.
  • They're failure-prone, especially traditional spinning hard disks with their tight tolerances and moving parts, but even solid state drives fail over time.
  • Single disks can also only provide so much performance—hard drives have been a known bottleneck for many years, but SSDs can be constrained by their interface and internal design and still have upper limits that are easy to hit with modern applications.

All of these limitations are unacceptable for critical business infrastructure.

RAID allows us to address these concerns by spreading storage across a number of disks, harnessing their combined capacity and performance while enabling redundancy. Multiple disks are teamed together, providing features greater than the sum of their parts, but with usable storage less than the sum of their raw capacity, since some is set aside for redundancy. Redundancy is provided via a number of algorithms and methods, but the ultimate goal is to write additional data that allows reconstructing any data lost due to a drive failure. Different layouts, or RAID levels, allow one to optimize storage for a specific purpose. There are several different RAID levels in use today, the most common being 0, 1, 5, 6, and 10.
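
To make the core idea behind parity-based levels like RAID 5 concrete, here's a minimal Python sketch. It's a simplified illustration, not how a real controller works: actual arrays rotate parity across all members and operate on fixed-size blocks.

    # A minimal sketch of RAID 5-style parity, assuming one parity block
    # per stripe across three data disks. Only the XOR idea is shown.

    from functools import reduce

    def parity(blocks):
        """XOR data blocks together to produce the parity block."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def reconstruct(survivors, parity_block):
        """Rebuild one lost block by XOR-ing the parity with the survivors."""
        return parity(survivors + [parity_block])

    # One stripe spread across three data disks:
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    p = parity(stripe)

    # Disk 1 fails; its block is recoverable from the other disks plus parity.
    rebuilt = reconstruct([stripe[0], stripe[2]], p)
    assert rebuilt == stripe[1]

That XOR trick is also why a RAID 5 array of n disks yields roughly n-1 disks of usable space: one disk's worth of capacity is spent storing parity.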

Summary

In closing, RAID provides a lot of benefits for a relatively small investment. You're protected against the most common type of failure, and one that has the worst consequences—loss of data. Not only does it protect data, it can enhance the scalability and performance of your underlying storage. There are many different ways to deploy RAID, each with its own set of tradeoffs, which we'll examine next time. Stay tuned!

Topics: Redundancy Products and Services

What Is A Dedicated Server?

In a previous post, we answered “What is a cloud server?”

Today, we’ll look at another common question:

What is a dedicated server?

A dedicated server is exactly that: a dedicated, physical server allocated specifically for your use. This is in contrast to a cloud server, where multiple other tenants use the same underlying compute resources within their own virtualized environments.

Dedicated servers offer you the ability to determine exactly how your compute resources are allocated. You can leverage any virtualization platform you desire (VMware, Xen, Hyper-V, OpenStack, etc.) and configure as many virtual machines as you like. You may also decide not to virtualize the compute resources at all. It is completely your call.

The computational benefits of a dedicated server really come down to your unique and specific allocation of your resources. No one else’s compute requirements will have an impact on your dedicated server.

Your infrastructure partner should be willing and able to help you architect your dedicated server configuration for critical and non-critical workloads, depending upon your unique requirements.

Like cloud servers, dedicated servers offer you the ability to save on purchasing and management costs that would otherwise go into developing and maintaining your own infrastructure. A dedicated server is paid for on a monthly basis, just like a cloud server, enabling the use of OpEx dollars vs. CapEx dollars. The financial benefits are, essentially, equal.

Dedicated servers also support the ability to connect to cost-effective cloud storage. For many compute-intensive workloads, the use of shared or cloud storage may be sufficient. The uniqueness of the dedicated server remains with the control and management of the compute resources.

It’s worth repeating: Your infrastructure partner should be willing and able to help you architect your dedicated server configuration for critical and non-critical workloads, depending upon your unique requirements. This includes storage.

Finally, for the sake of consistency with our previous post, let’s talk about high availability and redundancy.

Dedicated servers are not, by definition or default, high availability or redundant.

Can dedicated infrastructure be architected for high availability and/or redundancy? Absolutely. At ServerCentral, we design these types of solutions on a daily basis.

The first question to ask, however, is whether or not your applications are architected for high availability and/or redundancy. If not, this is a great place to start a conversation with an infrastructure partner.

Topics: Redundancy High Availability Products and Services

Increase Uptime in Your Data Center Network

As networkers, we’re constantly thinking about redundancy and uptime. We’re taught that multiple links and devices mean resiliency, which can be true, but a complex network can be equally complex to troubleshoot when things go wrong.

The classic Layer 1-2 redundancy model calls for multiple uplinks between hosts, switches, and routers. By making multiple paths available, you can create a fault-tolerant architecture. We usually need a protocol to protect against forwarding loops, which in Ethernet switching has traditionally been Spanning Tree (STP) and its variants. STP is one of those protocols that you can set and forget—until it’s time to troubleshoot.

So how do we as networkers continue to offer multiple links between devices while reducing the vulnerabilities of loop-prevention protocols like STP?

One option is to buy a single chassis-based Ethernet switch and load it up with redundant supervisors, line cards, and power supplies. This can be costly and often requires more cabinet space and power than is necessary.

Another option is switch stacking. A few years ago, switch vendors introduced stacking technology into certain switches. These switches have a stacking fabric that allows multiple independent devices to be configured as one virtual chassis. By connecting two standalone top-of-rack switches as one virtual chassis, we get the benefit of multiple line cards, redundant power, and a redundant control plane all in a smaller rack footprint than a bigger, chassis-based switch. What’s more, we get the added benefit of being able to fully utilize the extra capacity that we once stranded with STP and other active/standby techniques.

When host redundancy is a networking goal, the most common deployment that we see is switch stacking.

Stacking enables link aggregation between systems, which is very powerful in the network. Link aggregation gives you the ability to bond two or more physical links and present them as one logical aggregate link with the combined bandwidth of all members. On the server side of things, this can be called NIC teaming or bonding (bond mode 4 if you are a Linux user). On the routing and switching side, we specifically use Link Aggregation (IEEE 802.3ad), and often the LACP (Link Aggregation Control Protocol) option with it.

By deploying in a stacked environment with link aggregation, your server with 2x1G NICs can have 2 Gbps of active uplink capacity across two physically diverse switches. Your switches can have that same redundancy for their uplink capacity. Your network can take full advantage of the extra capacity, and in the event of a failure, it automatically diverts traffic to the remaining members.
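
To make that concrete, here's a simplified Python sketch of the per-flow hashing an aggregated link typically uses to choose a member link. The port names and hash function are hypothetical; real switches hash L2/L3/L4 header fields in hardware.

    # A simplified sketch of per-flow hashing on an aggregated link.

    import hashlib

    MEMBERS = ["switch0:port1", "switch1:port1"]  # one member per stacked switch

    def pick_member(src_ip, dst_ip, src_port, dst_port):
        """Hash the flow key to a member so a flow always takes the same link."""
        key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
        digest = hashlib.sha256(key).digest()
        return MEMBERS[int.from_bytes(digest[:4], "big") % len(MEMBERS)]

    # Different flows can land on different members; each flow stays put,
    # which preserves per-flow packet ordering.
    print(pick_member("10.0.0.5", "10.0.1.9", 40512, 443))
    print(pick_member("10.0.0.6", "10.0.1.9", 51200, 443))

One consequence worth remembering: because each flow sticks to a single member, an individual flow tops out at one member's speed. The combined 2 Gbps is available across flows, not within one.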

Taking this a step further, we have server virtualization. It's a fantastic tool, and redundancy is an important aspect of it.

As we combine more virtual hosts onto less hardware, the impact of an individual outage grows.

What we typically do here is follow the idea of link aggregation across multiple switches. We use the hypervisor’s virtual switch to again create an 802.3ad-bonded link. We like the idea of network separation by function, so we use VLAN trunking over the bundle between the switch fabric and hypervisor. That way, we can move individual subnets and virtual machines (VMs) as needed.
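
For a sense of what VLAN trunking actually adds to each frame, here's a small Python sketch of the 802.1Q tag layout. This is a byte-level illustration with hypothetical values; the hypervisor's virtual switch and the physical switches handle the tagging in their datapaths.

    # A sketch of the 802.1Q tag that VLAN trunking inserts after the
    # source MAC: the TPID 0x8100, then a 16-bit TCI whose low 12 bits
    # carry the VLAN ID.

    import struct

    def tag_frame(dst_mac, src_mac, vlan_id, ethertype, payload):
        """Build an Ethernet frame carrying an 802.1Q VLAN tag."""
        tpid = 0x8100                 # 802.1Q Tag Protocol Identifier
        tci = vlan_id & 0x0FFF        # priority and DEI bits left at zero
        return (dst_mac + src_mac
                + struct.pack("!HH", tpid, tci)
                + struct.pack("!H", ethertype)
                + payload)

    frame = tag_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01",
                      vlan_id=120, ethertype=0x0800, payload=b"hello")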

There are different ways to bond links on server hardware.

I often prefer to configure LACP when using link aggregation because an individual link member won’t begin to forward until LACP has negotiated its state. If one of your switches has link but isn’t ready to forward, and your server detects link on the individual member, the server might forward to an unready switch member. LACP can prevent this.

Despite its benefits, LACP isn’t appropriate for every aggregated Ethernet scenario.

If your link is saturated or your CPU resources prevent the LACP control packets from being generated, your physical link might drop because one end didn't receive its PDUs within the configured interval.
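
As a rough illustration of that keepalive behavior, here's a Python sketch assuming the commonly cited LACP timers: at the fast rate, a PDU is expected every second with a 3-second timeout; at the slow rate, every 30 seconds with a 90-second timeout.

    # A rough model of LACP's keepalive timeout: a member is declared
    # expired after three missed PDU intervals.

    import time

    PDU_INTERVAL = {"fast": 1, "slow": 30}  # seconds between expected PDUs

    def member_expired(last_pdu_time, rate="fast"):
        """True if the peer has missed three consecutive PDU intervals."""
        return time.time() - last_pdu_time > 3 * PDU_INTERVAL[rate]

    # A peer last heard from 5 seconds ago has expired at the fast rate
    # (3 s timeout) but is still healthy at the slow rate (90 s timeout).
    print(member_expired(time.time() - 5, "fast"))   # True
    print(member_expired(time.time() - 5, "slow"))   # False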

Some hypervisors include LACP support, some make you pay for it, and some don't support it at all.

Remember that LACP is an option inside of the 802.3ad protocol (the terms are often incorrectly used synonymously), so even if your equipment can’t support LACP, link aggregation may still be a great option for you.

It's possible to mix switch types in order to create a custom stack to fit your particular needs.

For example, you might have two 10-Gbps switches and two 1-Gbps switches in a single 4-member stack:

[Figure: Virtual Chassis with Link Aggregation]

Here, your server has 20 Gbps of capacity for production traffic and 2 Gbps of capacity for back-end management connectivity. From a switch management standpoint, it’s one device with two logical links to configure.

Stackable switches with link aggregation provide a nice solution in the data center, where space is valuable and uptime is crucial.

Your specific configuration will depend on your hardware vendor, OS, and link capacity. Remember, we're always here to talk through your options (whether you're a managed network customer or not). If you need help, just ask!

Topics: Redundancy Networking Data Center